LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

Adding Legal Documents as Dataset for improving model's legal advice ability #1432

Open ElJaviLuki opened 1 year ago

ElJaviLuki commented 1 year ago

Statement

The current Open Assistant model cannot reliably give accurate legal advice because its dataset contains few legal documents. As a result, it may give incorrect information and cause harm to users who seek legal advice.

Proposed Solution

To improve the model's legal advice skills, we propose adding legal documents from several countries to the dataset. These would include constitutions, federal and state laws (where the country has that division), jurisprudence (case law), customs and traditions, legal doctrine, and any other source of law. These documents would give the model the information it needs to provide accurate legal advice to users.

Steps to Implement

  1. Collect a large number of legal documents from the sources of law of many countries (USA, European Union (EU law and the laws of each member state), Russia, China, India, Australia, and many more).
  2. Clean and pre-process the data to ensure it is in a format that can be used by the model.
  3. Integrate the legal documents into the model's existing dataset.
  4. Train the model on the extended dataset that includes the new legal documents.
  5. Test the model's performance to ensure that it is providing accurate legal advice.
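Steps 2 and 3 could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not Open Assistant's actual data pipeline: the record fields (`country`, `source_type`, `title`, `text`), the `SOURCE_TYPES` labels, and the JSON Lines output are all hypothetical choices. It normalizes Unicode, collapses whitespace, drops lines that contain only a page number, and tags each document with the kind of source of law it comes from.

```python
import json
import re
import unicodedata
from dataclasses import dataclass, asdict

# Hypothetical labels mirroring the sources of law listed in the proposal.
SOURCE_TYPES = {"constitution", "statute", "case_law", "custom", "doctrine", "other"}


@dataclass
class LegalRecord:
    country: str      # hypothetical convention, e.g. "US", "EU", "IN"
    source_type: str  # one of SOURCE_TYPES
    title: str
    text: str


def clean_text(raw: str) -> str:
    """Normalize Unicode, collapse runs of spaces/tabs, and drop empty or page-number-only lines."""
    text = unicodedata.normalize("NFC", raw)
    lines = []
    for line in text.splitlines():
        line = re.sub(r"[ \t]+", " ", line).strip()
        # Heuristic: a line made only of digits is assumed to be a page number.
        if not line or line.isdigit():
            continue
        lines.append(line)
    return "\n".join(lines)


def make_record(country: str, source_type: str, title: str, raw: str) -> LegalRecord:
    """Build one cleaned record, rejecting unknown source types."""
    if source_type not in SOURCE_TYPES:
        raise ValueError(f"unknown source type: {source_type!r}")
    return LegalRecord(country, source_type, title.strip(), clean_text(raw))


def to_jsonl(records) -> str:
    """Serialize records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


if __name__ == "__main__":
    rec = make_record("XX", "statute", " Example Act ", "Article 1\n\n  Text of   article one.\n12\n")
    print(to_jsonl([rec]))
```

Keeping the source type and country on every record would make it possible to balance the mix of jurisdictions later, or to evaluate step 5 separately per country.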

Benefits

Adding legal documents to the model's dataset would improve the accuracy of the legal advice the model provides and increase users' trust in the information Open Assistant gives. It would also make Open Assistant more valuable as a tool for legal advice.

Open Questions

huu4ontocord commented 1 year ago

Great idea! I've assigned it to you.

nmeln commented 1 year ago

Could be useful: https://lang.org.ua/en/corpora/#anchor7 (under the "Corpus of laws and legal acts" header)

large (more than 9 Gb) corpus of laws and legal acts of Ukraine

The cutoff date is approximately 2016-2017, though. So while it may help the model learn the structure of legal documents, some of these laws or legal acts may have become outdated since then.
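One possible way to handle that concern is to store, for every document, the date its text was collected or last verified, and flag older snapshots for review. The sketch below assumes a hypothetical `as_of` field and a threshold chosen by the dataset maintainers. It only flags documents by age; it cannot tell whether a given law was actually amended or repealed.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CorpusDoc:
    title: str
    as_of: date  # hypothetical: date the text was collected or last verified


def flag_possibly_outdated(docs, threshold: date):
    """Return the titles of documents whose snapshot predates `threshold`."""
    return [d.title for d in docs if d.as_of < threshold]


if __name__ == "__main__":
    docs = [
        CorpusDoc("Law A", date(2017, 1, 1)),
        CorpusDoc("Law B", date(2023, 5, 1)),
    ]
    print(flag_possibly_outdated(docs, date(2020, 1, 1)))  # ['Law A']
```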

huu4ontocord commented 1 year ago

@ElJaviLuki Hi, can you give us a status update on this issue?