LexPredict / lexpredict-lexnlp

LexNLP by LexPredict
GNU Affero General Public License v3.0

Clarity on dataset used for pre-training #41

Open ShrikanthSingh opened 4 years ago

ShrikanthSingh commented 4 years ago

According to the published paper "LexNLP: Natural language processing and information extraction for legal and regulatory texts", the abstract states that LexNLP includes pre-trained models based on real documents from the SEC EDGAR database. I want to clarify: does this mean that LexNLP captures entities such as acts, regulations, and citations based on knowledge from pre-training? I ask because I want to use LexNLP to extract these entities from documents in categories such as abortion, bankruptcy, sentencing, and environmental law. Since the SEC EDGAR database focuses on data related to investment, finance, and capitalization, I am skeptical that LexNLP can extract legal entities from the off-domain categories mentioned above.
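One way to test this concern empirically, rather than inferring it from the training data alone, is a small per-category spot check. The sketch below is only an illustration: the sample sentences, the `SAMPLES` structure, the `recall_by_category` helper, and the regex-based `naive_act_extractor` are all hypothetical stand-ins and are not part of LexNLP. The stand-in exists only so the harness runs without any dependencies.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in extractor: a naive regex for capitalized phrases
# ending in "Act". This is NOT how LexNLP works; replace it with the
# extractor you actually want to evaluate.
ACT_PATTERN = re.compile(r"\b(?:[A-Z][\w-]*\s+)+Act\b")


def naive_act_extractor(text: str) -> List[str]:
    """Return every capitalized phrase ending in 'Act' found in the text."""
    return [m.group(0) for m in ACT_PATTERN.finditer(text)]


# Hypothetical hand-labeled samples: category -> [(text, expected entities)].
SAMPLES: Dict[str, List[Tuple[str, List[str]]]] = {
    "environmental": [
        ("Under the Clean Air Act, the agency sets emission limits.",
         ["Clean Air Act"]),
    ],
    "bankruptcy": [
        ("The debtor filed under the Bankruptcy Abuse Prevention and "
         "Consumer Protection Act.",
         ["Bankruptcy Abuse Prevention and Consumer Protection Act"]),
    ],
}


def recall_by_category(
    extractor: Callable[[str], List[str]],
    samples: Dict[str, List[Tuple[str, List[str]]]],
) -> Dict[str, float]:
    """Fraction of expected entities recovered by the extractor, per category.

    An expected entity counts as found when it appears (case-insensitively)
    inside at least one string returned by the extractor.
    """
    scores: Dict[str, float] = {}
    for category, docs in samples.items():
        hits = 0
        total = 0
        for text, expected in docs:
            found = [f.lower() for f in extractor(text)]
            for item in expected:
                total += 1
                if any(item.lower() in f for f in found):
                    hits += 1
        scores[category] = hits / total if total else 0.0
    return scores


if __name__ == "__main__":
    for category, score in recall_by_category(naive_act_extractor, SAMPLES).items():
        print(f"{category}: recall = {score:.2f}")
```

To evaluate LexNLP itself, the stand-in could be replaced with the relevant LexNLP extractor, wrapped so that it returns a list of strings, and the samples replaced with labeled sentences from the target categories. A large gap in recall between categories would suggest domain sensitivity, although a handful of sentences is too small a sample to be conclusive.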