freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Merge or stub all known cases into CL #1162

Open flooie opened 4 years ago

flooie commented 4 years ago

We need to add a method to merge or stub all known cases into CL.

nemobis commented 2 years ago

This would be great, especially if there is at least basic information such as the date and nature of suit. (I'm interested in new and old copyright cases which have not been archived yet.)

mlissner commented 2 years ago

We're working on it.

quevon24 commented 1 year ago

After inspecting the models, this is the minimum data for each model required to create a stub case:

Docket: court and source
Cluster: docket and date_filed
Opinion: cluster and type
Citation: cluster, volume, reporter, page and type
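
As a rough sketch of how that minimum could be checked before creating a stub, assuming the incoming record is a plain dict: the `REQUIRED_FIELDS` mapping and the `missing_fields` helper are hypothetical names for illustration, not existing CourtListener code.

```python
# Minimum fields per model, copied from the list above.
# The dict layout and helper name are illustrative assumptions.
REQUIRED_FIELDS = {
    "Docket": ("court", "source"),
    "Cluster": ("docket", "date_filed"),
    "Opinion": ("cluster", "type"),
    "Citation": ("cluster", "volume", "reporter", "page", "type"),
}


def missing_fields(model_name, data):
    """Return the required fields that are absent or empty in `data`."""
    return [
        field
        for field in REQUIRED_FIELDS[model_name]
        if data.get(field) in (None, "")
    ]
```

A record would only be stubbed when `missing_fields` returns an empty list for every model involved.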

Also, I ran some tests on the Lexis and Westlaw datasets and found that we don't always have a case name, a court, or a date filed, and some cases don't even have citations. I was thinking that we could add a new opinion type to easily identify stub cases.

I found a problem with the Citation model: why does the volume field only accept integers when eyecite, if you parse a citation with it, allows both numbers and characters in the volume?

I found some citations in the Lexis data like 71-A A.F.T.R.2d (RIA) 3088 or 88-1 U.S. Tax Cas. (CCH) P9373. I think this field in the Citation model should be a text field instead of a positive integer field. I just need to analyze whether this change will affect any other functionality in the system.
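
To make the mismatch concrete, here is a small sketch that checks which volume strings would fit a positive integer column. The helper name is hypothetical and "410" is just an illustrative ordinary volume; the other two values are the volumes from the Lexis citations above.

```python
def volume_fits_integer_field(volume):
    """True if the volume string could be stored in a positive integer column."""
    # isdecimal() rejects empty strings and anything with letters or hyphens.
    return volume.isdecimal() and int(volume) > 0


# "410" is an illustrative ordinary volume; "71-A" and "88-1" come from
# the Lexis examples quoted above.
samples = ["410", "71-A", "88-1"]
non_integer = [v for v in samples if not volume_fits_integer_field(v)]
```

Running a check like this over the whole dataset would give a count of how many citations a text field would need to accommodate.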

flooie commented 1 year ago

Yes, we clearly need to update the citation object, but that's a big update. I think you mentioned we need to discuss stubbing your PR, so here goes.

I think I am leaning towards creating a new table to handle stubbed cases. I think that makes a lot of sense and would let us keep the model relatively safe. That will also, for the most part, protect our users from being shown a stubbed case, as @mlissner suggested.

I think we need to dive deeper into citations to make sure that we can safely change the volume field, but I think we may also want to think bigger about citations in general. Perhaps the easiest thing would be to allow a citation to have a text field for atypical citation formats.

quevon24 commented 1 year ago

Yes, citations are definitely complex. If we allow the volume to be a text field, that might break other things, as happened with the page number in this issue: #2474

I think the new table is the best way to handle the stubbed cases and to avoid showing them to the end user. We would probably want one table for the case data and another table to store the stub case citations. That way we could implement a management command that checks whether we can add that information to an existing case.
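
A minimal sketch of that two-table idea, using plain dataclasses rather than Django models so it runs on its own. `StubCase`, `StubCitation`, `find_existing_case`, and the shape of `citation_index` are all hypothetical names chosen for illustration; a real management command would query the database instead of a dict.

```python
from dataclasses import dataclass, field


@dataclass
class StubCitation:
    volume: str  # kept as text so volumes like "71-A" fit
    reporter: str
    page: str


@dataclass
class StubCase:
    case_name: str = ""  # may be empty; the datasets do not always have one
    citations: list = field(default_factory=list)


def find_existing_case(stub, citation_index):
    """Return the id of the first existing case that shares a citation
    with the stub, or None if no citation matches.

    `citation_index` maps (volume, reporter, page) to an existing case id.
    """
    for cite in stub.citations:
        key = (cite.volume, cite.reporter, cite.page)
        if key in citation_index:
            return citation_index[key]
    return None
```

Stubs for which `find_existing_case` returns an id could be merged into that case; the rest would stay in the stub tables, hidden from end users.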