freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

Segment combined opinions into lead and discretionaries #446

Open elliottash opened 8 years ago

elliottash commented 8 years ago

As a precursor to assigning authors to opinions, we need a script to split up combined opinions. Ideally, this could be run on any clusters that are already in the DB.

The segmenter can use tags like "(\w)+(s)+,dissenting" and "(\w)+(s)+,concurring"
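A rough sketch of what such a segmenter might look like, assuming each separate opinion starts with a heading line like "SCALIA, J., dissenting." or "Justice Breyer, concurring in part." The `MARKER` pattern and the `segment_opinion` function are hypothetical names for illustration, not existing CourtListener code, and real opinions will likely need a much more forgiving pattern:

```python
import re

# Hypothetical heading pattern (an assumption, not taken from the codebase):
# a line holding an author name, an optional "J.," / "JJ.," / "C. J.," title,
# and then the word "dissenting" or "concurring".
MARKER = re.compile(
    r"^(?P<author>[^\n,]{2,60}),(?:\s*(?:C\.\s*)?JJ?\.,)?\s*"
    r"(?P<kind>dissenting|concurring)\b[^\n]*$",
    re.IGNORECASE | re.MULTILINE,
)


def segment_opinion(text):
    """Split combined opinion text into (kind, author, body) tuples.

    Everything before the first heading is treated as the lead opinion.
    """
    segments = []
    matches = list(MARKER.finditer(text))

    # Text before the first heading (or all of it, if no heading is found).
    lead_end = matches[0].start() if matches else len(text)
    lead = text[:lead_end].strip()
    if lead:
        segments.append(("lead", None, lead))

    # Each heading owns the text up to the next heading or the end of input.
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.end():end].strip()
        segments.append((m.group("kind").lower(), m.group("author").strip(), body))

    return segments
```

Running this over the text of an existing cluster would give one tuple per candidate opinion, which could then be reviewed before anything is written back to the DB. Anchoring the pattern to whole lines is a deliberate choice to cut down on false positives from ordinary sentences that happen to contain "dissenting" or "concurring".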

elliottash commented 8 years ago

I will probably be able to assign an RA to this in the next week or two.

mlissner commented 8 years ago

Are you thinking that the segmentation is for the new stuff or the old? I theorize that we'd be better off either replacing the old stuff or just setting it aside and marking it as a duplicate (assuming the new stuff is easy to separate).

elliottash commented 8 years ago

The new stuff is already separated. This refers to the existing opinions.

mlissner commented 11 months ago

@flooie do you think we can close this?

flooie commented 11 months ago

Not at all.

mlissner commented 11 months ago

Ah, I just noticed it's in the project for adding the Columbia corpus: https://github.com/freelawproject/courtlistener/projects/2#card-179304

I'll put this on your backlog, though, so we can remember to close it when the time is right.

flooie commented 11 months ago

Will this ever be closed? All the documents we scrape are "combined opinions". Until we have a new way to identify them as leads and dissents, we are going to keep this open forever, no?

mlissner commented 11 months ago

Well, I think this was particularly relevant to the Columbia content, which I think means we either did it or are about to do it.

If it's about scraped content, there's no point keeping this issue around. There's no useful discussion or anything like that.