Open elliottash opened 8 years ago
I will probably be able to assign an RA to this in the next week or two.
Are you thinking that the segmentation is for the new stuff or the old? I theorize that we'd be better off either replacing the old stuff or just setting it aside and marking it as a duplicate (assuming the new stuff is easy to separate).
The new stuff is already separated. This refers to the existing opinions.
@flooie do you think we can close this?
Not at all.
Ah, I just noticed it's in the project for adding the Columbia corpus: https://github.com/freelawproject/courtlistener/projects/2#card-179304
I'll put this on your backlog though so we can remember to close it when the time is right?
Will this ever be closed? All the documents we scrape are "combined opinions". Until we have a new way to identify them as lead opinions and dissents, we are going to keep this open forever, no?
Well, I think this was particularly relevant to the Columbia content, which means I think we either did it or are about to do it.
If it's about scraped content, there's no point keeping this issue around. There's no useful discussion or anything like that.
As a precursor to assigning authors to opinions, we need a script to split up combined opinions. Ideally this could be run on any clusters that are already in the DB.
The segmenter can use tags like "(\w)+(s)+,dissenting" and "(\w)+(s)+,concurring"
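A rough sketch of what such a segmenter might look like, in case it helps whoever picks this up. It assumes each separate opinion starts on its own line with a heading shaped like "SMITH, J., dissenting" or "Jones, concurring"; that heading shape, the `MARKER` pattern, and the `split_combined_opinion` name are all illustrative assumptions, not anything that exists in CourtListener. The pattern is a loosened version of the tags above, and real opinions will likely need more variants.

```python
import re

# Assumed heading shape: a name, a comma, an optional title such as "J.,",
# then "dissenting" or "concurring", all at the start of a line.
MARKER = re.compile(
    r"^[ \t]*(?P<author>\w+)\s*,\s*(?:[A-Z. ]+,\s*)?(?P<kind>dissenting|concurring)\b",
    re.MULTILINE | re.IGNORECASE,
)


def split_combined_opinion(text):
    """Split combined opinion text into (kind, author, body) tuples.

    Text before the first marker is treated as the lead opinion.
    This is a heuristic sketch, not a complete parser.
    """
    matches = list(MARKER.finditer(text))
    if not matches:
        return [("lead", None, text.strip())]

    segments = []
    lead = text[: matches[0].start()].strip()
    if lead:
        segments.append(("lead", None, lead))

    for i, m in enumerate(matches):
        # Each segment runs from its marker to the next marker (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[m.start():end].strip()
        segments.append((m.group("kind").lower(), m.group("author"), body))
    return segments


sample = (
    "The judgment is affirmed.\n\n"
    "SMITH, J., dissenting\nI would reverse.\n\n"
    "JONES, J., concurring\nI agree with the result.\n"
)
segments = split_combined_opinion(sample)
```

Running this over existing clusters would still need a review step, since a sentence in the body that happens to start with the same shape would be misread as a heading.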