CornellNLP / ConvoKit

ConvoKit is a toolkit for extracting conversational features and analyzing social phenomena in conversations. It includes several large conversational datasets along with scripts exemplifying the use of the toolkit on these datasets.
https://convokit.cornell.edu/documentation/
MIT License

Supreme Court Oral Arguments Corpus: Update Years #168

Open kakeith opened 2 years ago

kakeith commented 2 years ago

For a recent project I'm working on, we're using ConvoKit's implementation of the Supreme Court Oral Argument Corpus. However, we'd really like to include data from after 2019.

How difficult would it be to run scripts to update the dataset for cases after 2019?

Thanks, Katie

cristiandnm commented 2 years ago

Hi Katie,

Happy to hear you are finding this data useful in your project! @tisjune developed this corpus, so she might be able to chime in and help with updating it. That said, I don't really know how hard it would be (e.g., whether it involves any manual fixes) or whether she has time at the moment.

Cristian

tisjune commented 2 years ago

Hi Katie -- Unfortunately I don't have a script that pulls or updates the dataset (or rather, I forgot the password to the machine that stores the collection of files that more or less document what I did), and there is some manual tinkering involved. In short, if you want to get started:

kakeith commented 2 years ago

@tisjune @cristiandnm thanks for replying so quickly!

I'll pass this info on to my collaborators and see if there's interest in trying to update the corpus. If so, would you be interested in us contributing scripts to ConvoKit so that this corpus can continue to be updated in the future?

Thanks and best, Katie

cristiandnm commented 2 years ago

Thanks Katie,

Yes, we would definitely be interested in updating the dataset and having scripts ready for future updates. Let us know if we can help along the way.

biaoyanf commented 2 years ago

Hi @kakeith, I'm also interested in using this data with more recent years. How far have you got? Will the updated data be publicly available once you have it? Thanks!