Open grossir opened 1 week ago
Looking good, @grossir! Now I have the next hard question: How many hours or files, approximately, on each — or put another way, where do we start? The other question is what do we do about video? We could probably start storing it, but we'd want to optimize/normalize the file types, and price out the storage costs, since they might start to matter....
Looks like this will be a big project.
I will try to calculate seconds available where possible, but I think the number of files is a decent proxy. Most sites do not list any oral argument statistics, so I would basically have to implement a scraper to get the numbers.
I think the best way to start is to implement the sources that match our current model (audio files with case metadata), needing the least effort: we will just implement the scraper / backscraper.
Texas' courts `tex` and `texapp` hold a lot of data. Then `va`, `tenn`, `ind` and `indtc` and, a little trickier, `nj`, for the courts I have mapped so far.
Including video would take us more time: we would have to implement model and `doctor` changes, change the frontend to play the videos, and calculate storage costs. Do we want that, anyway? Why not extract the audio from the video? Related: #44
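On extracting the audio from the video: a minimal sketch of what that step could look like, assuming `ffmpeg` is installed and on the PATH. The function names, the file names, and the choice of MP3 output are illustrative assumptions, not what `doctor` does today.

```python
import subprocess
from pathlib import Path


def build_audio_extract_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and keeps the audio.

    MP3 output via libmp3lame is an assumption made for illustration.
    """
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it already exists
        "-i", video_path,      # input video
        "-vn",                 # do not include any video stream in the output
        "-c:a", "libmp3lame",  # encode the audio stream as MP3
        "-q:a", "4",           # VBR quality level (lower number = higher quality)
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> Path:
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(build_audio_extract_cmd(video_path, audio_path), check=True)
    return Path(audio_path)
```

Something like `extract_audio("argument.mp4", "argument.mp3")` would then feed the existing audio pipeline, so the model would not need to change for video-only courts.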
After scraping the courts that self-host their audio, I think we should work on the ones that upload to YouTube, since they all share a related scraping / processing step. Luckily, in this step we would scrape some of the big courts, like `ny` and `fla`.
Finally, the ones that "self" host their videos, or use a provider like Granicus (`cal` is one of those).
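The ordering above can be sketched as a small lookup, so that newly surveyed courts fall into a phase automatically. This is only an illustration: the hosting labels and the `Source` / `plan` names are hypothetical, and only courts already named in this thread are listed.

```python
from dataclasses import dataclass

# Lower number = earlier phase, following the order proposed above.
# The label strings are made up for this sketch.
PHASE_BY_HOSTING = {
    "self_hosted_audio": 1,  # matches the current model: audio + case metadata
    "youtube": 2,            # shared scraping / processing step
    "hosted_video": 3,       # self-hosted video or a provider such as Granicus
}


@dataclass(frozen=True)
class Source:
    court_id: str
    hosting: str


SOURCES = [
    Source("cal", "hosted_video"),
    Source("ny", "youtube"),
    Source("tex", "self_hosted_audio"),
    Source("texapp", "self_hosted_audio"),
    Source("fla", "youtube"),
    Source("va", "self_hosted_audio"),
    Source("tenn", "self_hosted_audio"),
    Source("ind", "self_hosted_audio"),
    Source("indtc", "self_hosted_audio"),
    Source("nj", "self_hosted_audio"),
]


def plan(sources: list[Source]) -> dict[int, list[str]]:
    """Group court ids by phase; input order is kept within each phase."""
    phases: dict[int, list[str]] = {}
    # sorted() is stable, so courts in the same phase keep their input order.
    for s in sorted(sources, key=lambda s: PHASE_BY_HOSTING[s.hosting]):
        phases.setdefault(PHASE_BY_HOSTING[s.hosting], []).append(s.court_id)
    return phases
```

Keeping the hosting type as data rather than hard-coding the order means the rest of the survey can be appended to `SOURCES` without touching the plan.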
That all sounds good. Start with the easy stuff and then move to the trickier stuff.
I'm not sure what we should do about video. Long term, probably the right thing is to extract audio from it, and to also host the video, so API users can choose if they want audio or video.
Hosting video is going to be expensive and complex, so maybe step one is just to scrape and store video with a cheap storage class, and step two will be to actually figure out how to serve it.
But for now, yes, let's finish the survey, and when we're ready, we can start with scraping audio, then do video in a second phase.
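For pricing out storage once the survey has file counts, a back-of-the-envelope helper might look like the sketch below. Every input (file count, average duration, bitrate, price per GB) is a parameter to be filled in from the survey and the storage provider's price list; none of them are known yet, and the function names are hypothetical.

```python
def estimate_storage_gb(n_files: int, avg_seconds: float, bitrate_kbps: float) -> float:
    """Approximate total size in GB (10^9 bytes) from count, duration and bitrate."""
    total_bits = n_files * avg_seconds * bitrate_kbps * 1000
    return total_bits / 8 / 1e9


def monthly_cost(storage_gb: float, price_per_gb_month: float) -> float:
    """Monthly storage cost for a flat per-GB price (ignores request and egress fees)."""
    return storage_gb * price_per_gb_month
```

Running this once with an audio bitrate and once with a video bitrate would give a rough sense of how much more the video phase costs to store, before deciding how to serve it.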
Ordered by population:
- `cal` / California Supreme Court
- `tex` / Texas Supreme Court
- `texapp` / Texas Courts of Appeals
  - `tex`, for 2008 to 2024, 2018, 2024, 2019 to 2024 respectively
- `fla` / Florida Supreme Court
- `fladistctapp`
- `ny` / New York Court of Appeals
- `pa` / Pennsylvania Courts
  - Couldn't find an oral arguments section on their website. Did find this, so it seems they don't publish the oral arguments?
  - Following the mention of "Pennsylvania Cable Network", I did find a courts section on that website with videos of oral arguments, but I can't find case data to link the audio properly.
- `ohio` / Ohio Supreme Court
- `ga` / Georgia Supreme Court
- `nc` / North Carolina Supreme Court
- `mich` / Michigan Supreme Court
- `nj` / New Jersey Supreme Court
- `va` / Virginia Supreme Court
- `wa` / Washington Supreme Court
- `ariz` / Arizona Supreme Court
  - `cal`, does not work for me
- `tenn` / Tennessee Supreme Court
- `mass` / Massachusetts Supreme Court
- `ind` / Indiana Supreme Court
- `indtc` / Indiana Tax Court
- `mo` / Missouri Supreme Court
- `md` / Maryland Supreme Court
- `wis` / Wisconsin Supreme Court
- `colo` / Colorado Courts
- `minn` / Minnesota Supreme Court
- `sc` / South Carolina Supreme Court