Open mlissner opened 7 years ago
Maybe this should be a living ticket where we update the list of new scrapers to add. If so, maybe we could rename this issue, or create a new issue to handle the living list. Do you think we should create separate child issues for each new scraper we want to add, so we can close those when done? It would be helpful when adding to our living list to indicate:
Maybe it could be a 3 column table that we add new desired scrapers to with the info above. If this sounds reasonable, could you update this issue (or create new issue for living list) to include this new format/info for your Maryland example above? And if we want this to be a living issue with links to sub issues, could you also add the new scrapers from #167 to the list here with the pertinent 1-3 info mentioned above?
I like the idea of having one ticket per court, but obviously I occasionally make bigger tickets like this one. We've also used the github wiki in the past for this purpose (I think there's still a page with a gazillion court links).
Dunno. I'd say let's keep this one focused on State AG. Closing tickets is nice, and keeping them short is nice too.
So, in that vein, here's the info we need:
I'll also just point out that the module name will almost always correspond with the ID in CourtListener.
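As a rough illustration of that naming convention, here is a minimal sketch that pulls the module name out of a scraper path. The helper name `court_id_from_module_path` is hypothetical, not part of juriscraper. Since the match with CourtListener IDs only holds "almost always", treat the result as a guess to verify rather than a guaranteed ID.

```python
from pathlib import PurePosixPath


def court_id_from_module_path(path: str) -> str:
    """Return the module name (file name without extension) of a scraper path.

    Assumption: the module name usually, but not always, matches the
    court ID used in CourtListener.
    """
    return PurePosixPath(path).stem


# Paths taken from this thread.
for scraper in (
    "opinions/united_states/federal_special/ag.py",
    "opinions/united_states/state_special/mdag.py",
):
    print(scraper, "->", court_id_from_module_path(scraper))
```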
If you're new here and can help, please say which scraper you're able to work on, and check out the readme to get started.
dear lord! Is there any strategy here, or just start working from the top?
I'll take care of the others in #167 first so we can close that ticket.
opinions/united_states/federal_special/ag.py
opinions/united_states/state_special/mdag.py
The right way to do it is to start with the most populous states. The fun way is to find easy ones with big archives that are easy to traverse.
I don't know if this is useful or not, but the Alabama AG has opinions going back to 1979. They're numbered differently depending on period:
When searching by opinion number use the following formats:
Opinions 79-00001 to 95-00338, use format yynnnnn (ex: 8500001)
Opinions 96-00001 to 99-00290, use format yy-nnnnn (ex: 96-00331)
Opinions 2000-001 to Present, use format yyyy-nnn (ex: 2000-025)
And these are the correct ranges from 1994-2000:
9400001 - 9400267
9500001 - 9500338
96-00001 - 96-00331
97-00001 - 97-00298
98-00001 - 98-00225
99-00001 - 99-00290
2000-001 - 2000-252
The PDFs are named according to the above scheme. So, examples from the three different date formats:
https://www.alabamaag.gov/Documents/opin/9400001.pdf
https://www.alabamaag.gov/Documents/opin/97-00001.pdf
https://www.alabamaag.gov/Documents/opin/2000-003.pdf
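The numbering scheme above could be turned into URLs with something like the sketch below. It is only a starting point for a back scraper: the function names are hypothetical, the base URL is the one from the examples in this comment and may have changed since, and it assumes (without checking) that every number inside a listed range actually has a PDF.

```python
from typing import Iterator

BASE_URL = "https://www.alabamaag.gov/Documents/opin/"

# Last opinion number per year, copied from the 1994-2000 ranges above.
LAST_OPINION = {
    1994: 267,
    1995: 338,
    1996: 331,
    1997: 298,
    1998: 225,
    1999: 290,
    2000: 252,
}


def opinion_id(year: int, number: int) -> str:
    """Format an opinion number using the scheme for its period."""
    if year < 1979:
        raise ValueError("Opinions only go back to 1979")
    if year <= 1995:
        # 1979-1995: yynnnnn, e.g. 8500001
        return f"{year % 100:02d}{number:05d}"
    if year <= 1999:
        # 1996-1999: yy-nnnnn, e.g. 96-00331
        return f"{year % 100:02d}-{number:05d}"
    # 2000 onward: yyyy-nnn, e.g. 2000-025
    return f"{year}-{number:03d}"


def opinion_url(year: int, number: int) -> str:
    """Build the PDF URL for one opinion."""
    return f"{BASE_URL}{opinion_id(year, number)}.pdf"


def candidate_urls(year: int) -> Iterator[str]:
    """Yield every candidate URL for a year with a known range."""
    for number in range(1, LAST_OPINION[year] + 1):
        yield opinion_url(year, number)
```

A scraper would still need to handle missing PDFs (for example, by skipping 404 responses), since the ranges only give the first and last number for each year.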
That's great. One day, perhaps, we'll get on this, but absent a volunteer picking it up, it's outside of our budget to do this work for the moment.
I'm just learning Python, so maybe if I ever get to a place where I understand *args and **kwargs, I can help. But at least the information's there now. :)
Notes on AGs
Massachusetts - No longer issued
Mississippi - Westlaw
Iowa - Westlaw
Rhode Island - None found
Utah - None found
Wyoming - Google Drive
New Mexico - None found
Virgin Islands - None found
Puerto Rico - None found
Guam - Google Drive
I suppose the Google Drive folders could maybe be tackled with Selenium, but I wasn't up for figuring that out.
Massachusetts does issue opinions on meetings, but it's not the same thing.
Rhode Island is a mystery: I do think its opinions exist, but I don't know where.
This was the bare minimum, and I didn't set anything up for back-scraping opinions.
Also, it looks to me like lots of offices have scaled back, or they post these very intermittently.
Some may not post again for years and some haven't posted in years, but I set up the scrapers anyway.
I also moved all the AG scrapers into a new folder juriscraper/opinions/united_states/attorney_general
And moved all the previously added ones into that directory
Two top level tasks here:
[x] Trawl the internet and find all the available sources.
[x] Make the scrapers.
I'll develop a list below of all scrapers we want to build.