Closed: arderyp closed this issue 1 year ago
Cool. I'll give them a call. Going to be a tough one, but may as well try.
I believe I did email them at one point and they basically said "no."
I'm a novice but I think this may be what you're looking for: https://www2.texasattorneygeneral.gov/opinion/index-to-opinions
thanks for the link @ix4
Oof, that's a pretty gross interface. Not sure how we'd even approach scraping that, or if all the metadata is present. Want to have a look @mlissner?
So ugly. It's almost as if it's on purpose. I was thinking that, after an initial scrape, using their email notification subscription (maybe with a kill-the-newsletter.com type feed) could work nicely, since the data will come in structured (subject, date, attachment...).
kill-the-newsletter.com looks neat, but I'd rather not rely on it. I think this isn't so hard to scrape really.
Some thoughts:
> Not sure how we'd even approach scraping that,
I feel a little confused at what the question is. Although this interface seems completely unacceptable for use by humans, it doesn't seem hard to scrape? You just pull links out of these horrid option dropdowns:
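For anyone following along, here is a rough sketch of what "pull links out of the option dropdowns" might look like, using only the Python standard library. It assumes each `<option>` carries the PDF path in its `value` attribute and the opinion number as its text; that markup, the function name `extract_option_links`, and the class name are assumptions made for illustration, not something confirmed against the actual page or Juriscraper's conventions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class OptionLinkParser(HTMLParser):
    """Collect PDF links from <option value="..."> elements (assumed markup)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._current = None  # the option whose label text is being read

    def handle_starttag(self, tag, attrs):
        if tag != "option":
            return
        value = dict(attrs).get("value")
        if value and value.lower().endswith(".pdf"):
            # Resolve relative paths against the page URL.
            self._current = {"url": urljoin(self.base_url, value), "label": ""}
            self.links.append(self._current)
        else:
            # Placeholder entries such as "Select an opinion" are skipped.
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current["label"] += data.strip()

    def handle_endtag(self, tag):
        # </option> is often omitted in real pages, so also stop at </select>.
        if tag in ("option", "select"):
            self._current = None


def extract_option_links(html, base_url):
    """Return a list of {"url", "label"} dicts, in page order."""
    parser = OptionLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

This only gets URLs and whatever label text sits inside each option; it does nothing about titles or dates.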
@johnhawkinson, I’m on the road and reviewed the site on my phone. After about 20 seconds of looking at those forms it seemed pretty funky, but I admittedly didn’t examine it very thoroughly. I see the listings in the dropdowns now. Is the pertinent metadata accompanying each dropdown option, or would we have to scrape the actual opinion PDFs? On mobile I don’t immediately see titles and dates, for example. I guess we can use the “estimated” date logic, but titles?
Sounds like a good idea, @mlissner, to contact the court before starting the hacking. If we have to go the latter route, it sounds like @johnhawkinson can whip something together quickly to get what we need.
Err…I just don't understand what you meant by "not sure how we'd even approach scraping."
Did you mean about metadata? I was just referring to getting the PDFs. Yeah, there's no metadata on the dropdown page. Though the subject index Mike points out does have something, so perhaps that's the better way to go, although it's quite a few pages to scrape.
Yeah, @johnhawkinson, the things we always need for opinions are the date, the title (this is the case name, usually), and a few other bits of metadata. A link to just a PDF doesn't really work for us very well, though we've hacked our way through it in the past by using something like "Unknown title" for the case name.
I think we're all agreed dropdowns are horrid but workable.
@arderyp, do you think you could write them an email? Should we get you a free.law email address so you're more official?
I’m slightly confused @mlissner. You think the dropdowns are worth scraping without titles and with estimated (year only) dates?
I’m happy to send an email with or without a FLP address. I’ve had decent success without, so far. Am I just asking if they have an alternate, more easily scrapeable data source (that includes title and other metadata)?
I am out of town for the next two weeks, but happy to look into this when I get back.
The subject pages have titles (are those titles?) but I don’t see dates.
Seems like we could scrape year+docket+url from the dropdowns, or title+docket+url from subject pages, but nowhere can we get all 4 (date+title+docket+url)
Scraping both just doesn't seem horrible. Suboptimal, yes, but easily doable.
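A minimal sketch of the "scrape both and merge" idea, assuming each source has already been scraped into a list of dicts and that the docket number is the only shared key. The field names, the `normalize_docket` helper, and the sample docket formats are hypothetical; the "Unknown title" fallback is the placeholder mentioned earlier in this thread.

```python
import re


def normalize_docket(docket):
    """Uppercase and drop punctuation/spaces so 'kp-0250' matches 'KP 0250'."""
    return re.sub(r"[^A-Z0-9]", "", docket.upper())


def merge_by_docket(dropdown_rows, subject_rows):
    """Join year+docket+url rows with title+docket+url rows on docket.

    dropdown_rows: dicts with "docket", "url", "year" (from the dropdowns)
    subject_rows:  dicts with "docket", "title" (from the subject index)
    """
    titles = {normalize_docket(r["docket"]): r["title"] for r in subject_rows}
    merged = []
    for row in dropdown_rows:
        key = normalize_docket(row["docket"])
        merged.append(
            {
                "docket": row["docket"],
                "url": row["url"],
                "year": row["year"],
                # Fall back to a placeholder when the subject index has no match.
                "title": titles.get(key, "Unknown title"),
                # Only the year is known, so any full date is an estimate.
                "date_is_approximate": True,
            }
        )
    return merged
```

Even merged, this still yields a year rather than a real date, so the "estimated" date logic would be needed either way.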
@arderyp, let's start with an email and see if we can get them to make a better page. I'd frame it more as a "Can we work with you on this process" type of email where we want to get a dialog going.
If it comes to it, maybe it's a good idea to gather titles from that other page, yes. We could do it using a deferring list, I think. Maybe. I wouldn't be opposed to going off the usual Juriscraper template if that's what it took to scrape/merge these two pages.
@mlissner, okay.
@mlissner I contacted the court through this web form: https://www.texasattorneygeneral.gov/contact-us-online-form
If you know of some other better way, please let me know.
The chief....clerk of the court in Texas Supreme is super active on Twitter. If I had a hope of getting something out of Texas, he's where I'd start. OTOH, I suspect the AG is its own thing and they've got a phone number? Maybe?
-- Mike Lissner Executive Director Free Law Project https://free.law
@mlissner no response yet, and I don't have Twitter.
Left a (regrettable) message with the "Open Gov't Hotline" at the TX AG. It's mostly for FOIA, but it might have somebody with the right kind of mentality to help us, so I'm giving it a shot. They promise to call back in a few days.
OK, no surprise except that they called back after hours: I've been sent onwards to the AG constituent services. Let's find out if they consider us a constituent.
best of luck
Well, I spoke to Albert in the constituent affairs division. He forwarded it to the website division, who may get back to me. I'll try to stay on this. The number is: 512-475-4413.
TIL that they take lunch extremely seriously. If you call during lunch hour (10-11 PST), somebody answers, but you have no choice but to call back later. Sigh.
Just tried again. Left a message. So during lunch they have somebody to answer phones. After lunch they do not. Sigh.
Alright, so I talked to their public affairs office again and they just say to forward a message via the contact form on the site, so that's probably a waste of time. They actually have an "Opinions Committee" as well, so I called them, but they pointed me to "constituent affairs" as well. Looks like the only options here are to do a letter writing campaign (ugh), or to deal with the site as it is. There's also a way to subscribe to get emails about these, so I'll sign us up for that via our usual email address for that purpose.
So I think if we want to get these opinions we need to either throw some labor at it every day (not possible at the moment), figure out what West/Lexis uses for the title of these (anybody able?), or just put some sort of placeholder for the label that we dream up.
oof, sounds like all bad options. I would definitely be interested to hear from people (West/Lexis) who are successfully scraping these opinions, but I have no contacts there myself.
I was thinking they probably use humans to solve this, but that it might be good to see what their solution looks like if anybody can pull up an example in West/Lexis.
I also don't have access to those systems to compare myself, but surely someone else does.
I updated the Texag scraper to use @ix4's suggestion. They don't produce that many opinions, and the sidebar actually lists the most recent ones. Currently there are only three in the last two months. I suggest in my PR that we just scrape that for now, and we can write a back scraper for missed opinions at some point.
A message was posted some time ago to the opinions page indicating that AG opinions must be requested and will no longer be published publicly. Consequently, we should either call the court to see if we can get some sort of data stream, or kill this scraper.
https://www.texasattorneygeneral.gov/attorney-general-opinions