datamade / nmid-scrapers

Scrapers for NMID
MIT License

Scrape lobbyist employer expenditures #29

Open hancush opened 1 week ago

hancush commented 1 week ago

The current lobbyist scrape captures the employer name (ClientName), but there is additional employer metadata we could capture: https://docs.google.com/spreadsheets/d/1-c2Ony5hGjpchOfwPKhYyoJ9FPji0dfkgt0mTcBS9mo/edit?usp=sharing

Clarify with Marjorie which employer metadata the lobbyist scrapes should include, then capture it.

hancush commented 1 week ago

@antidipyramid I'll email Marjorie to see what, if any, additional data about the employers she wants in the scrapes.

hancush commented 1 week ago

🚨 Glad I asked! We need to scrape the lobbyist employer expenditures from https://login.cfis.sos.state.nm.us/#/lobbyistexpendituresearch/31.

antidipyramid commented 1 day ago

@hancush There is a good amount of data processing going on in lobbyists.mk.

Do we want to do similar processing on the employer expenditures?

hancush commented 1 day ago

Great question, @antidipyramid. Some context on lobbyist (and lobbyist employer) scraping: The search interface does not include one very important piece of information: the beneficiary of the expenditure / contribution. So, the original lobbyist scrape downloads all of a lobbyist's filings, then parses information out of those PDFs.

It looks like lobbyist employers file the same information in the same format, e.g., https://login.cfis.sos.state.nm.us//ReportsOutput//LAR/4a27c051-7b49-456a-9936-98d595384a08.pdf

I wonder if we could simply plug them into the existing pipeline, perhaps with some modifications: rather than a lobbyist associated with a client (employer), there will only be clients (employers).
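One way to picture "only clients" in the existing pipeline is a filing record whose lobbyist field is optional, so lobbyist and employer filings share one shape before PDF parsing. This is a hypothetical sketch: the `Filing` class, `to_row` helper, and field names are assumptions for illustration, not the pipeline's actual columns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Filing:
    """One downloaded filing PDF, ready for the PDF-parsing step.

    Hypothetical record shape; the real pipeline's fields may differ.
    """

    pdf_url: str
    client_name: str  # the lobbyist employer (ClientName in the current scrape)
    lobbyist_name: Optional[str] = None  # None when the employer filed directly

    @property
    def filer_type(self) -> str:
        # Employer filings have no associated lobbyist.
        return "employer" if self.lobbyist_name is None else "lobbyist"


def to_row(filing: Filing) -> dict:
    # Flatten for CSV output; an empty string stands in for a missing lobbyist.
    return {
        "pdf_url": filing.pdf_url,
        "client_name": filing.client_name,
        "lobbyist_name": filing.lobbyist_name or "",
        "filer_type": filing.filer_type,
    }


# Usage with the employer PDF linked above ("Example Employer" is a placeholder name).
employer_filing = Filing(
    pdf_url="https://login.cfis.sos.state.nm.us//ReportsOutput//LAR/4a27c051-7b49-456a-9936-98d595384a08.pdf",
    client_name="Example Employer",
)
print(to_row(employer_filing))
```

Keeping the lobbyist column (empty for employers) rather than creating a separate table would let downstream steps in lobbyists.mk treat both filer types uniformly, assuming those steps tolerate a blank lobbyist.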

hancush commented 1 day ago

It looks like there's an https://login.cfis.sos.state.nm.us/api//ExploreClients/Disclosures endpoint that returns filings for lobbyist employers (the lobbyist equivalent is https://login.cfis.sos.state.nm.us/api//ExploreClients/Fillings). See the network request made when you click "Filings" here: https://login.cfis.sos.state.nm.us/#/exploreClientDetailPublic/mDJ2oXreU_grMhUIIWBeHHwquY7yN_7SNrmbDh6rMxI1/10/2024

If you can modify the script that retrieves filings so it works for both lobbyists and lobbyist employers, I think you can use the rest of the pipeline (PDF parsing) as is, or close to it! What do you think?
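A minimal sketch of making the filings retrieval work for both filer types: pick the endpoint from a lookup keyed on filer type. The two endpoint URLs come from this thread (including the site's own "Fillings" spelling and double slash). Everything else is an assumption: the `filings_url`/`fetch_filings` names, the idea that the endpoints accept query parameters, and that they answer a plain GET with JSON are unverified and should be checked against the network request in the browser.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint URLs as observed in the browser; "Fillings" is the site's spelling.
FILINGS_ENDPOINTS = {
    "lobbyist": "https://login.cfis.sos.state.nm.us/api//ExploreClients/Fillings",
    "employer": "https://login.cfis.sos.state.nm.us/api//ExploreClients/Disclosures",
}


def filings_url(filer_type: str, params: Optional[dict] = None) -> str:
    """Return the filings endpoint for a filer type, with optional query params.

    The parameter names a caller passes are hypothetical; copy the real ones
    from the request the site makes.
    """
    try:
        base = FILINGS_ENDPOINTS[filer_type]
    except KeyError:
        raise ValueError(f"unknown filer type: {filer_type!r}") from None
    return f"{base}?{urlencode(params)}" if params else base


def fetch_filings(filer_type: str, params: Optional[dict] = None, timeout: float = 30):
    # ASSUMPTION: a plain GET returning JSON. If the site uses POST or needs
    # headers, adjust this to build a urllib.request.Request instead.
    with urlopen(filings_url(filer_type, params), timeout=timeout) as resp:
        return json.load(resp)
```

The rest of the pipeline would then only need to know which filer type produced each PDF, which keeps the branching in one place.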