hancush opened 1 week ago
@antidipyramid I'll email Marjorie to see what, if any, additional data about the employers she wants in the scrapes.
🚨 Glad I asked! We need to scrape the lobbyist employer expenditures from https://login.cfis.sos.state.nm.us/#/lobbyistexpendituresearch/31.
@hancush There is a good amount of data processing going on in `lobbyists.mk`. Do we want to do some kind of processing on the employer expenditures?
Great question, @antidipyramid. Some context on lobbyist (and lobbyist employer) scraping: The search interface does not include one very important piece of information: the beneficiary of the expenditure / contribution. So, the original lobbyist scrape downloads all of a lobbyist's filings, then parses information out of those PDFs.
It looks like lobbyist employers file the same information in the same format, e.g., https://login.cfis.sos.state.nm.us//ReportsOutput//LAR/4a27c051-7b49-456a-9936-98d595384a08.pdf
I wonder if we could simply plug them into the existing pipeline (perhaps with some modifications, since rather than a lobbyist associated with a client [employer], there will only be clients [employers])?
Looks like there's an https://login.cfis.sos.state.nm.us/api//ExploreClients/Disclosures endpoint that gets filings for lobbyist employers (while it's https://login.cfis.sos.state.nm.us/api//ExploreClients/Fillings for lobbyists) – see the network request when you click on "Filings" here: https://login.cfis.sos.state.nm.us/#/exploreClientDetailPublic/mDJ2oXreU_grMhUIIWBeHHwquY7yN_7SNrmbDh6rMxI1/10/2024
If you can modify the script that retrieves filings so it works for both lobbyists and lobbyist employers, I think you can use the rest of the pipeline (PDF parsing) as is, or close to it! What do you think?
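One way the endpoint split could be handled is with a small mapping keyed on entity type, so the rest of the pipeline doesn't care whether it's fetching for a lobbyist or an employer. This is just a sketch – the `filings_url` helper and the entity-type keys are made up here; only the two endpoint paths come from the network requests above:

```python
# Sketch of routing filing requests by entity type. The dict keys and the
# helper name are hypothetical; only the endpoint paths come from the site.
BASE_URL = "https://login.cfis.sos.state.nm.us/api/"

FILING_ENDPOINTS = {
    "lobbyist": "ExploreClients/Fillings",  # sic: the site spells it "Fillings"
    "employer": "ExploreClients/Disclosures",
}

def filings_url(entity_type):
    """Return the filings API URL for a lobbyist or a lobbyist employer."""
    return BASE_URL + FILING_ENDPOINTS[entity_type]
```

Downstream, the PDF-parsing steps could stay the same and just branch on whether a lobbyist is present in the parsed record.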
The current lobbyist scrape captures employer name (ClientName), but there is some additional metadata we could capture: https://docs.google.com/spreadsheets/d/1-c2Ony5hGjpchOfwPKhYyoJ9FPji0dfkgt0mTcBS9mo/edit?usp=sharing
Clarify with Marjorie which employer metadata lobbyist scrapes should include, then capture it.