Scrape Filings from candidate and PAC committee details pages

fgregg commented 7 months ago

In order to get the opening and closing balance for candidates and committees we need to get their filings from the candidate/committee detail pages.

Screenshot of a detail page in the New Mexico Campaign Finance System: https://login.cfis.sos.state.nm.us/#/exploreDetails/RiiKoPNxtHg4P69Mc3r0NH1lK5MpzTLbNw12UnzEQ-I1/14/22/120/2024

We need to download the filings, e.g. https://login.cfis.sos.state.nm.us//ReportsOutput//103/b6375ec9-9605-474c-843a-f7cb732c0f35.pdf

and extract this table: (screenshot of a table in rpt_File_ExpAndConReport - b6375ec9-9605-474c-843a-f7cb732c0f35.pdf)

We already have a scraper that can visit every detail page: https://github.com/datamade/nmid-scrapers/blob/main/scrapers/office/scrape_search.py

Let's hook into the scrape method to make the AJAX call that returns the details about the filings. We will then need to fetch each PDF and scrape the info out of it.

Right now the scraper yields rows for candidates, one for each campaign year.

I would like the scraper to yield an object like {'years': [...the info we are currently scraping], 'filings': [...all the metadata about each filing from the AJAX call, plus the information scraped out of the PDF]}
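A minimal sketch of that output shape, not the actual scraper code. The function name `scrape_with_filings`, the injected `fetch_filings` and `parse_pdf` callables, and the `pdf_url` key are all hypothetical stand-ins for whatever the existing scrape method and the AJAX response actually provide:

```python
from typing import Callable, Iterable, Iterator


def scrape_with_filings(
    entities: Iterable[dict],
    fetch_filings: Callable[[dict], list],
    parse_pdf: Callable[[str], dict],
) -> Iterator[dict]:
    """Yield one object per candidate/committee with 'years' and 'filings'.

    entities: dicts with a 'years' list (the rows scraped today).
    fetch_filings: hypothetical callable wrapping the AJAX call; returns a
        list of filing-metadata dicts, each assumed to carry a 'pdf_url'.
    parse_pdf: hypothetical callable that downloads a filing PDF and
        returns the values extracted from its table as a dict.
    """
    for entity in entities:
        filings = []
        for meta in fetch_filings(entity):
            # Merge the AJAX metadata with the values scraped from the PDF.
            filings.append({**meta, **parse_pdf(meta["pdf_url"])})
        yield {"years": entity["years"], "filings": filings}
```

Passing the two fetchers in as arguments keeps the sketch testable without network access; in the real scraper they would presumably be methods on the scraper class.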

fgregg commented 7 months ago

Let's get to this point, and then we can talk about CSV outputs.

antidipyramid commented 7 months ago

@fgregg I've noticed a few things:

For some candidates and committees, the year parameter in the filings endpoint isn't what you'd expect. For example, the URL for this candidate's 2024 filings requires a year param of 3528 (it doesn't work with the value 2024). Instead of querying a specific year, you can use electionYear=All.
The filing URL for some candidates (and all committees?) requires a committeeID value that differs from the IDNumber returned by the search endpoint (e.g. this candidate).
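As a rough illustration of the first point, a filings request could always ask for every year rather than guessing the year value. The parameter names `electionYear` and `committeeID` come from the notes above; the helper name `filings_url` and the endpoint passed to it are hypothetical, since the real endpoint path isn't shown here:

```python
from urllib.parse import urlencode


def filings_url(endpoint: str, committee_id: str) -> str:
    """Build a filings query that requests all election years.

    endpoint: the filings endpoint URL (hypothetical; supplied by the caller).
    committee_id: the committeeID the filings endpoint expects, which may
        differ from the IDNumber returned by the search endpoint.
    """
    params = {
        "committeeID": committee_id,
        # Sidestep opaque year values such as 3528 by asking for all years.
        "electionYear": "All",
    }
    return f"{endpoint}?{urlencode(params)}"
```

This does not solve the second point: the correct committeeID still has to be looked up from somewhere other than the search results.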

fgregg commented 7 months ago

closed by #25

datamade / nmid-scrapers

Scrape Filings from candidate and PAC committee details pages #24