Open fgregg opened 7 months ago
Let's get to this point, and then we can talk about CSV outputs.
@fgregg I've noticed a few things:
For some candidates and committees, the year parameter in the filings endpoint isn't what you'd expect. For example, the URL for this candidate's 2024 filings requires a year param of `3528` (it doesn't work with the value `2024`). Instead of querying a specific year, you can use `electionYear=All`.
The filing URL for some candidates (and all committees?) requires a `committeeID` value that differs from the `IDNumber` returned by the search endpoint (e.g. this candidate).
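A minimal sketch of how the two observations above could be handled when building the filings request. Only `electionYear`, `committeeID`, and `IDNumber` come from this thread; the function name `filing_query`, the shape of the search result, and the idea that these two parameters are sufficient are assumptions for illustration.

```python
from urllib.parse import urlencode


def filing_query(search_result, committee_id=None):
    """Build query parameters for the filings request (hypothetical helper).

    `search_result` is assumed to be a dict from the search endpoint that
    carries an `IDNumber` key. `committee_id` is an optional override for
    the cases where the filings URL needs a different value.
    """
    if committee_id is None:
        # Fall back to IDNumber only when no override is known.
        committee_id = search_result["IDNumber"]
    return {
        "committeeID": committee_id,
        # "All" avoids having to know opaque per-year values such as 3528.
        "electionYear": "All",
    }


# Example with made-up IDs: the override wins over the search result's IDNumber.
params = filing_query({"IDNumber": 1234}, committee_id=5678)
print(urlencode(params))
```

Where the correct `committeeID` comes from is still an open question here; the sketch just keeps the override explicit so the mismatch is not silently ignored.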
closed by #25
In order to get the opening and closing balance for candidates and committees we need to get their filings from the candidate/committee detail pages.
https://login.cfis.sos.state.nm.us/#/exploreDetails/RiiKoPNxtHg4P69Mc3r0NH1lK5MpzTLbNw12UnzEQ-I1/14/22/120/2024
We need to download the filings, e.g. https://login.cfis.sos.state.nm.us//ReportsOutput//103/b6375ec9-9605-474c-843a-f7cb732c0f35.pdf
and extract this table:
We already have a scraper that can visit every detail page: https://github.com/datamade/nmid-scrapers/blob/main/scrapers/office/scrape_search.py
Let's hook into the `scrape` method to make the ajax call to get the details about the filings. We will then need to fetch the PDF and scrape the info out of it.
Right now the scraper yields rows for candidates, one for each campaign year.
I would like the scraper to yield an object like {'years': [...current info that we are scraping], 'filings': [all the metadata about the filing from the ajax call plus the information scraped out of the pdf]}
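The requested shape could be sketched roughly as follows. This is not the existing scraper's code: `scrape_with_filings`, `fetch_filings`, `parse_pdf`, and the `pdf_url` key are hypothetical names, and the ajax call and PDF extraction are passed in as callables so the structure can be shown without guessing at the real endpoints.

```python
def scrape_with_filings(candidates, fetch_filings, parse_pdf):
    """Yield one object per candidate with yearly rows and filings.

    candidates:    iterable of (candidate_id, year_rows) pairs, where
                   year_rows is what the scraper currently yields per year.
    fetch_filings: callable returning a list of filing-metadata dicts for a
                   candidate (stands in for the ajax call).
    parse_pdf:     callable taking a PDF URL and returning a dict of values
                   extracted from the filing (e.g. opening/closing balance).
    """
    for candidate_id, year_rows in candidates:
        filings = []
        for meta in fetch_filings(candidate_id):
            # Merge the ajax metadata with whatever was scraped from the PDF.
            filings.append({**meta, **parse_pdf(meta["pdf_url"])})
        yield {"years": list(year_rows), "filings": filings}


# Usage with stand-in data and fake callables (all values are made up).
def fake_fetch(candidate_id):
    return [{"report": "First Primary", "pdf_url": "example.pdf"}]


def fake_parse(pdf_url):
    return {"opening_balance": 100.0, "closing_balance": 250.0}


results = list(
    scrape_with_filings([("abc", [{"year": 2024}])], fake_fetch, fake_parse)
)
```

Injecting the two callables keeps the network and PDF work out of the generator, which makes the combined object easy to test before the real calls are wired in.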