Police-Data-Accessibility-Project / scrapers

Code relating to scraping public police data.
https://pdap.io
GNU General Public License v3.0

File downloaders rework #229

Open EvilDrPurple opened 10 months ago

EvilDrPurple commented 10 months ago

Context

Our file downloaders could use a rework. They seem overly complex and support only a few file types, with various modules calling each other and requiring a specific order that is unclear. There are also a number of defunct scripts littered about. I believe a much more straightforward approach is possible, and it would go a long way toward helping people understand how and when to use our util modules. During work on #227, I found that the following will download any file type when given a download URL:

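import requests

r = requests.get(url, stream=True)
r.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
with open(file_path, 'wb') as fd:
    # iter_content() defaults to 1-byte chunks, so pass a larger chunk_size
    for chunk in r.iter_content(chunk_size=8192):
        fd.write(chunk)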

SEE: downloaders.py, get_files.py, muckrock_scraper.py
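
To make the proposal a bit more concrete, here is a rough sketch of what a single, file-type-agnostic helper could look like. This is only an illustration, not existing code in the repo: the name `download_file`, its parameters, the 8192-byte chunk size, the 30-second timeout, and the optional `session` argument are all assumptions made for the example.

```python
from pathlib import Path

import requests


def download_file(url, file_path, chunk_size=8192, timeout=30, session=None):
    """Stream `url` to `file_path` and return the number of bytes written.

    Hypothetical helper: the name, defaults, and `session` parameter are
    illustrative assumptions, not part of the current util modules.
    """
    # Use a caller-supplied requests.Session (or compatible object) if given,
    # otherwise fall back to the module-level requests.get.
    http = session or requests

    # Create the destination directory if it does not exist yet.
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)

    written = 0
    # stream=True avoids loading the whole file into memory at once.
    with http.get(url, stream=True, timeout=timeout) as r:
        r.raise_for_status()  # raise on HTTP errors rather than saving them
        with open(path, "wb") as fd:
            for chunk in r.iter_content(chunk_size=chunk_size):
                fd.write(chunk)
                written += len(chunk)
    return written
```

A scraper would then only need something like `download_file(pdf_url, "data/report.pdf")`, regardless of file type. Accepting an optional session is one possible way to let scrapers reuse cookies or headers; whether that is actually needed is an open question.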

Requirements

Docs

Open questions