biglocalnews / clean-scraper

Scraper library and CLI tool to harvest police bodycam footage and other files as part of the Community Law Enforcement Accountability Project (CLEAN)
https://biglocalnews.org/content/collaborations/clean.html
Apache License 2.0
8 stars 10 forks source link

Possible refactor? downloads + post requests #118

Open tarakc02 opened 1 month ago

tarakc02 commented 1 month ago

utils.get_url and utils.post_url contain a lot of duplicated code, we thought we could simplify by turning them into one method.

naumansharifwork commented 1 month ago

@tarakc02 @stucka @newsroomdev Our current download method uses the get_request , where ever we need to download the files with the post_request currently we are writing them into respective scrapers which is causing a bit of extra code.

There are 2 ways that i proposed that will help us to refactor this

Let me know what others think about this, or any other ideas.

stucka commented 1 month ago

A lot of the time it's going to be a one-line fix, to call cache.write_binary to save the contents. But that skips the download functionality of force, which is handy.

@naumansharifwork , I like your first proposal -- it's elegant and clean and makes a ton of sense. as long as a "data"/payload can be passed through kwargs, I think it's great.

I'd also again urge we consider bumping up the download buffer size to something like 1024*1024 from the existing 8192. =)

stucka commented 1 month ago

Oh, and if we're stripping any functionality of existing libraries we'll need to know what's using that functionality and how/when/who refactors the code. If the code is stable and fully functional and already used, there might not be a real reason to remove it ... ?