Police-Data-Accessibility-Project / scrapers

Code relating to scraping public police data.
https://pdap.io
GNU General Public License v3.0

Creation of example scrapers, bug fixes #223

Closed by EvilDrPurple 1 year ago

EvilDrPurple commented 1 year ago

Part of #212

Two example scrapers were added to the /examples_templates/ directory, based on the Pittsburgh Police scraper and the Cal Poly Humboldt scraper.

Unfortunately, Internet Archive does not play nice with CrimeGraphics (or at least, I wasn't able to find a way to make it) so the second example does not pull the data from IA.

Also contains fixes for various bugs I stumbled across while testing these scrapers.

Let me know if the examples listed in CONTRIBUTING.md look okay.

josh-chamberlain commented 1 year ago

This does what it's supposed to. @mbodeantor if this looks good to you, we can close all our "repo overhaul" issues!

mbodeantor commented 1 year ago

Should the "configs" in this else block be "configs_file"? Seems like it is the same as the if block otherwise: https://github.com/EvilDrPurple/PDAP-Scrapers/blob/77d796ced7b8d1df21ea7e160bc75c44f12d1e11/scrapers/data_portals/crimegraphics/crimegraphics_clery.py#L43

EvilDrPurple commented 1 year ago

@mbodeantor

Should the "configs" in this else block be "configs_file"? Seems like it is the same as the if block otherwise: https://github.com/EvilDrPurple/PDAP-Scrapers/blob/77d796ced7b8d1df21ea7e160bc75c44f12d1e11/scrapers/data_portals/crimegraphics/crimegraphics_clery.py#L43

It is my understanding that this if-else block is there for backwards compatibility. In previous versions, a config.py file was passed to the function through the config parameter. At some point this was reworked so that a dictionary is passed through the config parameter instead. The config_file parameter is a boolean that is True if the config being passed uses the old functionality and False if it uses the new functionality. The if-else block avoids an error, since dot notation is invalid for dictionaries and bracket notation is invalid for the config.py files. There's likely a better way to do this (e.g. checking the type of the config being passed instead of using the boolean), but removing the config_file parameter would require us to update the old scrapers that rely on the old functionality, and at that point we may as well switch them all over to using the dictionary instead of a config file.
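For illustration, here is a minimal sketch of the type-checking alternative mentioned above. It is not the code in this PR: the helper name `get_setting`, the setting names, and the example values are all hypothetical, and `SimpleNamespace` only stands in for an imported config.py module.

```python
from collections.abc import Mapping
from types import SimpleNamespace


def get_setting(configs, name):
    """Read one setting from a dictionary or from a module-like config object."""
    if isinstance(configs, Mapping):
        # New style: a dictionary, so use bracket notation.
        return configs[name]
    # Old style: an imported config module, so use attribute (dot) access.
    return getattr(configs, name)


# Hypothetical settings; SimpleNamespace stands in for an imported config module.
new_style = {"url": "https://example.com/clery", "department_code": "ABC"}
old_style = SimpleNamespace(url="https://example.com/clery", department_code="ABC")

print(get_setting(new_style, "url"))
print(get_setting(old_style, "department_code"))
```

With a helper like this, callers would not need to pass a separate boolean, though old scrapers would still have to be checked before the parameter could be dropped.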

mbodeantor commented 1 year ago

Ah okay, thanks!