Police-Data-Accessibility-Project / scrapers

Code relating to scraping public police data.
https://pdap.io
GNU General Public License v3.0

(Don't Fear the) Repo overhaul #197

josh-chamberlain closed 10 months ago

josh-chamberlain commented 1 year ago

Scrapers repo

These serve as examples of different ways to access data. They're also individually useful.

Problems with the current repo

How people use the repo

To do

Readme changes

Issue adjustment

Structure changes

setup_gui/Base_Scripts/Scrapers/crimegraphics/crimegraphics_bulletin.py

common/base_scrapers/crimegraphics/crimegraphics_bulletin.py

Base_Scripts/Scrapers/crimegraphics/crimegraphics_bulletin.py

CODE_OF_CONDUCT.md
CONTRIBUTING.md
LICENSE.md
README.md
requirements.txt
examples_templates/
    -- scraper_template/
        -- README.md
        -- scraper.py
    -- scraper_example_1/
        -- README.md
        -- scraper.py
    -- etc
scrapers/
    -- data_portals/
        -- cityprotect/
        -- crimegraphics/
            -- README.md
            -- crimegraphics.py
    -- federal/
    -- AR/
    -- CA/
    -- FL/
        -- scraper/
        -- county/
            -- scraper/
                -- scraper.py
                -- README.md
            -- municipality/
                -- scraper/
                    -- scraper.py
                    -- README.md
    -- etc
utils/
    -- meta/
        -- all_fields_extractor/
        -- etc
    -- setup_gui/
    -- etc
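
To make the layout above concrete, here is a minimal sketch of how a new scraper folder could be placed and seeded from examples_templates/scraper_template/. The function names scraper_dir and scaffold_scraper are hypothetical and nothing like them exists in the repo yet. It also assumes municipality folders always sit inside a county folder, as the tree shows.

```python
import shutil
from pathlib import Path
from typing import Optional


def scraper_dir(repo_root: Path, state: str, name: str,
                county: Optional[str] = None,
                municipality: Optional[str] = None) -> Path:
    """Return where a scraper would live under the proposed scrapers/ tree.

    Hypothetical helper: state-level scrapers go in scrapers/<STATE>/<name>/,
    county-level ones in scrapers/<STATE>/<county>/<name>/, and municipal
    ones in scrapers/<STATE>/<county>/<municipality>/<name>/.
    """
    if municipality and not county:
        # Assumption taken from the tree: a municipality is nested in a county.
        raise ValueError("a municipality needs a county")
    parts = [state.upper()]
    parts += [p for p in (county, municipality) if p]
    return repo_root.joinpath("scrapers", *parts, name)


def scaffold_scraper(repo_root: Path, target: Path) -> Path:
    """Copy examples_templates/scraper_template/ into a new scraper folder."""
    template = repo_root / "examples_templates" / "scraper_template"
    # copytree creates missing parent folders and raises FileExistsError if
    # the target already exists, so an existing scraper is not overwritten.
    shutil.copytree(template, target)
    return target
```

For example, scraper_dir(Path("."), "fl", "example_scraper", county="example_county") points at scrapers/FL/example_county/example_scraper, and scaffold_scraper then fills that folder with the template's README.md and scraper.py.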

Related work

https://github.com/Police-Data-Accessibility-Project/PDAP-Scrapers/issues/196

EvilDrPurple commented 11 months ago

Just want a quick clarification. You put:

scrapers/
    -- data_portals/
        -- cityprotect/
        -- crimegraphics/
            -- README.md
            -- crimegraphics.py
    -- federal/

You indented this more than the other lines, so I'm assuming you want what is currently the top-level Base_Scripts directory to go inside the scrapers folder. Or did you want it inside the utils folder instead?

one major reason for this is that the "common" scripts are all over the place, so for a new user it's incredibly difficult to figure out how they relate.

Based on this line, I'm also assuming you want some of the duplicate scripts in various places removed and streamlined to one area. Just want to make sure this is what you meant.

josh-chamberlain commented 11 months ago

@EvilDrPurple yes, the duplicate scripts should each have one home. I grouped some of them under data_portals/ and some under utils/ because there are both kinds, and they do different things: data portal scrapers can be run on any portal built on that platform, while the utils do other random stuff.
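
One possible way to see which scripts need a single home is to group files by content hash. This is only a rough sketch, and find_duplicate_scripts is a made-up name, not something in the repo. It catches byte-identical copies only, so copies that have drifted apart (which may well be the case for the three crimegraphics_bulletin.py paths in the issue) would still need a manual comparison.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def find_duplicate_scripts(repo_root: Path) -> Dict[str, List[Path]]:
    """Group .py files with byte-identical contents, keyed by SHA-256 digest.

    Hypothetical helper: returns only digests shared by two or more files,
    with paths given relative to repo_root.
    """
    by_digest = defaultdict(list)
    for path in sorted(repo_root.rglob("*.py")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path.relative_to(repo_root))
    # Keep only the groups that actually contain duplicates.
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}


if __name__ == "__main__":
    # Run from the repo root to list each group of identical scripts.
    for digest, paths in find_duplicate_scripts(Path(".")).items():
        print(digest[:12], *paths, sep="\n  ")
```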

I know it might be a pain taking something that's currently spread across 3 random directories and refactoring it to work in just 1. As you work, if you find things that are broken when you get there, let's talk about whether it's worth fixing for the sake of reorganizing the repo vs. just pulling it out / putting it on ice somehow.