Police-Data-Accessibility-Project / meta

Planning our activities with issues that don't fit in a specific repository yet.
GNU General Public License v3.0

Technical Direction + Design Tenets #95

Closed Astaltine closed 3 years ago

Astaltine commented 4 years ago

Forgive me if we already have such things, but I think it'd be good to come up with a set of design goals/high-level tenets for the various components of the system.

It can help a lot with making a choice when presented with many differing opinions on direction. Since this is a super grass-roots organization that's trying to ramp up quickly, we're bound to end up with many differing opinions on technical direction.

Things like the below. I'm only using them as examples:

It can help keep team members on the same page, and help new contributors ramp up more quickly by showing them the high-level reasons behind the decisions we've made.

Astaltine commented 4 years ago

From Tim Thelin on Slack:

I'd specifically add that PII handling has to be done with extreme care. Nothing will blow back faster than mishandling it, even if good intentions were involved.

mcsaucy commented 4 years ago

What's the expected outcome of this issue? Are we shooting for a doc that owners agree upon and enforce?

Francoded commented 4 years ago

I've been doing some thinking about the design of our web scraper fleet. I think an ideal framework would provide basic utility classes for tasks such as handling case data and outputting scraped data into a common format. This allows any volunteer developer to focus only on the scraping of data for a given website. The framework can also define what a PDAP scraper must have and even provide useful (e.g. parsing) functions that would benefit any PDAP scraper.

The goal is to minimize the amount of work a volunteer developer needs to do, since they won't have to worry about how to format output, how to store data, etc.
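As a rough illustration of the idea above, here is a minimal sketch of what such a framework's base class might look like. Everything in it is an assumption made for the example: `CaseRecord`, `BaseScraper`, `to_json_lines`, the field names, and the JSON-lines output are hypothetical and not an agreed PDAP design.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import asdict, dataclass, field
from typing import Iterable, Iterator


@dataclass
class CaseRecord:
    """Hypothetical common output record; the field names are placeholders."""
    source: str
    case_id: str
    fields: dict = field(default_factory=dict)


class BaseScraper(ABC):
    """Hypothetical base class: the framework owns formatting and output."""

    source_name = "unknown"

    @abstractmethod
    def scrape(self) -> Iterable[CaseRecord]:
        """The only method a volunteer would need to write."""

    def to_json_lines(self) -> Iterator[str]:
        # Every scraper emits the same format: one JSON object per record.
        for record in self.scrape():
            yield json.dumps(asdict(record), sort_keys=True)


class ExampleCountyScraper(BaseScraper):
    """Stand-in scraper that yields canned data instead of fetching a site."""

    source_name = "example-county"

    def scrape(self) -> Iterable[CaseRecord]:
        yield CaseRecord(self.source_name, "2020-0001", {"charge": "example"})


lines = list(ExampleCountyScraper().to_json_lines())
```

With a shape like this, a volunteer would only implement `scrape()`, and the output format could change in one place without touching individual scrapers.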

Not sure if anyone has put some thought into this already, or if we have a rough idea of how our scrapers would be designed/implemented. I joined last night. :)

jameskranz commented 4 years ago

> I've been doing some thinking about the design of our web scraper fleet. I think an ideal framework would provide basic utility classes for tasks such as handling case data and outputting scraped data into a common format. This allows any volunteer developer to focus only on the scraping of data for a given website. The framework can also define what a PDAP scraper must have and even provide useful (e.g. parsing) functions that would benefit any PDAP scraper.
>
> The goal is to minimize the amount of work a volunteer developer needs to do, since they won't have to worry about how to format output, how to store data, etc.
>
> Not sure if anyone has put some thought into this already, or if we have a rough idea of how our scrapers would be designed/implemented. I joined last night. :)

I had the same thought, that we need some sort of common design, so I created #92.

tthelin commented 4 years ago

I agree that we need a loosely coupled interface point for scrapers: some kind of common output format they can all target.

Due to PII in the scraper output, it also has to be locked down aggressively until it's passed to the next layer and PII detachment can occur.
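One way to picture the PII detachment step is sketched below. It is only an illustration under stated assumptions: the `PII_FIELDS` set, the `detach_pii` function, the `pii_ref` key, and the use of an HMAC as the link between the two halves are all hypothetical choices, not something decided in this thread.

```python
import hashlib
import hmac
from typing import Tuple

# Assumption: which keys count as PII would come from the common format's schema.
PII_FIELDS = {"name", "date_of_birth", "address"}


def detach_pii(record: dict, secret: bytes) -> Tuple[dict, dict]:
    """Split a raw record into (public part, PII part) linked by a keyed hash."""
    pii = {k: v for k, v in record.items() if k in PII_FIELDS}
    public = {k: v for k, v in record.items() if k not in PII_FIELDS}
    # A keyed hash (HMAC) links the halves without exposing the PII itself.
    canonical = repr(sorted(pii.items())).encode("utf-8")
    link_id = hmac.new(secret, canonical, hashlib.sha256).hexdigest()
    public["pii_ref"] = link_id
    pii["pii_ref"] = link_id
    return public, pii


raw = {"name": "Jane Doe", "charge": "example", "case_id": "2020-0001"}
public_part, pii_part = detach_pii(raw, secret=b"example-only-secret")
```

Only `public_part` would move on to less restricted layers; `pii_part` and the secret would stay inside the locked-down system.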

tthelin commented 4 years ago

Also, anything with PII (like scraper output) can never be checked in to a system that stores history (such as GitHub).

We need a separate system that can store raw scraper output and be locked down properly. That system has to be easy for scrapers to use and drop output into (effectively write-only, if possible).
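To make the "write-only drop box" idea concrete, here is a small sketch of the interface a scraper might see. The `DropBox` class, its `put` method, and the local-directory backing are hypothetical; a Python class cannot enforce anything by itself, so real lockdown would have to come from the storage system's own permissions.

```python
import os
import tempfile
import uuid


class DropBox:
    """Hypothetical write-only store: callers can add files, not read or list them."""

    def __init__(self, root: str):
        self._root = root
        os.makedirs(root, mode=0o700, exist_ok=True)

    def put(self, payload: bytes) -> str:
        # A random name means writers cannot guess or overwrite earlier drops.
        name = uuid.uuid4().hex + ".raw"
        path = os.path.join(self._root, name)
        # O_EXCL makes the call fail rather than replace an existing file.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
        return name


# Usage: a scraper hands over raw bytes and gets back only a receipt name.
box = DropBox(os.path.join(tempfile.mkdtemp(), "drops"))
receipt = box.put(b'{"case_id": "2020-0001"}')
```

The point of the shape is that the scraper-facing API has a single operation, so nothing a scraper does can read back or rewrite what another scraper dropped.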

Astaltine commented 4 years ago

> What's the expected outcome of this issue? Are we shooting for a doc that owners agree upon and enforce?

I was hoping for some discussion. I'm planning on working up an initial draft as a PR this weekend. After that, we'd agree on a final set of tenets, and owners would help enforce them while reviewing PRs and help guide discussion.

Contributors could use them to help make decisions without needing to discuss every individual detail with everyone.

And all of us could use them to help drive discussion and agreement on direction when a disagreement comes down to opinion vs. opinion.