18F / privacy-tools

GSA PII Dashboard
https://cg-9341b8ea-025c-4fe2-aa6c-850edbebc499.app.cloud.gov/site/18f/privacy-dashboard/
MIT License

SORN scraper improvements #3

Closed ondrae closed 4 years ago

ondrae commented 4 years ago

What

Improve the SORN scraper by having it look for alternative headings for a few sections.

Just under half of GSA's SORNs use the following header for the Routine Use section: ROUTINE USES OF THE RECORDS IN THE SYSTEM, INCLUDING TYPES OF USERS AND THE PURPOSES OF THE USES:

A handful use PURPOSE(S) or PURPOSES: for that section.

Why

To have our SORN scraper scrape more data. We filled in the missing data by hand for now.

How

Have the scraper method accept multiple possible headings. I abstracted it into a single method. If it gets complicated, just have each of the get_pii() and get_system_name() methods keep its own logic for which headers to look for.
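
A minimal sketch of the multi-heading idea, assuming the SORN has already been reduced to plain text and that each section starts with an all-caps line ending in a colon. The helper name `get_section` and the sample text are hypothetical and are not taken from the repository's actual scraper.

```python
import re

# Assumption: a section heading is an all-caps line that ends with a colon.
NEXT_HEADING = re.compile(r"^[A-Z][A-Z ,()/'-]+:", flags=re.MULTILINE)


def get_section(text, headings):
    """Return the body under the first heading in `headings` found in `text`.

    Headings are tried in order; returns None if none of them appear.
    (Hypothetical helper, not the repository's real method.)
    """
    for heading in headings:
        # re.escape makes characters such as "(" in "PURPOSE(S)" match literally.
        pattern = r"^[ \t]*" + re.escape(heading)
        match = re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE)
        if match is None:
            continue
        body = text[match.end():]
        # Cut the body off where the next section heading begins, if any.
        following = NEXT_HEADING.search(body)
        if following:
            body = body[:following.start()]
        # Drop a trailing colon left over from the heading, plus whitespace.
        return body.lstrip(": \t\r\n").rstrip()
    return None


# Made-up sample text for illustration only.
sample = """SYSTEM NAME:
Example System

PURPOSES:
To track widgets.

ROUTINE USES OF THE RECORDS IN THE SYSTEM, INCLUDING TYPES OF USERS AND THE PURPOSES OF THE USES:
Records may be shared with auditors.
"""

purpose = get_section(sample, ["PURPOSE(S)", "PURPOSES:"])
routine = get_section(
    sample,
    ["ROUTINE USES OF THE RECORDS IN THE SYSTEM, INCLUDING TYPES OF USERS AND THE PURPOSES OF THE USES:"],
)
```

Methods such as get_pii() or get_system_name() could each pass their own list of candidate headings to one shared helper like this, which keeps the per-section logic down to a list of strings.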

nikzei commented 4 years ago

Not for now.