What
Improve the SORN scraper by having it look for these alternative headings for a few sections.
Just under half of GSA's SORNs use the following as the header of the Routine Use section:
ROUTINE USES OF THE RECORDS IN THE SYSTEM, INCLUDING TYPES OF USERS AND THE PURPOSES OF THE USES:
A handful use PURPOSE(S) or PURPOSES: for that section.
Why
So that our SORN scraper captures more data. For now, we filled in the missing data by hand.
How
Have the scraper method accept multiple possible headings. I abstracted the heading lookup into a single method. If that gets complicated, just have each of the get_pii() or get_system_name() methods use its own logic for which headers to look for.
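A rough sketch of what "accept multiple possible headings" could look like, in case it helps. It assumes the SORN is available as plain text with each heading on its own line, and that a section ends at the next all-caps line. The names find_section, PURPOSE_HEADINGS, and ROUTINE_USE_HEADINGS are made up for illustration and are not the scraper's actual API.

```python
from typing import Iterable, List, Optional

# Headings taken from this issue; the constant names are hypothetical.
PURPOSE_HEADINGS = ("PURPOSE(S)", "PURPOSES:")
ROUTINE_USE_HEADINGS = (
    "ROUTINE USES OF THE RECORDS IN THE SYSTEM, INCLUDING TYPES OF USERS "
    "AND THE PURPOSES OF THE USES:",
)


def _normalize(heading: str) -> str:
    """Strip whitespace and a trailing colon, and upper-case the heading."""
    return heading.strip().rstrip(":").strip().upper()


def _looks_like_heading(line: str) -> bool:
    """Assumption: a heading is a line containing letters, all upper case."""
    stripped = line.strip()
    return any(c.isalpha() for c in stripped) and stripped == stripped.upper()


def find_section(text: str, headings: Iterable[str]) -> Optional[str]:
    """Return the body under the first line matching any of the headings.

    Returns None when none of the headings appear in the text.
    """
    wanted = {_normalize(h) for h in headings}
    body: Optional[List[str]] = None
    for line in text.splitlines():
        if body is None:
            # Still searching for one of the accepted headings.
            if _normalize(line) in wanted:
                body = []
        elif _looks_like_heading(line):
            # The next heading ends the section we are collecting.
            break
        else:
            body.append(line)
    return None if body is None else "\n".join(body).strip()


if __name__ == "__main__":
    sample = (
        "SYSTEM NAME:\n"
        "Example System\n"
        "PURPOSES:\n"
        "To track things.\n"
        "ROUTINE USES OF THE RECORDS IN THE SYSTEM, INCLUDING TYPES OF USERS "
        "AND THE PURPOSES OF THE USES:\n"
        "Shared with X.\n"
    )
    print(find_section(sample, PURPOSE_HEADINGS))
    print(find_section(sample, ROUTINE_USE_HEADINGS))
```

With something like this, each getter would only need to pass its own tuple of accepted headings, so adding another variant is a one-line change. If the real scraper parses HTML rather than plain text, the same idea applies but the matching step would differ.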