DataKind-DC / CARES

US CARES Act Paycheck Protection Program data, cleaned for analysis
GNU General Public License v3.0

Data transposition - state column based #28

emigre459 closed 4 years ago

emigre459 commented 4 years ago

This code uses the State column of the PPP dataset to identify a large transposition error: in 86% of loans, the values sit two columns to the left of the columns they belong in. The code identifies the problem records, flags them in a new column, then fixes them and flags them as fixed in a second new column.
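
For readers who want the gist without opening the notebook, here is a minimal sketch of the detect, flag, and fix steps described above. It is not the notebook's actual code. The column layout, the `VALID_STATES` subset, the flag names `was_transposed` and `was_fixed`, and the assumption that a bad row is shifted as a whole (with two empty cells at its right edge) are all hypothetical, chosen only to illustrate the idea. It uses plain Python lists so it runs without dependencies.

```python
# Hypothetical column layout; the real PPP file has different and more columns.
COLUMNS = ["LoanRange", "BusinessName", "Address", "City", "State", "Zip", "NAICSCode"]
STATE_IDX = COLUMNS.index("State")
SHIFT = 2  # transposed rows hold each value two columns left of where it belongs

# Abbreviated set for the example; a real check would list every state/territory code.
VALID_STATES = {"DC", "MD", "VA", "NY", "CA"}


def is_transposed(row):
    """Suspect row: State holds no valid code, but the cell SHIFT columns to its left does."""
    return (row[STATE_IDX] not in VALID_STATES
            and row[STATE_IDX - SHIFT] in VALID_STATES)


def fix_row(row):
    """Move every value SHIFT columns to the right, padding the left edge with None."""
    return [None] * SHIFT + list(row[: len(row) - SHIFT])


def clean(rows):
    """Return one dict per row, with the two flag columns added."""
    cleaned = []
    for row in rows:
        flagged = is_transposed(row)
        values = fix_row(row) if flagged else list(row)
        record = dict(zip(COLUMNS, values))
        record["was_transposed"] = flagged
        # Only mark a row as fixed if State is valid after the shift.
        record["was_fixed"] = flagged and record["State"] in VALID_STATES
        cleaned.append(record)
    return cleaned


# Made-up rows: the first is well-formed, the second is shifted two columns left.
rows = [
    ["a $150K-350K", "Acme LLC", "1 Main St", "Washington", "DC", "20001", "541511"],
    ["2 Elm St", "Baltimore", "MD", "21201", "722511", None, None],
]
result = clean(rows)
```

Keeping the detection flag separate from the fix flag means a row that looked transposed but still has an invalid State after shifting stays visible for manual review.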

kbmorales commented 4 years ago

Great work! We should replicate this in the R data ingest pipe as well. Did you make a flat file for the resulting PPP data by chance? I'd love to pair that up with NAICS to see if that helps with the failed matches I found.