Closed: emigre459 closed this pull request 4 years ago
Great work! We should replicate this in the R data ingest pipe as well. Did you make a flat file for the resulting PPP data by chance? I'd love to pair that up with NAICS to see if that helps with the failed matches I found.
This code uses the State column of the PPP dataset to identify a large transposition error: 86% of loans have values sitting two columns to the left of where they belong. The code identifies the problem records, flags them in a new column, then fixes them and flags them as fixed in a second new column.
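For reviewers who want the idea without opening the notebook, here is a minimal sketch of the detect, flag, and fix steps. It is not the notebook's code: it uses plain Python lists rather than a DataFrame, and the column names, the `VALID_STATES` set, the flag names `was_shifted` and `is_fixed`, and the helper names are all hypothetical. It also assumes the whole row is shifted, so the two leftmost fields of a repaired row are left as missing.

```python
# Hypothetical column layout; the real PPP column names may differ.
COLUMNS = ["LoanAmount", "City", "State", "Zip", "NAICSCode"]
STATE_IDX = COLUMNS.index("State")
SHIFT = 2  # values sit two columns to the left of where they belong

# Abbreviated for the sketch; a real check would list every state code.
VALID_STATES = {"CA", "NY", "TX"}


def is_shifted(row):
    """Treat a row as shifted when the State cell is not a valid code
    but the cell two columns to its left is."""
    return (
        row[STATE_IDX] not in VALID_STATES
        and row[STATE_IDX - SHIFT] in VALID_STATES
    )


def flag_and_fix(rows):
    """Return one dict per row with the repaired values and two flags."""
    results = []
    for row in rows:
        shifted = is_shifted(row)
        if shifted:
            # Move every value two columns to the right. The two leftmost
            # fields cannot be recovered from the row, so they become None.
            values = [None] * SHIFT + list(row[:-SHIFT])
        else:
            values = list(row)
        results.append(
            {"values": values, "was_shifted": shifted, "is_fixed": shifted}
        )
    return results


sample_rows = [
    [1000, "Austin", "TX", "78701", "722511"],  # correctly aligned
    ["TX", "78701", "722511", None, None],      # shifted two to the left
]
for record in flag_and_fix(sample_rows):
    print(record)
```

Keeping `was_shifted` separate from `is_fixed` mirrors the two new columns described above, so a later step can tell repaired records apart from records that were flagged but could not be repaired.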