Closed hodgesmr closed 4 years ago
Thank you so much for suggesting this!
It does look like there are some fields in the Schedule C that should be parsed as dates and are currently getting parsed as strings. The following fields definitely fall in that category:
(note: there shouldn't be a need to have separate entries for those last two, because ^loan_incurred_date
should catch both of them)
Unfortunately, things get tricky when it comes to loan_due_date
. According to the format spec for v8.3 (included in the vendor pack available here: https://www.fec.gov/help-candidates-and-committees/filing-reports/other-filing-software/ ) the field is specified as being alpha-numeric up to 15 characters in length. In other words, NGP and others may be consistently sending YYYYMMDD
but they don't have to.
On the flip-side, deposit_acct_auth_date_presidential
is specified as being NUM-8
so we should be able to parse it as a datetime. I see that it was part of your first commit but then you removed it - I'm guessing you found a filing where that wasn't the case?
Thanks for the vendor pack link. I was working (mostly) blind here with filings I had in hand, but this should be a much better resource.
Re: deposit_acct_auth_date_presidential
I haven't actually seen a counterexample, I just took it out because I didn't have a positive example. But if the spec says NUM-8
then that should probably be included.
Re: loan_due_date
great catch.
I'll try to get an update to this PR soon to reflect this feedback.
I've incorporated the feedback (and doubled-checked it against the Format_v8.3 spreadsheet).
I also discovered a typo in mappings.json
that loan_incurred_date_original_
misspelled incurred
as inccured
(commit). I corrected that spelling, but that should also be considered a breaking change.
I'm sorry it took me so long to get back to you on this. I'm really torn on typo corrections like this. On the one hand it's obviously an error and desirable to have it corrected. On the other hand, this code is derived from an upstream source. In this case https://github.com/dwillis/fech-sources/blob/master/SchC1.csv#L14. If anyone has built a database with these column names, they will have to do a migration to use this new code.
I suppose because in this case we're changing types from strings to dates, the user would have to be doing a migration anyway. I also have no idea how many folks have built databases using these keys as column names. I suspect the number is zero.
In other words, I'll merge this now and push out a new release. Thanks for suggesting this!
Please check if the PR includes the following
[X] Tests for the changes have been added (if applicable)
[ ] Docs have been added / updated (if applicable)
What kind of change does this PR introduce?
Parse the following fields into datetime from ScheduleC:
authorized_date
date_signed
established_date
loan_due_date
loan_incurred_date
loan_inccured_date_original
What is the current behavior?
These fields are left as strings
These fields are now datetimes
Yes. If a user is expecting strings in this field, they will need to update to expect datetime objects or None.
The following fields looked like they sometimes have dates, but other times have strings, so conversion does not apply:
deposit_acct_auth_date_presidential
loan_due_date_terms
Also the
loan_incurred_date_terms
field was already parsing to a datetime, so I left that as-is.