esonderegger / fecfile

a python parser for the .fec file format
https://esonderegger.github.io/fecfile/
Apache License 2.0
44 stars 13 forks source link

Schedule C date types #20

Closed hodgesmr closed 4 years ago

hodgesmr commented 4 years ago

Parse the following fields into datetime from ScheduleC:

These fields are left as strings

These fields are now datetimes

Yes. If a user is expecting strings in this field, they will need to update to expect datetime objects or None.

The following fields looked like they sometimes have dates, but other times have strings, so conversion does not apply:

Also the loan_incurred_date_terms field was already parsing to a datetime, so I left that as-is.

esonderegger commented 4 years ago

Thank you so much for suggesting this!

It does look like there are some fields in the Schedule C that should be parsed as dates and are currently getting parsed as strings. The following fields definitely fall in that category:

(note: there shouldn't be a need to have separate entries for those last two, because ^loan_incurred_date should catch both of them)

Unfortunately, things get tricky when it comes to loan_due_date. According to the format spec for v8.3 (included in the vendor pack available here: https://www.fec.gov/help-candidates-and-committees/filing-reports/other-filing-software/ ) the field is specified as being alpha-numeric up to 15 characters in length. In other words, NGP and others may be consistently sending YYYYMMDD but they don't have to.

On the flip-side, deposit_acct_auth_date_presidential is specified as being NUM-8 so we should be able to parse it as a datetime. I see that it was part of your first commit but then you removed it - I'm guessing you found a filing where that wasn't the case?

hodgesmr commented 4 years ago

Thanks for the vendor pack link. I was working (mostly) blind here with filings I had in hand, but this should be a much better resource.

Re: deposit_acct_auth_date_presidential I haven't actually seen a counterexample, I just took it out because I didn't have a positive example. But if the spec says NUM-8 then that should probably be included.

Re: loan_due_date great catch.

I'll try to get an update to this PR soon to reflect this feedback.

hodgesmr commented 4 years ago

I've incorporated the feedback (and doubled-checked it against the Format_v8.3 spreadsheet).

I also discovered a typo in mappings.json that loan_incurred_date_original_ misspelled incurred as inccured (commit). I corrected that spelling, but that should also be considered a breaking change.

esonderegger commented 4 years ago

I'm sorry it took me so long to get back to you on this. I'm really torn on typo corrections like this. On the one hand it's obviously an error and desirable to have it corrected. On the other hand, this code is derived from an upstream source. In this case https://github.com/dwillis/fech-sources/blob/master/SchC1.csv#L14. If anyone has built a database with these column names, they will have to do a migration to use this new code.

I suppose because in this case we're changing types from strings to dates, the user would have to be doing a migration anyway. I also have no idea how many folks have built databases using these keys as column names. I suspect the number is zero.

In other words, I'll merge this now and push out a new release. Thanks for suggesting this!