Dance-Data-Project / smith-capstone-23

MIT License

Filter out filings that are out-of-date #4

Closed ajhoekst closed 1 year ago

ajhoekst commented 1 year ago

There are filings in the archive that should not be considered in the dataset.

A company amends their Form 990

An amendment is simply a new filing that supersedes earlier filings. There is a checkbox on the form that indicates whether a filing is an amendment. Do companies always check this box correctly when they file an amendment? We don't know for sure.

A company files more than the Form 990

If a company has significant taxable income, they must also submit a 990T as a separate filing. Smaller non-profits might submit a simplified form called the 990EZ. We do care about the 990EZ but do not care about the 990T (for now!).
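A minimal sketch of that form-type filter, assuming each filing is a dict with a `form_type` key whose values look like `"990"`, `"990EZ"`, and `"990T"`. The field name and the exact strings are assumptions for illustration, not the actual layout of our archive.

```python
# Hypothetical record layout: each filing is a dict with a "form_type" key.
# The exact strings used in the archive may differ; adjust this set to match.
KEPT_FORM_TYPES = {"990", "990EZ"}


def keep_relevant_forms(filings):
    """Return only the filings whose form type we care about (990 and 990EZ)."""
    return [f for f in filings if f["form_type"] in KEPT_FORM_TYPES]


filings = [
    {"ein": "111", "form_type": "990"},
    {"ein": "222", "form_type": "990EZ"},
    {"ein": "111", "form_type": "990T"},  # dropped (for now)
]
kept = keep_relevant_forms(filings)
```

Keeping the allowed types in one set means that if we start caring about the 990T later, it is a one-line change.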

Change in fiscal year

If a company changes their fiscal year, they file for an additional, shorter fiscal year to "bridge the gap." Depending on how they change their fiscal year, we may decide to exclude either the shorter fiscal year or the later, new fiscal year.
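One way to spot the "bridge" years is to flag any filing whose reporting period is noticeably shorter than twelve months. This is only a sketch: the function names and the 360-day threshold are assumptions, and it presumes we can read the period start and end dates from each filing.

```python
from datetime import date


def fiscal_year_length_days(period_start, period_end):
    """Number of days covered by the reporting period, counting both endpoints."""
    return (period_end - period_start).days + 1


def is_short_year(period_start, period_end, min_days=360):
    """True if the period is shorter than min_days (an assumed threshold)."""
    return fiscal_year_length_days(period_start, period_end) < min_days


# A company switching from a calendar year to a July-June fiscal year
# would file a six-month bridge period followed by a full new year.
bridge = is_short_year(date(2021, 1, 1), date(2021, 6, 30))
full = is_short_year(date(2021, 7, 1), date(2022, 6, 30))
```

This only flags candidates. Whether to exclude the short year or the new fiscal year that follows it is still a decision we make case by case.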

raevard commented 1 year ago

The branch associated with this issue doesn't have the code which Quinn wrote to make our initial dataset, because this branch was created before we merged Quinn's branch. Should I delete branch 4 then make a new branch?

raevard commented 1 year ago

Never mind. I think that if I either run git merge or create a PR with branch 4 as the base and main as the compare branch, the changes on main will be brought into branch 4, and both main and branch 4 will be kept.

raevard commented 1 year ago

Figured it out :)

raevard commented 1 year ago

(Fix this one: only maintain the most recent versions and remove filings that would have been overridden.)

ajhoekst commented 1 year ago

Re: the conversation at our weekly check-in (Feb. 14)

For each company and each fiscal year, we want the filing that has the latest timestamp. If there is more than one, the later filing should have the amendment checkbox checked.

Knowing whether this is in fact true will build our confidence in the dataset. When we add more filings, we will know whether the checkbox is likely to be reliable. This is going to be a common situation in our dataset: companies should do something, but might not. Fortunately, we have multiple "signals" we can check.
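Here is a rough sketch of that rule, not the code Quinn wrote. It assumes each filing is a dict with hypothetical keys `ein`, `fiscal_year`, `timestamp` (an ISO 8601 string, so plain string comparison orders it correctly), and `is_amendment`. It keeps the latest filing per company and fiscal year, and separately reports the groups where the latest of several filings does not have the amendment checkbox checked.

```python
from collections import defaultdict


def latest_filings(filings):
    """Keep the newest filing per (company, fiscal year).

    Returns (latest, unflagged), where unflagged lists the (ein, fiscal_year)
    groups that had more than one filing but whose newest filing was not
    marked as an amendment.
    """
    groups = defaultdict(list)
    for f in filings:
        groups[(f["ein"], f["fiscal_year"])].append(f)

    latest, unflagged = [], []
    for key, group in groups.items():
        # ISO 8601 timestamps in the same format sort chronologically as strings.
        newest = max(group, key=lambda f: f["timestamp"])
        latest.append(newest)
        if len(group) > 1 and not newest["is_amendment"]:
            unflagged.append(key)
    return latest, unflagged


filings = [
    {"ein": "111", "fiscal_year": 2020, "timestamp": "2021-05-01T10:00:00", "is_amendment": False},
    {"ein": "111", "fiscal_year": 2020, "timestamp": "2021-09-15T09:30:00", "is_amendment": True},
    {"ein": "222", "fiscal_year": 2020, "timestamp": "2021-04-01T08:00:00", "is_amendment": False},
    {"ein": "222", "fiscal_year": 2020, "timestamp": "2021-06-01T08:00:00", "is_amendment": False},
]
latest, unflagged = latest_filings(filings)
```

The `unflagged` list is the "signal" check: if it stays empty or small as we add more filings, the checkbox is probably reliable. If it grows, we should rely on the timestamp alone.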

raevard commented 1 year ago

Quinn completed this!