divergentdave / tenforty

Tools for analyzing U.S. taxes

Researching MeF schemas #5

Open divergentdave opened 7 years ago

divergentdave commented 7 years ago

This issue is for stashing stuff I figure out from the IRS MeF schemas.

Each of the "Drop" zip files contains some of the schema files multiple times, nested a few layers deep in zips. A cursory inspection showed that efileTypes.xsd appears with different line endings and with a release status of either "FinalRelease" or "Final Release". For now I'm going to ignore this and just use everything in the nested zip file for the 1040 form series.
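A minimal sketch of how the duplicates could be found, assuming the Drop zips are read as bytes and that nested archives can be recognized by a `.zip` extension alone. It uses Python's standard `zipfile` module to walk every nesting level and groups copies of one file by content after normalizing CRLF to LF. Copies that differ only in line endings land in the same group, while the "FinalRelease"/"Final Release" variants stay separate. The function names are hypothetical.

```python
import io
import zipfile


def iter_nested_members(data, path=""):
    """Yield (path, bytes) for every non-zip file inside a zip archive,
    descending into nested members whose names end in .zip."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            member = zf.read(name)
            full = f"{path}/{name}" if path else name
            if name.lower().endswith(".zip"):
                yield from iter_nested_members(member, full)
            else:
                yield full, member


def distinct_copies(data, filename):
    """Group every copy of `filename` by content, ignoring CRLF vs LF.
    Returns one list of archive paths per distinct version."""
    groups = {}
    for full, member in iter_nested_members(data):
        if full.rsplit("/", 1)[-1] == filename:
            normalized = member.replace(b"\r\n", b"\n")
            groups.setdefault(normalized, []).append(full)
    return list(groups.values())
```

For example, passing the bytes of a Drop zip and `"efileTypes.xsd"` would return one list of paths per version of that file, so the number of truly different copies is just the length of the result.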

In the annotation/documentation for some elements in the schema, there is an "ELFFieldNumber" tag whose content is either "NL" or one or more four-digit numbers separated by spaces. Googling indicates that these are cross-references to IRS-internal field numbers from a pre-MeF system. The correspondence between ELF field numbers and MeF XML elements is neither injective nor onto: where MeF allows an element to appear multiple times, the previous system assigned sequential field numbers to each occurrence. I think "NL" is used for elements that, for various reasons, have no corresponding ELF field number. In short, these numbers won't be useful for this project and can be ignored.
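If I ever want to double-check the observations above (how many elements are "NL", how many map to several numbers), a rough sketch like this would pull the tags out using the standard `xml.etree.ElementTree` module. It assumes ELFFieldNumber sits somewhere under an element's own xsd:annotation and matches it by local name, so the IRS namespace doesn't have to be hard-coded. Elements are keyed only by their `name` attribute, so same-named elements in different types would overwrite each other. Numbers are kept as strings to preserve leading zeros. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def parse_elf_numbers(text):
    """'NL' or empty -> []; '0123 0124' -> ['0123', '0124']."""
    text = (text or "").strip()
    if not text or text == "NL":
        return []
    return text.split()


def elf_field_numbers(xsd_text):
    """Map each named xsd:element to the ELF field numbers in its own annotation."""
    root = ET.fromstring(xsd_text)
    result = {}
    for elem in root.iter(XSD_NS + "element"):
        name = elem.get("name")
        annotation = elem.find(XSD_NS + "annotation")  # direct child only
        if name is None or annotation is None:
            continue
        for child in annotation.iter():
            if local_name(child.tag) == "ELFFieldNumber":
                result[name] = parse_elf_numbers(child.text)
                break
    return result
```

Elements that map to more than one number, or to none, are then easy to count from the returned dict.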

divergentdave commented 7 years ago

The schemas don't directly provide information on which forms require inputs from other forms. Each schema just xsd:includes schemas from the Common folder, mostly efileTypes.xsd. Instead, relations between different forms are enforced by Business Rules. Plenty of rules require a form to be attached when certain lines have certain values. Other rules require that two particular lines on different forms are equal, if both are present.
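Since those relations live in the Business Rules rather than the schemas, a tool would need its own representation of them. Here's a hypothetical sketch of the two kinds of rules described above, treating a return as a dict of forms, each a dict of line values. The form and line names (FormA, Line7, and so on) are placeholders, not actual MeF element names or actual rule logic.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model of a return: {form name: {line name: value}}.
Return = Dict[str, Dict[str, float]]


@dataclass
class RequiresAttachment:
    """If `condition` holds on `form`'s lines, `required_form` must be present."""
    form: str
    condition: Callable[[Dict[str, float]], bool]
    required_form: str

    def check(self, ret: Return) -> Optional[str]:
        lines = ret.get(self.form)
        if lines is not None and self.condition(lines) and self.required_form not in ret:
            return f"{self.form} requires {self.required_form}"
        return None


@dataclass
class LinesEqual:
    """If both (form, line) pairs are present, their values must be equal."""
    a: Tuple[str, str]
    b: Tuple[str, str]

    def check(self, ret: Return) -> Optional[str]:
        va = ret.get(self.a[0], {}).get(self.a[1])
        vb = ret.get(self.b[0], {}).get(self.b[1])
        if va is not None and vb is not None and va != vb:
            return f"{self.a} != {self.b}"
        return None


def violations(rules, ret: Return) -> List[str]:
    """Run every rule and collect the messages of the ones that fail."""
    return [msg for rule in rules if (msg := rule.check(ret)) is not None]
```

For example, `RequiresAttachment("FormA", lambda l: l.get("Line7", 0) > 0, "FormB")` flags a return that has a positive FormA Line7 but no FormB, and `LinesEqual(("FormA", "Line7"), ("FormB", "Line12"))` only complains when both lines exist and disagree.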

divergentdave commented 7 years ago

Far more currency-typed fields use integer types (USAmountType, thousands of fields) than decimal types (USDecimalType, which occurs only once, on Form T). I'll be ignoring rounding for now. Carrying everything at machine floating-point precision will make things like sensitivity analysis possible, since rounding to whole dollars turns every line into a step function, and will generally keep the calculations as close to linear as possible.
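A toy illustration of why rounding gets in the way of sensitivity analysis, using a hypothetical flat 10% tax rather than any real computation. With rounding to whole dollars, a finite-difference estimate of the marginal rate comes out as zero almost everywhere because the function is a step function; without rounding, the same estimate recovers the 10% rate.

```python
def tax(income, round_to_dollars=False):
    """Hypothetical flat 10% tax. With round_to_dollars=True, the input and the
    result are rounded to whole dollars, mimicking integer USAmountType fields.
    Note: Python's round() rounds halves to even, unlike the IRS's round-half-up
    convention; that difference doesn't matter for this illustration."""
    if round_to_dollars:
        return float(round(round(income) * 0.10))
    return income * 0.10


def marginal_rate(f, x, h=0.01):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


print(marginal_rate(lambda x: tax(x, round_to_dollars=True), 50000.0))  # 0.0
print(marginal_rate(tax, 50000.0))  # approximately 0.1
```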