fedspendingtransparency / fedspendingtransparency.github.io

Federal Spending Transparency
http://fedspendingtransparency.github.io
Creative Commons Zero v1.0 Universal

Feedback: Data Act Schema v0.1 #25

Closed kaitlin closed 8 years ago

kaitlin commented 9 years ago

This is the place to leave feedback on v0.1 of the DATA Act Schema. You can read more about the schema here: http://fedspendingtransparency.github.io/data-exchange-standard/
Federal Spending Collaboration home page: http://fedspendingtransparency.github.io/

HerschelC commented 9 years ago

Regarding Section 1, Background: for data transmission to external stakeholders, providing a standard API for access should be paramount. There should be one consistent, single interface to extract transactional data from all agencies. We should not have to write code to parse CSVs from one agency, then JSON for the same data from another agency, and then XBRL for the same data from a third. A single, consistent interface should be applied for the provision of similar data from all agencies for a given transfer method (small batch/transactional, bulk download, online presentation, etc.). The draft seems to indicate this is the intent, to allow matching of format to need, but it could be clarified that the intent is a single interface applicable across all agencies for a given method/transfer need. There should be a default method.

Facilitating different transfer methods is also why it is important to consider the amount of data to be transferred while developing definitions. Related to the conversation on NAICS codes: failing to take every opportunity to optimize data transfer performance would mean needing more methods to handle different transfer needs. Tight technical definitions minimize the amount of data being transmitted over the pipe.
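
To illustrate the burden described above, here is a minimal sketch of what a data consumer ends up writing when two agencies publish the same data in different formats. The field names (`agency`, `amount`, `agencyId`, `obligation`) and the sample payloads are hypothetical and are not taken from the draft schema.

```python
import csv
import io
import json
from typing import Iterator


def from_csv(text: str) -> Iterator[dict]:
    """Parser for a hypothetical agency that publishes CSV."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"agency": row["agency"], "amount": float(row["amount"])}


def from_json(text: str) -> Iterator[dict]:
    """Parser for a hypothetical agency that publishes JSON with different field names."""
    for item in json.loads(text):
        yield {"agency": item["agencyId"], "amount": float(item["obligation"])}


# Illustrative payloads only; real agency feeds would differ.
agency_a = "agency,amount\n012,100.50\n"
agency_b = '[{"agencyId": "097", "obligation": 250}]'

# The consumer has to normalize both feeds into one record shape.
records = list(from_csv(agency_a)) + list(from_json(agency_b))
```

With a single interface per transfer method, only one of these parsers would be needed, and each additional agency would not require new consumer code.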

HerschelC commented 9 years ago

At some point the DATA Act Schema Model was several pages and had 5 Sections. Now when I click the link above I get just page 1, Background. What happened to the other pages? The data exchange standard link on the home page http://fedspendingtransparency.github.io/ is also wrong.

bsweger commented 9 years ago

@HerschelC Thanks for letting us know about the truncated schema model document--that happened inadvertently when we updated some wording in the Background section. Apologies for any confusion. The data standard link should be working now too. We appreciate your continued engagement in this process!

dataconsultant commented 9 years ago

The definition of "Meta-data" in the Appendix seems rather limited to the concept of data labels. They are certainly part of meta-data but the concept is much larger than that.

HerschelC commented 9 years ago

For the next iteration, with the understanding that this is version 0.1, compiling a complete schema or set of schemas would be helpful, along with a better name for the schema zip and/or documentation of the schema.zip in the Word document. This will help newcomers to the conversation orient themselves more quickly among the financial assistance, contracts, and loans schemas and the associated schema definitions.

Further, providing actual (anonymized) data would also help in understanding the schema. It’s understood that pilots are underway. Provide some data along with the schema while obscuring names to protect the innocent.

HerschelC commented 9 years ago

How will master/reference data be provided? For example, a hierarchy of the federal government from the three branches down to the lowest office level (breakout of agencyid); or super-large contractor down through all the divisions and groupings to local office locations?

This information is vital for providing a clear interpretation of the underlying data. It's just a compilation of 1's and 0's without robust descriptive data.

Data elements are great – but the relationship between elements is necessary to enable analysis for value realization. A common set of reference data also mitigates the risk of ambiguity in analysis. E.g., different groups using the same base data to tell different stories. Additional dimensions such as the government calendar (accounting periods) and geographic data are similarly important to provide. All of which should be provided in machine-readable form.
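
As a rough sketch of why machine-readable reference data matters, the example below rolls transaction amounts up an organizational hierarchy. The organization codes, the parent map, and the amounts are all hypothetical; the point is only that everyone using the same hierarchy gets the same totals.

```python
from collections import defaultdict

# Hypothetical reference data: each organization code maps to its parent
# (None marks the top of the hierarchy).
PARENT = {
    "EXEC": None,
    "DEPT-A": "EXEC",
    "BUREAU-A1": "DEPT-A",
    "OFFICE-A1X": "BUREAU-A1",
    "DEPT-B": "EXEC",
}


def ancestors(code):
    """Return the chain from the given code up to the top of the hierarchy."""
    chain = []
    while code is not None:
        chain.append(code)
        code = PARENT[code]
    return chain


def roll_up(transactions):
    """Sum each transaction amount into its own organization and every ancestor."""
    totals = defaultdict(float)
    for org, amount in transactions:
        for node in ancestors(org):
            totals[node] += amount
    return dict(totals)


# Illustrative transactions as (organization code, amount) pairs.
transactions = [("OFFICE-A1X", 100.0), ("BUREAU-A1", 50.0), ("DEPT-B", 25.0)]
totals = roll_up(transactions)
```

Without a shared parent map, two analysts could group the same transactions differently and report different department totals from identical base data.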

HerschelC commented 9 years ago

I see this caveat many times:

Important: The draft schema does not constitute as official USSGL guidance and should not be used as official guidance by federal agencies or the public. For official guidance, see http://tfm.fiscal.treasury.gov/v1/supplements/ussgl.html.

At what point will this become USSGL official guidance in relationship to DATA Act implementation timelines? What is the implementation/policy relationship between the two? Will we have to go into the implementation phase working off of draft USSGL guidance?

HerschelC commented 9 years ago

Descriptions of the data elements require greater detail in general. Take availableTypeCode, for example: “This is a component of the TAS. Identifies no-year TAS (X), clearing/suspense TAS (F), and default TAS (C). This field is blank for TAS that have periods of availability and unavailable receipt TAS.”

What is a no-year TAS, a clearing TAS, a default TAS? Don’t assume a level of understanding when developing definitions. If defined elsewhere, please provide a footnote to link the reader to the definition.

The schema definition should also make liberal use of annotation to embed meanings within the definition. I didn't notice any present. Code doc is really nice. (Automated production of definition documentation is even better.)
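
To make the annotation suggestion concrete, here is a minimal sketch that assumes the schema is expressed as XSD and uses the standard `xs:annotation`/`xs:documentation` elements. The schema fragment is illustrative only (it is not copied from the draft), and `undocumentedElement` is a made-up name used to show how missing documentation could be flagged.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Illustrative fragment, not the actual draft schema.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="availableTypeCode" type="xs:string">
    <xs:annotation>
      <xs:documentation>A component of the TAS. Identifies no-year TAS (X), clearing/suspense TAS (F), and default TAS (C).</xs:documentation>
    </xs:annotation>
  </xs:element>
  <xs:element name="undocumentedElement" type="xs:string"/>
</xs:schema>"""


def element_docs(xsd_text):
    """Map each element name to its embedded documentation, or None if absent."""
    root = ET.fromstring(xsd_text)
    docs = {}
    for el in root.iter(XS + "element"):
        doc = el.find(f"{XS}annotation/{XS}documentation")
        has_text = doc is not None and doc.text
        docs[el.get("name")] = doc.text.strip() if has_text else None
    return docs


docs = element_docs(SCHEMA)
for name, text in docs.items():
    print(f"{name}: {text or 'NO DOCUMENTATION'}")
```

A script along these lines could generate the definition documentation directly from the schema, so the two never drift apart.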

HerschelC commented 9 years ago

A very important facet of meta data is a rating of data quality based upon conformance to defined business rules. Data defects, violations of these rules, must be reported to enable fact-based decision making. The accuracy of decisions is based upon the accuracy of the data informing those decisions.

While it is not expected that data will be maintained at 100% accuracy, the meta data should give indication as to the quality of the data. Consumers of information may then choose to set a threshold for quality when selecting what data to use based upon the potential ramification of the decisions being made. Information consumers are empowered to determine fitness of data for a particular use.
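
As a sketch of what such quality meta data could look like, the example below scores a data set by the share of business-rule checks that pass and reports each defect. The two rules, the record fields, and the thresholds are hypothetical and only stand in for whatever rules the standard would define.

```python
# Hypothetical business rules: each maps a rule name to a check on one record.
RULES = {
    "amount_non_negative": lambda r: r["amount"] >= 0,
    "agency_present": lambda r: bool(r.get("agency")),
}


def assess(records):
    """Return a quality score (share of checks passed) plus the list of defects."""
    defects = [
        (index, name)
        for index, record in enumerate(records)
        for name, rule in RULES.items()
        if not rule(record)
    ]
    total_checks = len(records) * len(RULES)
    score = 1.0 - len(defects) / total_checks if total_checks else None
    return {"score": score, "defects": defects}


def fit_for_use(report, threshold):
    """Let the consumer decide whether the data meets their own quality bar."""
    return report["score"] is not None and report["score"] >= threshold


# Illustrative records: the second one violates both rules.
records = [
    {"agency": "012", "amount": 100.0},
    {"agency": "", "amount": -5.0},
]
report = assess(records)
```

Publishing the score and the defect list alongside the data lets a consumer making a high-stakes decision demand a stricter threshold than someone doing exploratory analysis.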

CB2 commented 9 years ago

Hi all, we've leveraged the Schema Instance (XML version) v0.1 provided and NIEMified it (i.e. produced a NIEM-conformant representation). Our hope is to demonstrate how the DATA Act effort can be applied outside the federal government, following the principles of transparency and collaboration!

NIEM-conformant schemas representing types and elements required per the DATA Act can be found at https://github.com/NIEM/DATA-Act

It's a work in progress and includes the United States Standard General Ledger (USSGL) only. More will be added in the coming weeks. Included in the schema are URLs to the authoritative guidance for each element (e.g. OMB Circ. No. A-11).

Hope this helps!

P.S. Maybe we could include a third bullet on the Data Exchange Standard page that links to this?

dataconsultant commented 9 years ago

As a non-technical but tech-savvy person, I still have no clue what much of this means. I hope to receive the layman's version of all of this at some point. Transparency becomes less transparent to the degree you need a master's degree in computer science to understand how to use and load the data.

kaitlin commented 8 years ago

closing since we are now on v0.7