18F / data-act-pilot

This small DATA Act pilot contains code that translates agency data to a uniform DATA Act format.

SAM Data #74

Closed HerschelC closed 9 years ago

HerschelC commented 9 years ago

What data are you pulling from SAM?

I ask because I am thinking about socio-economic tags for awards. Classifications like small business, 8a or HUBZone are determined when the proposal is submitted and become part of the award information. A later change in status has no bearing on past awards: once an award is made, it stays associated with the socio-economic status in effect at that time. So the award should be reported as an 8a award even though the company may no longer be an 8a.

It's actually a very interesting story to see how many companies that have graduated from 8a still benefit from it through 8a extension contracts that circumvent the 8a statutory limits (like 8a STARS, which effectively grants 8a firms another 5 years of 8a status beyond the 8 or so years they get while in the 8a program). Whatever reporting is done must be able to show status at a point in time, IMHO.
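To make "status at a point in time" concrete, here is a minimal sketch of an effective-dated status history and a lookup by award date. Everything in it is an assumption for illustration: the `StatusPeriod` type, the `statuses_on` helper, the status labels, and the dates are hypothetical and do not come from SAM or the DATA Act schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class StatusPeriod:
    """One stretch of time during which a company held a status (hypothetical model)."""
    status: str            # e.g. "8a", "HUBZone", "small_business"
    start: date            # first day the status applies
    end: Optional[date]    # last day the status applies; None means still active


def statuses_on(history: List[StatusPeriod], when: date) -> List[str]:
    """Return the statuses in effect on the given day, sorted by name."""
    return sorted(
        {
            p.status
            for p in history
            if p.start <= when and (p.end is None or when <= p.end)
        }
    )


# Made-up history for a single company.
history = [
    StatusPeriod("small_business", date(2004, 1, 1), None),
    StatusPeriod("8a", date(2005, 3, 1), date(2014, 2, 28)),
]

# An award made while the company was still in the 8a program...
print(statuses_on(history, date(2012, 6, 15)))  # ['8a', 'small_business']
# ...versus the company's status after graduating.
print(statuses_on(history, date(2015, 1, 1)))   # ['small_business']
```

The point of the sketch is that an award record would either carry the status as of the award date or link to a history like this one. A single "current status" flag cannot answer the question.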

Similarly, companies change addresses. Around DC especially, it's easy to slip between DC, MD and VA. Government agencies move too, for that matter. Analyzing the impact of moves, the labor force, contract awards and things like state infrastructure spending to support an agency move may be an interesting story.

bsweger commented 9 years ago

@HerschelC for this pilot, the talk around SAM has been mostly related to award recipient addresses (for grant data, which is the current scope). If you have suggestions for additional vendor-related data elements that would be useful to a broad audience, you can suggest them over on the collaboration space.

Do you see the data for this type of analysis as something that should be included in the DATA Act schema, or do you see it as something that a combination of standardized spending data and the SAM API would enable the community to undertake?

HerschelC commented 9 years ago

My preference would be to have award data in one place as much as possible rather than having to integrate large batches of data over throttled (or overloaded) APIs to perform analysis. As you know, an analytical query spans many records; it isn't a transactional "one record in, one record out" operation.
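A rough back-of-the-envelope sketch of why that matters. The record counts, page sizes and rate limit below are made-up numbers for illustration, not SAM's actual limits, and `hours_to_fetch` is a hypothetical helper.

```python
import math


def hours_to_fetch(record_count: int, records_per_request: int, requests_per_hour: int) -> float:
    """Estimate the wall-clock hours needed to pull records through a rate-limited API."""
    requests_needed = math.ceil(record_count / records_per_request)
    return requests_needed / requests_per_hour


# Hypothetical: 1,000,000 award recipients, 1,000 requests allowed per hour.
print(hours_to_fetch(1_000_000, 1, 1_000))    # 1000.0 hours, one record per request
print(hours_to_fetch(1_000_000, 100, 1_000))  # 10.0 hours, 100 records per request
```

Even with generous paging, an analysis that touches every award pays this cost before the first query runs, which is why having the data in one place is attractive.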

I took a really quick look through the SAM API GitHub repo. I saw that the data is updated at 2am. I see "isa" fields that tell you how the entity looks now, but what we need to know is how the entity looked at a point in time: when the award was made. We want to track the impact that socioeconomic programs have on companies over time, and what happens to these companies after they graduate. What happens to the key people (executives) in these companies? This information would also be useful for fraud analysis.

bsweger commented 9 years ago

@HerschelC Thanks for answering this question. After reading this thread again, I want to encourage you to share this type of insight on the collaboration space, since that's what's being used to funnel public feedback about the schema and the data elements.

Not that I don't enjoy these conversations! I just want to make sure your thoughts are shared with the wider audience. This repo is a pretty tightly-scoped sandbox.

HerschelC commented 9 years ago

Done! :)

bsweger commented 9 years ago

Thanks @HerschelC. I'm going to close this issue since you weighed in on the collaboration space. If you have any other questions, let us know.