Closed kaitlin closed 10 years ago
I have been thinking about this a lot recently, especially as we add more datasets to FBOpen.
There are two ways we could go:
The former has the advantage that it is really quick and easy to get a new dataset into the search engine.
However, I think you're probably right here. As we get more and more users of our API, we may want to graduate to a new API version which provides standardized names for all the fields we can standardize. This, of course, becomes an easier task as the selection of datasets grows and we can more clearly draw the lines between common values.
We'll need to figure out how to deal with values that are not common: do we standardize them arbitrarily, or keep them as-is?
I think option 1 makes sense as long as you're relying on the fbo.gov docs to tell users what each field is/means. If you have your own data dictionary at some point, then I think it makes sense to make everything as uniform as possible for both the API and frontend, and reference the original field name that appears in fbo.gov in the data dictionary.
What other datasets do you plan to add? The state-level RFPs? I think that where fields have the same meaning, you should standardize arbitrarily. I can imagine someone wanting to search federal, Maryland, and Virginia RFPs where the deadline is more than two weeks out. So in that case the "deadline" field is the same across all data sources, even if Maryland uses "due_date", for instance.
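To make the cross-source search above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the source names, the per-source field names in `DEADLINE_FIELD` (only "due_date" comes from the comment above), and the helper functions are hypothetical and not taken from FBOpen.

```javascript
// Hypothetical per-source name of the field holding the response deadline.
const DEADLINE_FIELD = {
  federal: 'close_date', // assumed name, for illustration only
  maryland: 'due_date',  // the example name used in the comment above
  virginia: 'deadline',  // assumed name
};

// Copy a raw record and add a standardized `deadline` field (as a Date).
function standardize(source, record) {
  const raw = record[DEADLINE_FIELD[source]];
  return Object.assign({}, record, {
    data_source: source,
    deadline: new Date(raw),
  });
}

// Keep records whose deadline is more than `days` days after `now`.
function dueAfter(records, now, days) {
  const cutoff = now.getTime() + days * 24 * 60 * 60 * 1000;
  return records.filter((r) => r.deadline.getTime() > cutoff);
}

const records = [
  standardize('federal', { title: 'A', close_date: '2014-04-10T00:00:00Z' }),
  standardize('maryland', { title: 'B', due_date: '2014-05-01T00:00:00Z' }),
  standardize('virginia', { title: 'C', deadline: '2014-04-30T00:00:00Z' }),
];

const now = new Date('2014-04-04T00:00:00Z');
const upcoming = dueAfter(records, now, 14).map((r) => r.title);
console.log(upcoming);
```

Once every record carries the same `deadline` field, one filter covers all three sources, which is the point of standardizing fields that share a meaning.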
FWIW in v0 I chose a core set of important fields that do, in fact, get standardized names regardless of data source. See the "field map" in the FBO loader at https://github.com/18F/fbopen/blob/master/loaders/fbo.gov/fbo-solrize-big.js#L57 and in the grants.gov loader at https://github.com/18F/fbopen/blob/master/loaders/grants.gov/grants-nightly.js#L113.
The idea was that those fields would be core to most/all data sources, although you could certainly second-guess a couple of those choices. (The choice to use "solnbr" was probably a poor one on my part.)
Other fields like CLASSCOD were so source-specific (in both FBO and grants.gov data) that it didn't seem worth standardizing them, and in some cases it might get confusing. I went the other way: not only did I leave them as-is, but I prefixed them with the data source -- hence, e.g., if you want to filter on class code you use FBO_CLASSCOD, so you know it's specific to FBO.
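Roughly, the approach described here (a core field map plus a source prefix for everything else) could be sketched as follows. This is not the code from the linked loaders: the `CORE_FIELDS` entries and the `mapRecord` helper are hypothetical, and only the "solnbr" and FBO_CLASSCOD names come from this thread.

```javascript
// Hypothetical core map: raw source field name -> standardized name.
// The real field maps live in the loaders linked above and may differ.
const CORE_FIELDS = {
  SOLNBR: 'solnbr',
  SUBJECT: 'title',
};

// Rename core fields; prefix every other field with its data source.
function mapRecord(sourcePrefix, coreMap, record) {
  const out = { data_source: sourcePrefix };
  for (const [name, value] of Object.entries(record)) {
    if (Object.prototype.hasOwnProperty.call(coreMap, name)) {
      out[coreMap[name]] = value;
    } else {
      out[`${sourcePrefix}_${name}`] = value;
    }
  }
  return out;
}

const doc = mapRecord('FBO', CORE_FIELDS, {
  SOLNBR: 'ABC-123',
  SUBJECT: 'Widgets',
  CLASSCOD: '70',
});
console.log(doc);
```

With this shape, a filter on FBO_CLASSCOD is visibly FBO-only, while the core fields can be queried the same way for every source.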
Aaron Snow Presidential Innovation Fellow aaron.snow@gsa.gov 202-631-4667
@kaitlin, I'm thinking this ticket should probably be closed at this point, especially after our migration to using ext to call out non-standardized fields. Any more thoughts before I close?
close it down!
This might seem nitpicky, but I think FBOpen is a good opportunity to get away from the crazily abbreviated fields of fbo.gov (e.g. CLASSCOD, solnbr, offadd). Renaming these to "class_code", "solicitation_number" and "office_address" would just make it more readable to people who haven't already used fbo.gov a bunch. It seems like someone had this same thought when they used "Opportunity Number" on the FBOpen frontend site instead of "solnbr".
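For illustration, the renaming suggested here could be as small as a lookup table that also serves as a data dictionary pointing back to the original fbo.gov names. The three name pairs come from the paragraph above; the helper names and the `fbo_gov_name` key are made up for this sketch.

```javascript
// Abbreviated fbo.gov name -> readable name (pairs from the text above).
const READABLE_NAMES = {
  CLASSCOD: 'class_code',
  solnbr: 'solicitation_number',
  offadd: 'office_address',
};

// Return a copy of the record with readable names; unknown fields are kept.
function renameFields(record) {
  const out = {};
  for (const [name, value] of Object.entries(record)) {
    out[READABLE_NAMES[name] || name] = value;
  }
  return out;
}

// Build data dictionary rows that reference the original fbo.gov name.
function dataDictionary() {
  return Object.entries(READABLE_NAMES).map(([original, readable]) => ({
    field: readable,
    fbo_gov_name: original,
  }));
}

const renamed = renameFields({ solnbr: 'ABC-123', offadd: '1800 F St', title: 'Widgets' });
console.log(renamed);
console.log(dataDictionary());
```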