18F / fbopen

[DEPRECATED] An open API server, data import tools, and sample apps to help small businesses search for opportunities to work with the U.S. government.

Missing FBO entry #117

Closed smcclana closed 10 years ago

smcclana commented 10 years ago

I noticed an open posting on FBO.gov (Solicitation Number: M00681-14-t-0049, link below) which did not show up on FBOpen. When I checked the nightly XML file for the modification date of the PRESOL on 20140617, I could not find that entry in the XML file either. Perhaps this is an issue with the FTP XML file and not the parser? Are there instances where the FTP XML files do not always capture all the FBO.gov listings/changes?

https://www.fbo.gov/index?s=opportunity&mode=form&tab=core&id=67406d469c0ae752bd89aa953a0ac694&_cview=0

By the way, I really like the work you've done so far. Really cool project!

arowla commented 10 years ago

It's there. The difference is one of case, and you also have to add show_closed=true to allow closed listings, as this one seems to be missing a close_dt:

http://api.data.gov/gsa/fbopen/v0/opps?q=solnbr:m00681-14-t-0049&api_key=DEMO_KEY&show_closed=true

arowla commented 10 years ago

Our current version also has a solnbr_ci field for case-insensitive search by solicitation no., so you can also get it like so:

http://api.data.gov/gsa/fbopen/v0/opps?q=solnbr_ci:M00681-14-t-0049&api_key=DEMO_KEY&show_closed=true
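As a minimal sketch, the two query styles above could be built programmatically like this. The endpoint, field names, and parameters are the ones quoted in this thread; the function name `solnbr_query_url` is hypothetical, and the idea that the plain `solnbr` field expects a lowercased value is an assumption drawn only from the first example URL.

```python
from urllib.parse import urlencode

# Endpoint as quoted in this thread.
BASE_URL = "http://api.data.gov/gsa/fbopen/v0/opps"


def solnbr_query_url(solnbr, case_insensitive=True, api_key="DEMO_KEY"):
    """Build a search URL for one solicitation number, including closed listings.

    Hypothetical helper; it only builds the URL and performs no request.
    """
    if case_insensitive:
        # solnbr_ci accepts the number in whatever case it was typed.
        q = f"solnbr_ci:{solnbr}"
    else:
        # Assumption: the plain solnbr field matches the lowercased value,
        # as in the first example URL above.
        q = f"solnbr:{solnbr.lower()}"
    params = {"q": q, "api_key": api_key, "show_closed": "true"}
    # Keep ':' unescaped so the URL reads like the ones in this thread.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


print(solnbr_query_url("M00681-14-t-0049"))
print(solnbr_query_url("M00681-14-t-0049", case_insensitive=False))
```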

smcclana commented 10 years ago

Thanks for the information. That was perhaps a bad example, because that solicitation had an earlier notice which is showing up; it was the later MOD that wasn't showing up. Here's a better example using solicitation number FA8518-14-R-31374:

This is the FBO notice (https://www.fbo.gov/index?s=opportunity&mode=form&id=d66cddd9bd071eed219ccee7549b2607&tab=core&_cview=1)

Which is not shown in the XML file (ftp://ftp.fbo.gov/FBOFeed20140619)

So this also shows no results: http://api.data.gov/gsa/fbopen/v0/opps?q=solnbr_ci:FA8518-14-R-31374&api_key=DEMO_KEY&show_closed=true

This isn't a bug with the scripts as it doesn't appear to be in the XML file in the first place. Is there a disconnect between the FBO notices online and the XML files? Thanks again!

smcclana commented 10 years ago

I've been stepping through many of the FBO results and comparing them with the entries in the respective FTP XML file, and it seems the "Solicitation (Modified)" entries sometimes don't show up in the XML file even though they are available on the FBO.gov site. If it helps to identify a pattern, I could provide a list of more examples which have entries on the FBO.gov site but no equivalent entry in the FTP XML file.
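A rough sketch of how that comparison could be automated, assuming the nightly feed file has already been downloaded and read into a string. The function name `missing_from_feed` is hypothetical, the sample text is made up, and the check is a plain case-insensitive substring match rather than a real parse of the feed format.

```python
def missing_from_feed(feed_text, solicitation_numbers):
    """Return the solicitation numbers that never appear in the feed text.

    Assumption: a case-insensitive substring match is good enough to tell
    whether a notice is present at all. It does not distinguish a PRESOL
    from a later MOD for the same solicitation.
    """
    haystack = feed_text.lower()
    return [nbr for nbr in solicitation_numbers if nbr.lower() not in haystack]


# Made-up stand-in for the contents of a downloaded feed file.
sample_feed = "notice for M00681-14-T-0049 ... notice for W91238-14-Q-0040"

print(missing_from_feed(sample_feed, ["m00681-14-t-0049", "FA8518-14-R-31374"]))
```

Running the same list against several days of feed files would show whether a notice is merely delayed or never published.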

arowla commented 10 years ago

It would definitely help to identify a few more where the MODs are not showing in the files. I'm wondering if there could be a cutoff time each day, after which that evening's MODs only show up in the next day's file.

That said, the current (production deployed) version of FBOpen does not support MODs, while our upcoming Elasticsearch version will... so they won't be showing up on our API until we've announced our new release.

As for the FA-8518[..] example, that one is a sole-source solicitation, and therefore it also gets filtered out by default. By adding show_noncompeted=true, we can see the PRESOL in our API:

http://api.data.gov/gsa/fbopen/v0/opps?q=solnbr_ci:FA8518-14-R-31374&api_key=DEMO_KEY&show_closed=true&show_noncompeted=true
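When a listing seems to be missing, one way to find out which default filter is hiding it is to try every combination of the two flags mentioned in this thread. The sketch below only generates the URLs; the helper name `filter_variants` is hypothetical, and fetching and inspecting each response is left out.

```python
from itertools import product
from urllib.parse import urlencode


def filter_variants(solnbr, api_key="DEMO_KEY"):
    """Yield (show_closed, show_noncompeted, url) for each on/off combination.

    Hypothetical helper; the flag names are the ones used in this thread.
    """
    base = "http://api.data.gov/gsa/fbopen/v0/opps"
    for show_closed, show_noncompeted in product((False, True), repeat=2):
        params = {"q": f"solnbr_ci:{solnbr}", "api_key": api_key}
        if show_closed:
            params["show_closed"] = "true"
        if show_noncompeted:
            params["show_noncompeted"] = "true"
        yield show_closed, show_noncompeted, f"{base}?{urlencode(params, safe=':')}"


for closed, noncompeted, url in filter_variants("FA8518-14-R-31374"):
    print(closed, noncompeted, url)
```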

smcclana commented 10 years ago

I will make a list and note any patterns. How do you know it is a sole-source solicitation? I didn't see anything in the XML file or the way the JSON file gets made which indicates that. Sorry for all the questions, I'm new to this project and trying to get a handle on it. Thanks for the information!

arowla commented 10 years ago

Sole-source is indicated in the long-form description text. If you look at api/app.js, you can see the criteria for the show_noncompeted filter.

https://github.com/18F/fbopen/blob/master/api/app.js#L127

Alison Rowland Technical Lead, https://fbopen.gsa.gov | 18F 202-317-0124

smcclana commented 10 years ago

It seems there is a cut-off after 19:30. All the following were posted on or after 19:30 and not observed within the FTP XML file. There are several postings in the XML file for notices with a posted time of 19:29, so it appears 19:30 is a cut-off period. I also did not see these "missing" notices within the next day's (6-21-14) XML file either. So far I cannot find any record of these missing notices anywhere in the XML files.
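The cut-off hypothesis can be checked mechanically once each notice is recorded as a posted time plus whether it was found in that day's file. This is only a sketch: the function names are hypothetical, and the sample observations reuse the times mentioned in this thread rather than a full data set.

```python
from datetime import time


def cutoff_bounds(observations):
    """Return (latest_found, earliest_missing) from (posted_time, found) pairs.

    Either element is None when there are no observations of that kind.
    """
    observations = list(observations)
    found = [t for t, in_feed in observations if in_feed]
    missing = [t for t, in_feed in observations if not in_feed]
    return (max(found) if found else None, min(missing) if missing else None)


def consistent_with_cutoff(observations):
    """True if every found notice was posted before every missing one."""
    latest_found, earliest_missing = cutoff_bounds(observations)
    if latest_found is None or earliest_missing is None:
        return True
    return latest_found < earliest_missing


# Illustrative observations using times mentioned in this thread.
sample = [
    (time(19, 29), True),   # present in the XML file
    (time(19, 30), False),  # not in the XML file
    (time(21, 7), False),   # not in the XML file
]

print(cutoff_bounds(sample))
print(consistent_with_cutoff(sample))
```

A single notice found in the file with a posted time after the earliest missing one would be enough to rule out a simple daily cut-off.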

It also appears that if multiple MODs are made within the same day, only the latest MOD is captured within the XML file, making the earlier MODs for that day "missing" from the XML file as well.

The following posts were from 6-20-2014 and are observed on the FBO.gov site but not within the FTP XML file: ftp://ftp.fbo.gov/FBOFeed20140620

This first one is interesting, because it had several modifications on 6-21-14, the last one of which happened at 21:07. Even given the multiple earlier MODs during that day, there is no mention of this solicitation in the XML file.

HHSM-500-2014-RFP-QINNCC https://www.fbo.gov/index?s=opportunity&mode=form&id=4e6c6b12acd457b1dbd8cbf0e17a46d8&tab=core&_cview=1

W91238-14-Q-0040 https://www.fbo.gov/index?s=opportunity&mode=form&id=22b1a28e16a28e572187e4146e8cd762&tab=core&_cview=1

SPRRA2-14-T-0053 https://www.fbo.gov/index?s=opportunity&mode=form&id=671b8d0360fdc6c6161d90a8ea7de681&tab=core&_cview=1

W56KGU14R0002 https://www.fbo.gov/index?s=opportunity&mode=form&id=ca27c6a92395221ea4bb92788341916e&tab=core&_cview=1

W56HZV13T0269 https://www.fbo.gov/index?s=opportunity&mode=form&id=8badb4a27dc7fec7d9f1dfc69854327e&tab=core&_cview=1

AG-03R6-S-14-0057 https://www.fbo.gov/index?s=opportunity&mode=form&id=f16325bae4690acccab385ba87ab4c9c&tab=core&_cview=1

arowla commented 10 years ago

Thanks for doing this research! It's starting to sound like a bug in FBO's bulk data system. It would be interesting to see if the weekly XML dump suffers from the same problems.

smcclana commented 10 years ago

I will close this as it isn't an issue with any code on this project.