GSA-TTS / FAC

GSA's Federal Audit Clearinghouse

Improve search to find results in `GSA_MIGRATION` fields #3372

Open · jadudm opened this issue 8 months ago

jadudm commented 8 months ago

Story

During the migration, we replaced some values with the placeholder `GSA_MIGRATION`. Cluster names are one example, but we can't search on them, so that's a bad example. There may also be searchable fields, such as entity names, that contain `GSA_MIGRATION`.

### Tasks
- [x] Determine if any searchable fields might contain `GSA_MIGRATION`
- [x] Estimate how many values in those fields are `GSA_MIGRATION`
- [ ] Propose how we do/do not make those records searchable, given that `GSA_MIGRATION` has supplanted a value.

This may be a non-issue.

Note that the API can now be used to do this kind of work.

For example, if you're asking "How many cluster_name fields have GSA_MIGRATION in them?", then you could write a short script using the API:

```python
import requests
res = requests.get("https://api.fac.gov/federal_award?cluster_name=...&year=2022&report_id=...")  # replace each "..." with a real value
```
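A minimal sketch of that kind of counting query, using only the standard library. It assumes the API is served by PostgREST (so filters take the form `field=eq.value`, and a `Prefer: count=exact` header returns the total in the `Content-Range` response header), that an api.data.gov key is sent in an `X-Api-Key` header, and that cluster names live on a `federal_awards` endpoint. The helper names are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Optional

API_BASE = "https://api.fac.gov"  # base URL from the example above
PLACEHOLDER = "GSA_MIGRATION"


def build_count_url(endpoint: str, field: str, value: str = PLACEHOLDER) -> str:
    """Build a PostgREST-style equality filter, e.g. /federal_awards?cluster_name=eq.GSA_MIGRATION."""
    # limit=1 keeps the response body tiny; we only want the total from the headers.
    query = urllib.parse.urlencode({field: f"eq.{value}", "limit": 1})
    return f"{API_BASE}/{endpoint}?{query}"


def parse_total(content_range: str) -> Optional[int]:
    """Pull N out of a Content-Range header such as '0-0/228313' or '*/0'."""
    total = content_range.rsplit("/", 1)[-1]
    return int(total) if total.isdigit() else None


def count_placeholder(endpoint: str, field: str, api_key: str) -> Optional[int]:
    """Count rows where `field` equals GSA_MIGRATION (makes a network call)."""
    req = urllib.request.Request(
        build_count_url(endpoint, field),
        headers={
            "X-Api-Key": api_key,     # assumption: api.data.gov key passed in this header
            "Prefer": "count=exact",  # ask PostgREST to report the exact total
        },
    )
    with urllib.request.urlopen(req, timeout=60) as res:
        return parse_total(res.headers.get("Content-Range", ""))
```

For example, `count_placeholder("federal_awards", "cluster_name", api_key=my_key)` would return the number of awards whose cluster name is exactly the placeholder. Additional filters (for example on the audit year) could be added to the same query string to narrow the count.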

Additional context from #3307

Here are a few related ideas from another issue that we should fold into the above:

  1. Search results should use our accepted/standard language for "accepted date," whatever that is. (See Slack convo.)
  2. The cog/over (cognizant/oversight agency) column needs to link to a page on our static site that explains what it is and lists agency numbers and NSAC email addresses. Alternatively, the PDF with this information should be on that page. Either way, the column is unusable as-is.
  3. The "UEI or EIN" column needs to actually show one or the other:
    1. If the UEI exists and is not GSA_MIGRATION, show the UEI.
    2. If the UEI is GSA_MIGRATION, show the EIN.
    3. If both are GSA_MIGRATION, show GSA_MIGRATION.
  4. Add the time the search took to the top of the search results. "Results: 10314, 7.2 seconds"
  5. Add 2018, 2017, and 2016 back to the year selection. The data is coming.
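Items 3 and 4 are small enough to sketch directly. The functions below are hypothetical helpers, not existing FAC code: `uei_or_ein` applies the fallback rules from item 3, and `timed` plus `results_header` produce the summary line from item 4. What to show when a value is simply empty isn't specified above, so this sketch assumes it falls back to whatever raw value is present.

```python
import time
from typing import Any, Callable, Optional, Tuple

PLACEHOLDER = "GSA_MIGRATION"


def uei_or_ein(uei: Optional[str], ein: Optional[str]) -> str:
    """Choose the value for the 'UEI or EIN' column (item 3)."""
    if uei and uei != PLACEHOLDER:  # 3.1: a real UEI wins
        return uei
    if ein and ein != PLACEHOLDER:  # 3.2: otherwise fall back to the EIN
        return ein
    # 3.3: neither is usable, so show the placeholder (or "" if both are empty).
    return uei or ein or ""


def timed(search_fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run a (hypothetical) search function and report elapsed wall-clock seconds."""
    start = time.perf_counter()
    results = search_fn(*args, **kwargs)
    return results, time.perf_counter() - start


def results_header(count: int, elapsed_seconds: float) -> str:
    """Format the summary line shown above the results (item 4)."""
    return f"Results: {count}, {elapsed_seconds:.1f} seconds"
```

For example, `results_header(10314, 7.2)` gives `"Results: 10314, 7.2 seconds"`.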
jperson1 commented 5 months ago

I wrote a quick script that uses the API to check every field we search on for GSA_MIGRATION. Here's what I found:

  1. Nearly all general.auditee_uei values from the Census data are GSA_MIGRATION, which makes sense, since UEIs didn't exist yet when those audits were submitted. We should find a good way to communicate this discrepancy to users, and consider adding an "include historical audits" checkbox or something similar to the UEI search. It may also be fine to just let users know that older audits don't have UEIs attached.
  2. A few thousand emails (auditee and auditor) are set to GSA_MIGRATION. These are hit by the name search, but I'm not sure if we need to do anything about them.
  3. A few thousand federal_awards.federal_award_extension fields are set to GSA_MIGRATION. This is the "345" part of an ALN such as "12.345", and it may need to be handled differently in summary reports.
  4. A few thousand passthrough.passthrough_name fields are set to GSA_MIGRATION. We may want an "include unknown names" checkbox or similar. But, as with UEIs, it may also be appropriate to just let users know that some names were missing or broken and were replaced with GSA_MIGRATION.

Altogether, it looks like few searchable fields were affected by the migration; most fields have no instances of GSA_MIGRATION. Here's the direct output of the script:

Section general:
        auditee_uei count: 228313
        auditee_ein count: 0
        auditee_state count: 0
        entity_type count: 0
        cognizant_agency count: 0
        oversight_agency count: 0
        auditee_contact_name count: 0
        auditee_certify_name count: 0
        auditee_email count: 1246
        auditee_name count: 0
        auditor_contact_name count: 0
        auditor_firm_name count: 0
        auditor_email count: 1143
        auditor_firm_name count: 0
Section additional_ueis:
        additional_uei count: 0
Section additional_eins:
        additional_ein count: 0
Section federal_awards:
        federal_agency_prefix count: 0
        federal_award_extension count: 2577
        is_direct count: 0
        is_major count: 0
Section findings:
        type_requirement count: 0
        is_modified_opinion count: 0
        is_other_findings count: 0
        is_material_weakness count: 0
        is_significant_deficiency count: 0
        is_other_matters count: 0
        is_questioned_costs count: 0
        is_repeat_finding count: 0
Section passthrough:
        passthrough_name count: 3968
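A rough sketch of such a script (not the one actually used) might loop over each endpoint's searchable fields and print one count per field in the format above. The endpoint and field names are taken from the section headers in the output; the `X-Api-Key` header and the PostgREST `Prefer: count=exact` / `Content-Range` mechanism are assumptions. The counting function is passed in, so the report can be exercised without network access.

```python
import urllib.parse
import urllib.request
from typing import Callable, Dict, List, Optional

API_BASE = "https://api.fac.gov"  # assumed base URL
PLACEHOLDER = "GSA_MIGRATION"

# Searchable fields per endpoint, as listed in the output above.
SEARCHABLE_FIELDS: Dict[str, List[str]] = {
    "general": [
        "auditee_uei", "auditee_ein", "auditee_state", "entity_type",
        "cognizant_agency", "oversight_agency", "auditee_contact_name",
        "auditee_certify_name", "auditee_email", "auditee_name",
        "auditor_contact_name", "auditor_firm_name", "auditor_email",
    ],
    "additional_ueis": ["additional_uei"],
    "additional_eins": ["additional_ein"],
    "federal_awards": [
        "federal_agency_prefix", "federal_award_extension", "is_direct", "is_major",
    ],
    "findings": [
        "type_requirement", "is_modified_opinion", "is_other_findings",
        "is_material_weakness", "is_significant_deficiency", "is_other_matters",
        "is_questioned_costs", "is_repeat_finding",
    ],
    "passthrough": ["passthrough_name"],
}


def api_count(endpoint: str, field: str, api_key: str) -> Optional[int]:
    """Ask the API how many rows have `field` equal to the placeholder."""
    query = urllib.parse.urlencode({field: f"eq.{PLACEHOLDER}", "limit": 1})
    req = urllib.request.Request(
        f"{API_BASE}/{endpoint}?{query}",
        headers={"X-Api-Key": api_key, "Prefer": "count=exact"},  # assumed headers
    )
    with urllib.request.urlopen(req, timeout=120) as res:
        total = res.headers.get("Content-Range", "").rsplit("/", 1)[-1]
    return int(total) if total.isdigit() else None


def report(count: Callable[[str, str], Optional[int]]) -> str:
    """Build the per-section, per-field report using the supplied counter."""
    lines = []
    for section, fields in SEARCHABLE_FIELDS.items():
        lines.append(f"Section {section}:")
        for field in fields:
            lines.append(f"        {field} count: {count(section, field)}")
    return "\n".join(lines)
```

Against the live API this would be run as `print(report(lambda s, f: api_count(s, f, my_key)))`. It assumes every listed column can be compared to a text value; a column with a non-text type may reject the `eq.GSA_MIGRATION` filter.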
jperson1 commented 4 months ago

Here are some changes we might make, depending on what is useful to users and what is "too much". They're all implemented in branch jp/search-gsa-migration-fields, so anyone can try them out.

  1. auditee_uei - We can include a flag ("Include missing UEIs?") that appends "GSA_MIGRATION" to the UEI search.
    • This means that JUST selecting the flag and entering some UEIs will return every audit with UEI "GSA_MIGRATION", which may be far more than a user expects if they don't narrow their search further.
    • Combining this with other fields can be useful. But, I'd wager that if the user is including other fields, they can already narrow their search down far enough without including the UEI fields.
  2. passthrough_name - We can include a similar flag ("Include missing names?") that likewise appends "GSA_MIGRATION" to the name search.
    • In this case, there are few enough passthroughs with missing names that including them may be helpful. I also think a user is more likely to combine this field with another search field, so it won't return as many spam results.
  3. federal_award_extension - When searching on a full ALN, we can always include the "GSA_MIGRATION" extensions, since we don't know for sure what they are.
    • For example, searching on ALN 12.600 gives 219 resulting reports. By including the placeholder extension "GSA_MIGRATION", we see 389 resulting reports.
    • This is especially helpful because a user has no intuitive way to search on 12.GSA_MIGRATION. Currently, there's no way to get these audits when searching on a full ALN; one has to search for ALN 12 and then filter out everything they don't want.
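The three matching rules above can be sketched as plain Python predicates. This is a minimal sketch, not the code in jp/search-gsa-migration-fields; the function names are hypothetical, and the column names `federal_agency_prefix` and `federal_award_extension` come from the script output above.

```python
from typing import Iterable, List, Optional, Tuple

PLACEHOLDER = "GSA_MIGRATION"


def expand_terms(terms: Iterable[str], include_missing: bool) -> List[str]:
    """UEI or passthrough-name search: optionally add the placeholder as a search term."""
    expanded = [t.strip() for t in terms if t.strip()]
    if include_missing and PLACEHOLDER not in expanded:
        expanded.append(PLACEHOLDER)
    return expanded


def split_aln(aln: str) -> Tuple[str, Optional[str]]:
    """Split '12.345' into ('12', '345'); a bare prefix such as '12' has no extension."""
    prefix, _, extension = aln.strip().partition(".")
    return prefix, (extension or None)


def aln_matches(aln_query: str, federal_agency_prefix: str, federal_award_extension: str) -> bool:
    """Return True if an award row matches the ALN the user typed."""
    prefix, extension = split_aln(aln_query)
    if federal_agency_prefix != prefix:
        return False
    if extension is None:  # prefix-only search: every extension matches
        return True
    # Full ALN: also match placeholder extensions, since their true value is unknown.
    return federal_award_extension in (extension, PLACEHOLDER)
```

With this rule, a search for 12.600 matches rows whose extension is either `600` or `GSA_MIGRATION`, mirroring the behavior described for the 219 versus 389 report counts. Note that `expand_terms(ueis, include_missing=True)` carries the drawback from item 1: the placeholder term matches every migrated audit unless other filters narrow the search.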

danswick commented 4 months ago

When the final ~3000 records make it to dissemination, there will likely be more GSA_MIGRATION fields. We probably also want to tie this work into a larger story around migration in general.