Supporting Federal oversight via search

jadudm commented 7 months ago

Big-picture: search is a language

Search is an expression of intent, by the user, to define what they want from our database.

It is not functional or sequential; it is declarative. "I want everything accepted in the month of January, 2024, that has the agency number 23 and questioned costs." This statement says that, of all of the data:

pie title FAC DATA
  "All the data": 100

we want to limit to just January:

pie title FAC DATA
  "All the data": 88
  "January 2024": 12

and of that data, we only want agency 23 (from that subset):

pie title FAC DATA
  "All the data": 88
  "January 2024": 11
  "Agency 23": 1

and of that, we only want the audits that have questioned costs. Search is about defining the set of possibilities, so that what comes back achieves the user's intent.
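A minimal sketch of the "search is declarative" idea, using made-up data. The `Audit` record, its field names, and the sample rows are all hypothetical illustrations, not the FAC schema. Each clause of the user's sentence becomes a predicate, and the result is whatever satisfies all of them; the order in which the predicates are listed does not matter.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable


@dataclass(frozen=True)
class Audit:
    # Hypothetical record shape, for illustration only.
    report_id: str
    accepted: date
    agencies: frozenset
    questioned_costs: bool


Predicate = Callable[[Audit], bool]


def accepted_in(year: int, month: int) -> Predicate:
    """Keep audits accepted in the given calendar month."""
    return lambda a: (a.accepted.year, a.accepted.month) == (year, month)


def has_agency(prefix: str) -> Predicate:
    """Keep audits that mention the given agency number."""
    return lambda a: prefix in a.agencies


def has_questioned_costs(a: Audit) -> bool:
    """Keep audits with questioned costs."""
    return a.questioned_costs


def search(audits: Iterable[Audit], *predicates: Predicate) -> list:
    """Return the audits that satisfy every predicate (set intersection)."""
    return [a for a in audits if all(p(a) for p in predicates)]


AUDITS = [
    Audit("A", date(2024, 1, 15), frozenset({"23"}), True),
    Audit("B", date(2024, 1, 20), frozenset({"23"}), False),
    Audit("C", date(2024, 1, 9), frozenset({"93"}), True),
    Audit("D", date(2023, 12, 30), frozenset({"23"}), True),
]

results = search(AUDITS, accepted_in(2024, 1), has_agency("23"), has_questioned_costs)
print([a.report_id for a in results])  # prints ['A']
```

The user never says how to walk the data; they only narrow the set of possibilities.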

Our current search is a simple, bespoke language. It allows seven fields to be searched: year, UEI/EIN, ALN, names, acceptance date, cog/over, and state. It sits at the left end of the search spectrum shown in the diagram below. Bespoke means "we crafted it by hand." Things that are bespoke are expensive to maintain and typically do not scale well.

At the other end of the spectrum is the API; it is infinitely scalable, as long as you can write code and run it. Most of our partners cannot write code, and cannot run arbitrary programs on their GFE (government-furnished equipment). As a result, it is a powerful tool that is largely inaccessible. It is easy to maintain, easy to extend, and allows a user to do anything they want with our data, but vanishingly few users can actually leverage its capabilities on a day-to-day basis.

%%{init: { 'gitGraph': { 'mainBranchName': 'search spectrum'}} }%%
gitGraph
   commit id: "Simple, bespoke" type:REVERSE
   commit id: "Intermediate, bespoke dashboards"
   commit id: "Blockly"
   commit id: "Query builder"
   commit id: "Search language"
   commit id: "SQL in the browser"
   commit id: "API" type:REVERSE

We have more than 100 fields in our database, with complex interrelations. Our existing search is not expressive enough to support IG and resolution officials' work, and our API is too complex for most users. (The change process for API adoption is likely to take years.)
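To make the gap concrete, here is roughly what the API end of the spectrum asks of a user. This is a sketch under stated assumptions: it assumes a PostgREST-style filter syntax (`field=eq.value`), and the base URL, endpoint name, and field names are placeholders, not confirmed details of our API. It only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Placeholder host; an assumption for illustration, not the real API address.
BASE_URL = "https://api.example.gov"


def build_query_url(endpoint: str, **filters) -> str:
    """Build a PostgREST-style query URL.

    Each keyword is a (hypothetical) column name mapped to an
    (operator, value) pair, e.g. audit_year=("eq", 2023).
    """
    params = {field: f"{op}.{value}" for field, (op, value) in filters.items()}
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and column names.
url = build_query_url("general", audit_year=("eq", 2023), auditee_state=("eq", "VA"))
print(url)
```

Even this small example presumes the user knows the table, the column names, and the operator syntax, which is the barrier described above.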

In the medium to long term, we need to expose more of the database to more of our users, or we need to understand how to better support the investigatory work they do. This will come through a combination of search, export, reporting, and tooling we might develop.

Story

OIGs and resolution officials need to dig much deeper into our data than our current search allows.

Now that the GSA FAC is the FAC, and the only place for any data to be obtained, we have to provide a pathway for search that lets Federal users do their jobs.

Next increment

This increment should be timeboxed: it starts now and should take no more than 2-3 weeks to reach a delivered product.

These all assume that we are keeping the base set of search tools in the advanced search (e.g. by date, by cog/over, ALN, etc.), and that what follows are additions.

In priority order, we should support at least the following (expressed as questions):

1. Are there findings? This could be expressed in several ways. It might be a checkbox group with an "Any" (that implies that we're looking for any findings), and following it could be checkboxes for including specific kinds of findings. (These are fields like is_material_weakness in the findings endpoint.) Being able to zero in on the individual booleans in that table does matter.
2. Is the funding direct? This comes from the federal_awards table, and is a boolean. It could be expressed as a funding type selection between "any, direct funding, passthrough funding", where "any" means we're indifferent, "direct funding" means "Y", and "passthrough funding" means "N" for that field.
3. Is it a major program? This is a boolean, but there is an associated field: if it is a major program, it will have an opinion type. It could also be a drop-down that allows for the selection of "any" as well as the four major opinion types.
4. What is the type of the report or compliance requirement? This is the "alphabet soup" column. How we structure it is a question, but at some level, being able to say "I want compliance type B for ALNs X, Y, and Z" is the goal.
5. Entity type? Allowing the restriction to state, tribal, public, and perhaps breaking apart Tribal into tribal/public and tribal/suppressed might be appropriate. This would then include the is_public field in the search for those two breakouts. This allows agencies who do a lot of Tribal granting to quickly zero in on the audits they are trying to work with.
6. Can I find a specific report? We feature the report ID everywhere, but have no way to search for it.
7. Can I search by fiscal year end date? We allow a search by fiscal year, but not by the fiscal year end date. This matters for determining things about timeliness, e.g. to search by 12/31 vs 6/30.
8. By EIN/UEI? This may want a modifier for "primary," "additional," or "both". This is used when trying to trace an auditee through time, and therefore, it might be that we want to look at only the primary UEI, or we may want to include/restrict (in the case of passthroughs) to the secondary_ueis table. (Same for EIN.)
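The first two questions above describe a mapping from form controls to field filters. A minimal sketch of that mapping follows, with assumptions flagged: `is_material_weakness` and the "Y"/"N" values for direct funding come from the descriptions above, while the `is_direct` and `is_questioned_costs` field names, the "Y" value for finding flags, the `findings.any` key, and the `build_filters` helper itself are hypothetical.

```python
# "any" means we are indifferent, so it adds no filter at all.
FUNDING_TYPE_TO_IS_DIRECT = {"any": None, "direct": "Y", "passthrough": "N"}

# is_material_weakness is named above; is_questioned_costs is an assumed name.
KNOWN_FINDING_FLAGS = {"is_material_weakness", "is_questioned_costs"}


def build_filters(funding_type="any", any_findings=False, finding_flags=()):
    """Translate form selections into a dict of table.field -> required value."""
    filters = {}

    is_direct = FUNDING_TYPE_TO_IS_DIRECT[funding_type]
    if is_direct is not None:
        # "is_direct" is an assumed column name on federal_awards.
        filters["federal_awards.is_direct"] = is_direct

    if finding_flags:
        # Specific finding checkboxes take precedence over "Any".
        for flag in finding_flags:
            if flag not in KNOWN_FINDING_FLAGS:
                raise ValueError(f"unknown finding flag: {flag}")
            filters[f"findings.{flag}"] = "Y"
    elif any_findings:
        # Hypothetical marker meaning "at least one findings row exists".
        filters["findings.any"] = True

    return filters


print(build_filters(funding_type="passthrough", finding_flags=["is_material_weakness"]))
```

Treating "any" as the absence of a filter keeps the default search unchanged when a user ignores the new controls.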

Bonus: Simplify main search

I would recommend removing cog/over from the main search, and moving it to the advanced search. This would also remove it from the results table. I would add it to the overview page as text with a link to support. "This audit has agency 93 (HHS) as the cognizant agency." And, we can then hyperlink to documentation about what this means. (We are required, by policy, to link them to information about their NSAC contacts and their cog/over agency. This may not get picked up as part of this work, or we might do it. But it must happen at some point.)

I would then add a search by report ID to the main search. It is how we communicate with users about their submissions in the helpdesk, but we have absolutely no way for users to navigate to audits or even search for them by this identifier. If it matters so much in our communications, it should be part of the search.

### Tasks
- [ ] https://github.com/GSA-TTS/FAC/issues/3407
- [ ] Implement, test, observe, and ship compliance, entity, report_id
- [ ] Implement, test, observe, ship fy date, ein/uei primary/secondary
- [ ] Placeholder: add loading status UI
- [ ] https://github.com/GSA-TTS/FAC/issues/3478

danswick commented 5 months ago

There are some outstanding follow-up tasks here, but this epic is otherwise complete.

jadudm commented 5 months ago

@danswick , we are missing numbers 4 through 8 on the requirements list, aren't we?

4. What is the type of the report or compliance requirement? This is the "alphabet soup" column. How we structure it is a question, but at some level, being able to say "I want compliance type B for ALNs X, Y, and Z" is the goal.
5. Entity type? Allowing the restriction to state, tribal, public, and perhaps breaking apart Tribal into tribal/public and tribal/suppressed might be appropriate. This would then include the is_public field in the search for those two breakouts. This allows agencies who do a lot of Tribal granting to quickly zero in on the audits they are trying to work with.
6. Can I find a specific report? We feature this everywhere, but have no way to search for it.
7. Can I search by fiscal year end date? We allow a search by fiscal year, but not by the fiscal year end date. This matters for determining things about timeliness. E.g. to search by 12/31 vs 6/30.
8. By EIN/UEI? This may want a modifier for "primary," "additional," or "both". This is used when trying to trace an auditee through time, and therefore, it might be that we want to look at only the primary UEI, or we may want to include/restrict (in the case of passthroughs) to the secondary_ueis table. (Same for EIN.)

These are not done.
