@jordanpadams commented on Sun Apr 18 2021
Motivation
...so that I can ensure usability of the API through rapid responses to queries
Additional Details
1 second is somewhat arbitrary but loosely taken from https://www.nngroup.com/articles/response-times-3-important-limits/. Other details for the requirement:
- Registry should contain a minimum of 1 million products for sufficient testing
- Time starts when the query is received by the API service
Acceptance Criteria
Given a deployed API and registry with 1 million+ products ingested
When I perform a request or query against any endpoint with a query of q=*
Then I expect an average response time of 1 second, regardless of the response format (e.g. pds4+json, json, etc.)
Note: per the performance note, this should be tested against all endpoints and all response formats.
Engineering Details
Once #13 is implemented, this may just be a simple regression test we add to the repo. Alternatively, we can check with folks on the team about any known long-running queries that might push past this limit; right now, I can't think of any.
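A regression test along these lines could be sketched as a small timing harness. This is a minimal sketch, not the project's actual test code: measure_latency and the stub function are hypothetical names, and the stub stands in for a real HTTP request against a deployed registry endpoint.

```python
import time
import statistics

def measure_latency(call, runs=20):
    """Time repeated invocations of `call`; return per-run latencies in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append(time.perf_counter() - start)
    return latencies

def stub_query():
    """Stand-in for an HTTP request such as a q=* search against the registry API."""
    time.sleep(0.01)  # pretend the service answered in ~10 ms

latencies = measure_latency(stub_query, runs=5)
mean_latency = statistics.mean(latencies)
# The acceptance criterion from this issue: average response time under 1 second.
assert mean_latency < 1.0, f"mean latency {mean_latency:.3f}s exceeds 1s budget"
```

In a real test the stub would be replaced by a request to each endpoint and response format, with the registry pre-loaded with 1 million+ products as the acceptance criteria require.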
@al-niessner commented on Thu Apr 22 2021
Average in what sense? If I run the query "q=*" a million times versus the ten times anybody runs every other search, and returning the entire database takes 3 days, then how can an average that must be about 3 days (the million runs dominate the ten) turn into 1 second?
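The arithmetic behind this objection is easy to check: whichever query class dominates the traffic also dominates the mean. The numbers below are the hypothetical ones from the comment, not measurements.

```python
import statistics

# Hypothetical workload from the comment: 1,000,000 runs of q=* at 3 days each,
# plus 10 targeted searches at 0.5 s each.
three_days = 3 * 24 * 3600            # 259,200 seconds
samples = [three_days] * 1_000_000 + [0.5] * 10

mean = statistics.mean(samples)       # ~259,200 s: the slow query swamps the mean
median = statistics.median(samples)   # also 259,200 s; the 10 fast queries vanish
print(f"mean={mean:.1f}s median={median}s")
# No amount of speeding up the 10 targeted searches can pull this mean near 1 s,
# which is why the requirement needs to scope which queries it covers.
```

This is the usual argument for stating latency requirements per query class, or as percentiles, rather than as one global average.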
Are you really trying to say that the user should be notified that this is going to take a while if it goes over one second? Are you saying that very targeted searches (targeted meaning very few return values) should take a second if local, i.e. the node does not have to ask yet another node?
If, as stated in the acceptance criteria, all searches must average down to 1 second, then it would require enumerating all search variations, timing them, averaging them, and then improving some or worsening others to get to 1 second. The set of all search possibilities is a very large space to enumerate.
And do you mean a free and available service, or one bogged down doing the million searches from above?
When does the time start? Does it start once the registry receives the request, or when the user hits return? Presumably when the user hits return, because that is who is watching the wall clock at this point. If the router, Apache, firewall, Postman, or carrier pigeon are slow and take longer than a minute, then do we use the Wayback Machine?
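One way to make the "when does the clock start" question concrete is to measure the two spans separately: service time (request received to response sent) versus end-to-end time (user hits return to response displayed). The sleeps below are arbitrary stand-ins for network hops and service work; the split itself is the point.

```python
import time

def simulated_request():
    """Return (service_seconds, end_to_end_seconds) for one simulated request."""
    user_start = time.perf_counter()
    time.sleep(0.02)                  # stand-in: router/firewall/network inbound
    service_start = time.perf_counter()
    time.sleep(0.05)                  # stand-in: the API service doing actual work
    service_end = time.perf_counter()
    time.sleep(0.02)                  # stand-in: network on the way back
    user_end = time.perf_counter()
    return service_end - service_start, user_end - user_start

service_t, total_t = simulated_request()
# The issue's "time starts from query received by API service" detail corresponds
# to service_t; the user's wall-clock experience corresponds to total_t.
assert service_t < total_t
```

The issue's detail line pins the requirement to service time, which the API team controls; anything between the user and the service is outside the requirement's scope.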
More seriously: I like a good performance requirement, but it needs to be very quantitative over a very limited problem space. There are no works-best-in-every-case solutions; if there were, we would not have a variety of sort algorithms for various situations.