biothings / biothings_explorer

TRAPI service for BioThings Explorer
https://explorer.biothings.io
Apache License 2.0
10 stars 11 forks source link

Backport Retriever prototype improvements #825

Open tokebe opened 3 months ago

tokebe commented 3 months ago

The Retriever prototype in BioPack has a few experimental improvements over BTE which should be brought back to BTE.

The general list of changes we should backport are as follows:

These changes are spread across much of the codebase, but should be relatively trackable via commits thanks to Retriever being a single repo. Primary changes are in the thread handler, call-apis package, and new utils caching file, respectively. Along with this, a new graph package was split off, as well as numerous changes in the types package, most of which are likely necessary for the changes to the primary packages to work.

For this backporting, we don't want to worry about any name changes. Additionally, feature removals (such as removing inferred mode) aren't desired -- those were purely for Retriever's use case. It's probably easiest to manage by going chronologically up the new commits, starting from where the packages were added locally, and working changes back into the appropriate BTE packages while ignoring any change that appears to be just a renaming or feature removal.

rjawesome commented 3 months ago

PRs are functionally done but I am still working on updating the tests

tokebe commented 2 months ago

@rjawesome Have you been able to do any load testing on the queuing system? Nothing incredibly rigorous is needed -- a simple test showing that it can handle a number of simultaneous creative mode queries without locking up/becoming unresponsive would be sufficient.

rjawesome commented 2 months ago

@rjawesome Have you been able to do any load testing on the queuing system? Nothing incredibly rigorous is needed -- a simple test showing that it can handle a number of simultaneous creative mode queries without locking up/becoming unresponsive would be sufficient.

It seems to be working fine with 3 creative mode queries at once.

@tokebe Just to confirm would you like me to move the changes from #824 into these branches?

tokebe commented 2 months ago

For now, no. #824 needs additional benchmarking that I'll handle.

tokebe commented 2 months ago

@rjawesome looks like you missed a couple things:

This implies there may be other changes that might have slipped by -- can you fix these and re-review to ensure no other desired changes were skipped? Feel free to ping me if it's unclear whether a change is desired for the backport or not.

rjawesome commented 2 months ago

@tokebe The changes you stated are fixed. I am taking a second look at all the changes. While looking at the commits, I had a few things to check:

tokebe commented 2 months ago
rjawesome commented 2 months ago

@tokebe While looking at more commits, I noticed a possible bug in Retriever with the max records per edge/query. In the edge manager, it uses the max records per edge environment variable when determining max records per query (https://github.com/BioPack-team/retriever/blob/3d306adee4ca77c9d591978138fbe74d5656cedf/packages/query_graph_handler/src/edge_manager.ts#L18). Further, the defaults in the edge manager seem to differ from what is specified in the documentation (https://github.com/BioPack-team/retriever/blob/main/docs/REFERENCE.md). If I am supposed to integrate this into BTE, how should I set up the defaults?

One more question

tokebe commented 2 months ago

You're right, it should instead use MAX_RECORDS_TOTAL. For now, use the defaults defined in the code rather than the documentation.

You're correct, cache handling should be removed from thread messaging. I believe this includes cacheInProgress, addCacheKey, completeCacheKey, and cacheDone.

rjawesome commented 1 week ago

@tokebe All merge conflict should be resolved now.