Open tokebe opened 3 months ago
PRs are functionally done but I am still working on updating the tests
@rjawesome Have you been able to do any load testing on the queuing system? Nothing incredibly rigorous is needed -- a simple test showing that it can handle a number of simultaneous creative mode queries without locking up/becoming unresponsive would be sufficient.
@rjawesome Have you been able to do any load testing on the queuing system? Nothing incredibly rigorous is needed -- a simple test showing that it can handle a number of simultaneous creative mode queries without locking up/becoming unresponsive would be sufficient.
It seems to be working fine with 3 creative mode queries at once.
@tokebe Just to confirm would you like me to move the changes from #824 into these branches?
For now, no. #824 needs additional benchmarking that I'll handle.
@rjawesome looks like you missed a couple things:
taskHandler.ts
, line 62, you missed the change from { threadId, registerId: true }
to { threadId, type: "registerId" }
, which leads to warnings and potentially undefined behavior if the threadId is needed from the job data.toWorker / parentPort
and fromWorker
are supposed to be renamed to to workerSide
and parentSide
, respectively, to improve clarityThis implies there may be other changes that might have slipped by -- can you fix these and re-review to ensure no other desired changes were skipped? Feel free to ping me if it's unclear whether a change is desired for the backport or not.
@tokebe The changes you stated are fixed. I am taking a second look at all the changes. While looking at the commits, I had a few things to check:
options.submitter
(or maybe I missed where it was used), does this still need to be used in BTE? (presumably by calling addSubmitter
on a TRAPIQuery
)@tokebe While looking at more commits, I noticed a possible bug in Retriever with the max records per edge/query. In the edge manager, it uses the max records per edge environment variable when determining max records per query (https://github.com/BioPack-team/retriever/blob/3d306adee4ca77c9d591978138fbe74d5656cedf/packages/query_graph_handler/src/edge_manager.ts#L18). Further, the defaults in the edge manager seem to differ from what is specified in the documentation (https://github.com/BioPack-team/retriever/blob/main/docs/REFERENCE.md). If I am supposed to integrate this into BTE, how should I set up the defaults?
One more question
You're right, it should instead use MAX_RECORDS_TOTAL
. For now, use the defaults defined in the code rather than the documentation.
You're correct, cache handling should be removed from thread messaging. I believe this includes cacheInProgress
, addCacheKey
, completeCacheKey
, and cacheDone
.
@tokebe All merge conflict should be resolved now.
The Retriever prototype in BioPack has a few experimental improvements over BTE which should be brought back to BTE.
The general list of changes we should backport are as follows:
These changes are spread across much of the codebase, but should be relatively trackable via commits thanks to Retriever being a single repo. Primary changes are in the thread handler, call-apis package, and new utils caching file, respectively. Along with this, a new graph package was split off, as well as numerous changes in the types package, most of which are likely necessary for the changes to the primary packages to work.
For this backporting, we don't want to worry about any name changes. Additionally, feature removals (such as removing inferred mode) aren't desired -- those were purely for Retriever's use case. It's probably easiest to manage by going chronologically up the new commits, starting from where the packages were added locally, and working changes back into the appropriate BTE packages while ignoring any change that appears to be just a renaming or feature removal.