clio-janelia / clio-store

A serverless Connectomics Storage API based on Google Firestore and other scalable services.
0 stars 0 forks source link

Serverless function to apply user "pull requests" #2

Open stuarteberg opened 3 years ago

stuarteberg commented 3 years ago

Now that we have Janelia users creating pull requests, we need to start applying them to our production segmentation.

For this round, Lowell can read the requested merges from firestore and run his usual script for applying merges. But in the near future, we'll want it running in a CloudRun function. That will enable us to allow "trusted" users to apply their own pull requests without special help from Lowell. It could also be a stepping stone to making "live merging" feasible (i.e. working directly on the DVID open node).

I think the CloudRun function will need to do all of the following:

  1. Resolve and/or report potential errors due to non-existent bodies. Consider a merge of fragment body -> target body:
    • If the fragment no longer exists, then someone else has already merged it into something. Did it get merged into the same body that the user was intending? If so, no problem. Otherwise, error.
    • If the target no longer exists, then the target has been merged to something already. We should figure out who it was merged into, and simply choose that as the new target.
  2. Avoid the following types of merges and report an error if they're encountered:
    • anchor-to-anchor
    • large-to-large (for some definition of "large" as measured in voxels)
    • large-into-small (or nonanchor-into-anchor); we should preserve the anchor IDs
  3. Merge body metadata from the segmentation_annotations instance:
    • preserve body status according to our rules
    • preserve other metadata such as cellBodyFiber or (soon) hemilineage
    • for the merged body, delete the metadata entry from segmentation_annotations
  4. (Ignored for now) In rare instances, other users' cleaves can interfere with the outcome of a merge.**

If there's anything I'm missing, please explain below.

A crucial detail which I'm not clear on yet is how we'll report errors back to the user. Will the CloudRun function have a synchronous API, thus returning only after all merges have been applied? In that case all errors could be simply collected and displayed by the caller. Alternatively, it might be a good idea to record errors (and successes) in firestore, and allow the frontend to check for those asynchronously, at a convenient time -- possibly even during a later browsing session.

Another detail we'll need to sort out is exactly how the CloudRun function will know which DVID node into which the changes should be applied -- it usually won't be the same node that the user was viewing in Clio when they created the PR (except in the "live merging" use-case).

One more detail that we might want to consider is whether or not we'll need some sort of "locking" or "librarian" feature in Clio. My gut feeling is that it will be simpler to take an approach that doesn't try to prevent errors, but merely reports them.

Side note: This discussion will have implications for the UI, but I'm hoping to avoid detailed UI discussions in this thread. We should think about the type of information the UI needs, but not yet try to settle on exactly how the UI will present it.

cc @DocSavage @umayaml @hubbardp @tingzhao


**Footnote regarding item 4:

Suppose a user merges a twig onto a big branch in body A, but meanwhile that branch was cleaved away (it became its own body, B). If we blindly execute our users' merge request, it will end up as a floating twig associated with body A, rather than the intended branch (B).

In theory, we could detect such problems before executing the merges, but I think we shouldn't bother. Since we're finished with the bulk of our cleaving work, we don't expect this scenario to occur very often. I'm already planning to implement volume-wide consistency checks, including one to scan the volume for discontiguous bodies. It will be relatively easy to spot problems like this and deal with them on a case-by-case basis. Is suspect they will be quite infrequent.

DocSavage commented 3 years ago

Will add a POST and GET endpoint for pull request status as well. So potentially external merge systems can update the pull request status with error, completed with warnings, or completed without warnings.

Per our meeting, @umayaml and @stuarteberg will work out the merge request checks before applying them to DVID server.