Open mtholder opened 8 years ago
Oh and the "few" in the title refers to the fact that we might want a "remove this study+tree" from the default list. and perhaps a "move this study+tree pair to a different collection".
I think that the add operation is the only one that is likely to be needed to assure smooth demos/workshops. removing or moving are probably rare enough, that I doubt we'll see lots of simultaneous usage.
[Paraphrased from the original issue:] There might be simpler solutions using git's own configuration options.
I'm looking into clean/smudge filters, and whether we can configure them on GitHub. (We might be able to minimize these bugs by setting this up on the api server.)
I'm also taking a quick look at word-diff
and other alternatives to the default line-wise behavior of git-diff.
OK, after some exploration with git configuration, I'm almost ready to give up. @mtholder, I know you're familiar with git internals, so maybe you'll know whether this idea is a dead end.
I've worked out how to configure a custom diff-tool and merge-tool for a local repo, which in principle allows us to work in a JSON-aware way (parsing the JSON documents to resolve differences, instead of using git's dumb line-wise algorithm). But these only work "downstream" of the initial commit logic, once a merge conflict has been detected. I can't seem to change the linewise logic used during a commit, which is important in our most common use case: two different curators each add a tree, which git interprets as a trivial series of changes, instead of two distinct trees. Thoughts?
Here's another thought on how to handle this requirement (easily adding a tree to synthesis): We keep exactly what we have, including the ^ot:candidateTreeForSynthesis
list in Nexson and its checkboxes in the curation UI. (Note that my notes here assume the synthesis inputs are drawn from a nested (n-level) set of collections.)
When it's time to run synthesis, we simply create a temporary collection (by scanning the phylesystem data for preferred trees) and append it to the other synthesis inputs. To me, this makes sense for a few reasons:
^ot:candidateTreeForSynthesis
flag. We might still add the services described above, so that we can indicate a tree that's already queued for synthesis by virtue of its inclusion in one of these collections.I like this last idea from @jimallman : it gets us what we want (an easy way for people to add trees to synthesis) with very little overhead. As part of synthesis, we can scan phylesystem and check curation status for proposed trees and add those to a collection in one operation.
I don't have a problem with this (@jimallman's suggestion to use ^ot:candidateTreeForSynthesis
) is an architectural solution.
We may face a bit of a social hurdle because (up until now) using the ^ot:candidateTreeForSynthesis
was just a flag that a curator thought the tree was ready, and another step had to be taken to actually push the tree into the synth. So right now, I think that we have 1053 studies that have this property set to at least one tree. I'm not sure how many of those trees are actually good to go. Doubling our number of input trees in the next synth run may reveal scaling problems or a confusing mess because of issues with these trees. These issues could be hard to track down with a big jump in the number of trees.
But that is a short term issue.
Presumably we can fix any scaling issues, and untoggle the ^ot:candidateTreeForSynthesis
flag for any tree that seems to be a problem.
Longer term, I think we may want to add a validation step before someone can propose a tree for synthesis. Short term, we will almost certainly need a validation step before propinquity puts the proposed trees into synthesis.
(up until now) using the ^ot:candidateTreeForSynthesis was just a flag that a curator thought the tree was ready, and another step had to be taken to actually push the tree into the synth.
Don't we have the same problem if the UI adds each tree to a collection for synthesis?
The current behavior is easy to maintain, just by treating these trees as candidates and expecting a curator to assign them to a proper synth collection.
We do have the same problem if the UI just adds trees to a grab bag. The only difference is that the grab-bag (opentreeoflife/default collection) has just 4 trees rather than hundreds of trees.
As I say, I don't object to using ^ot:candidateTreeForSynthesis
, but I do think we might run into some headaches initially.
I agree that the synth should just filter trees that appear multiple times (retaining only the first). Currently it would not matter (for the output tree) if the same tree shows up more than one time.
As I say, I don't object to using ^ot:candidateTreeForSynthesis, but I do think we might run into some headaches initially.
Ah, because of the legacy data. Gotcha.
https://github.com/OpenTreeOfLife/germinator/issues/82 discusses some errors that cropped up in the context of multiple curators trying to add a study to the default synth collection. I originally thought that these were indicative a race condition, but that was an incorrect diagnosis.
The phylesystem-api for studies assumed that it would be pretty rare for multiple curators to be working on the same study at the same time, so we could tolerate the second curator's work being stranded on a work-in-progress branch until some manual fixes were performed on the server.
But the default synth collection (https://github.com/OpenTreeOfLife/collections-1/blob/master/collections-by-owner/opentreeoflife/default.json) is a single resource that multipe curators may be interested in. Particularly at workshops, when we are demonstrating to a group how to add a tree (and several people are playing along on their own laptops).
We also observed that users frequently want to just add a tree to the queue, and don't want to worry about the ranking.
That argues for an "add this tree" button. And the cleanest way to support that (and minimize chance of a change being stuck on a work in progress branch) seems like it would be a service in the API that:
There would be no chance for the addition to end up on a work-in-progress branch in this workflow.