Simpler process for including / excluding trees and studies in/from synthesis

jar398 commented 8 years ago

If we only have a checkbox for including individual trees in synthesis, then there is no way to tell whether no checked boxes means the curator hasn't reviewed the trees for inclusion, vs. whether they have reviewed them and decided none should be in synthesis.

This distinction is very important because it determines whether there exists a curation "debt" - whether review is necessary. If it's not necessary, we save a lot of time by avoiding re-review. If it is necessary, we have a chance to include some tree in synthesis that otherwise would have been excluded merely because the initial curator didn't have time or expertise to review.

Many studies are simply dumped into phylesystem without selection of tree(s) for synthesis, so this situation is pretty common.

I thought this is what the study-wide flag was intended for, but it seems there is consensus to remove this flag, leaving nothing in its place.

If we have a flag like this, it should probably be a checkbox on the tree-list page where the tree-specific checkboxes are, not on the metadata page. The flag is only meaningful if no trees are selected for synthesis (if at least one tree is selected then the review in question has already happened and the "debt" has been paid off).

Discussion here: https://gitter.im/OpenTreeOfLife/public?at=57bf41eecc94ce8b5663c369

[comment updated with small clarification]

jimallman commented 8 years ago

It seems we have several overlapping use cases for these flags, and no clear consensus on how they should be interpreted by (a) the tools used to construct a synthetic tree, or (b) curators and other users.

I had thought the study-level flag to be a message from the initial curator to discourage synthesis and further curation of the studies in this tree. The tree-level flag would signal that (in at least one curator's opinion) a tree is "synthesis-worthy" as is. As @jar398 points out, we should take advantage of this kind of signaling among curators as a community. Maybe we need hobo signs that can be left to warn (or spur further efforts from) other curators and the synthesis team:

No usable trees in this study (don't bother).
This tree looks good (someone should work on it).
This tree is ready for synthesis!
This tree has real problems (avoid)!

I had hoped that our rich-text curator notes (under Metadata tab) would serve this purpose, but maybe we need something more structured..?

In terms of workflow, keep in mind that the current UI assumes any logged-in user can reverse these flags. It recognizes no boss or owner of a study. (We have the ability to recognize the study's original submitter/curator, but it would be unusual to give them special treatment in the curation app.)

jimallman commented 8 years ago

See some related comments on this PR: https://github.com/OpenTreeOfLife/opentree/pull/1032

kcranston commented 8 years ago

Take a look at the proposed 'Synthesis' field on the New study homepage mockup. Various options to indicate whether a tree has been reviewed, proposed for inclusion in synthesis, included in synthesis, etc.

jimallman commented 8 years ago

Thanks for the reminder, @kcranston! Yes, I think this status field would suffice for simple "hinting" among study curators and the synthesis team. The curator notes (in Metadata) could be used to expand on these, e.g. to explain why a given tree is marked "Do not use."

The values shown in the mockup imply a simple "workflow" through these states:

None (ie, nobody has evaluated this tree)
Do not use (see curator notes for full explanation?)
Include (ie, this tree is ready for use in synthesis?)
In synthesis (it's already being used)

Does Do not use mean that a tree is generally unsuitable (not worth further curation effort), or not-yet-ready but a possible candidate?

We might also want to add a state like Needs work, to indicate that a tree looks like a good candidate for synthesis (vs. Do not use), but it will need significant curation effort.

I assume we'd keep this "workflow" as simple as possible: The original submitter (curator) has no special privileges here, nor does the synthesis team. Any curator can choose any value, at any time, with the possible exception of In synthesis. Hmmm.

Would In synthesis (vs. Include) be triggered automatically? We have the data to support that; it seems reasonable that in this case, the field would be read-only, unless you want a curator to be able to override (or question) its inclusion.

kcranston commented 8 years ago

Good questions!

In order to figure out the In synthesis bit, we need to plan how that choice interacts with collections. Thoughts:

selecting Include adds the tree to the default collection (with some helpful tips about ranking and other collections). Do we throw an error / warning if the necessary curation steps are not complete? Maybe that option is greyed out until all necessary curation completed.
the status automatically updates to Included after the next synthesis run (not sure where in the pipeline this happens, but otindex can probably provide that info)
if a curator tries to change the status of an In synthesis tree, do we remove it from the collection? Do we need another status for "in synthesis now, but flagged for removal"

Agreed that any curator can change status of trees.

jimallman commented 8 years ago

we need to plan how that choice interacts with collections

I'd really like to revisit the idea of building the synthesis "collection" on-the-fly, based on flags like these and a sanity check at the time of compilation. But that would require ranking (synth order) to be automatic as well. I thought that in a prior discussion, someone said that would be OK...

the status automatically updates to Included after the next synthesis run (not sure where in the pipeline this happens, but otindex can probably provide that info)

We already know when a tree is in the current synthesis, as this appears in the curation Metadata tab:

if a curator tries to change the status of an In synthesis tree, do we remove it from the collection? Do we need another status for "in synthesis now, but flagged for removal"

Hmm. It's hard not to separate the indicator of status vs. current synthesis, versus the curator's preference about whether it belongs there. Maybe this should be two separate widgets?

jar398 commented 8 years ago

You could put the in-synthesis status indicator next to the want-in-synthesis intent selector. The two are orthogonal, so if one has M states and the other has N states, all MN possibilities make sense.

The mockup says 'None' which seems obscure to me. Maybe 'Unreviewed' for this state.

jimallman commented 8 years ago

Agreed on both counts. If this makes sense to @kcranston, I can mock something up.

kcranston commented 8 years ago

This sounds good to me - two separate fields, one for whether or not the tree is in the current synthetic tree and one for whether a curator has assessed whether the tree should be in synthesis.

This will also necessitate changes to the nexson and the API. Currently, we have two study-level properties: ot:notIntendedForSynthesis (boolean) and ot:candidateTreeForSynthesis (array of tree ids). New proposal (feedback welcome):

remove both study-level properties
add one new tree property, ot:synthesisStatus with a set of possible values (include, rejected, unreviewed)
the information about whether or not a tree is in synthesis comes from another source, not stored in the nexson (i.e. no change from the current state, except that we show it in the curator on a tree-by-tree basis rather than for a whole study)

jimallman commented 8 years ago

two separate fields, one for whether or not the tree is in the current synthetic tree and one for whether a curator has assessed whether the tree should be in synthesis

Hm, will we be able to distinguish a tree that was not submitted into synthesis, and one that was submitted but ultimately didn't appear in (or have any effect on) the resulting synthetic tree? Would the latter "moot" tree appear in the list of bibliographic references? If we have any examples I'll be glad to chase this down.

jimallman commented 8 years ago

Here's an attempt to communicate all states for a tree vs. current synthesis, as well as "votes" (up or down) and comments by multiple curators.

On mouse-over, the status would be explained in more detail, basically

Included means it's incorporated into the latest (public) synthetic tree. Don't know if this should distinguish "moot" trees with no visible effect...
Listed means it's been accepted for use in new synthesis, but the results aren't yet public (visible in the webapp).
Delisted means the tree is visible in public synth, but will be removed next time.
Not included is hopefully clear enough.

Clicking the "upvote" or "downvote" information would pop up a list of which curators have voted, and perhaps a one-line comment from each. If you're a logged-in curator, you'll have the option of adding, removing, or changing your vote for this tree.

If we want a more expansive UI for voting or reviewing others' votes, we could jump to the tree popup and its Properties tab. We should probably repeat this UI there in any case, so that someone working directly on the tree can add or update their vote easily.

Votes and comments could be stored in Nexson using new per-tree properties, or as part of a "favorites" feature discussed previously.

kcranston commented 8 years ago

Thanks for the update! The up / down votes are a nice touch - how does this get stored in the nexson?

Also, I thought that we were going to have two separate fields:

one that communicated whether or not the tree was in the current synthesis version (boolean, not editable by curator)
one for status (multiple values, editable by curator)

jimallman commented 8 years ago

I thought that we were going to have two separate fields

The status badge here (Include, Listed, etc) reflects both the tree's current state vs. synthesis (whether it's vislble in the latest public synthetic tree) and our current intent for this tree (to be included or not).

The mockup suggests a different way of triggering inclusion. Instead of a single verdict from any curator, we'd tally up- and down-votes to make the call (perhaps by simple majority, or giving downvotes more weight)? Depending on how this is implemented, tipping the scale in either direction might be enough to immediately set the status to Listed or Delisted. This is an attempt to capture the deliberation around a tree, without introducing a full-blown discussion tool.

jimallman commented 8 years ago

how does this get stored in the nexson?

I touched a bit on this above. The most straightforward approach would be to store a per-tree property with all votes on this tree, something like this (reflects the display above for tree 'Sad panda tree'):

'^ot:preferred_for_synthesis' : [
    {
        'voter': { 'name': 'Jim Allman', 'userid': 'jimallman'},
        'voting_to_include': true,
        'date': '2016-07-05',
        'version': '123456789abc',  // SHA of study at this time?
        'comment': 'This looks ready to me!'
    },
    {
        'voter': { 'name': 'Joe Example', 'userid': 'exampleuser'},
        'voting_to_include': true,
        'date': '2016-07-02',
        'version': '113456789abc',
        'comment': 'This is a high-value tree.'
    },
    {
        'voter': { 'name': 'Karen Cranston', 'userid': 'kcranston'},
        'voting_to_include': false,
        'date': '2016-06-01',
        'version': '003456789abc',
        'comment': 'Needs more OTUs mapped for good results.'
    }
]

This would also work well as part of the "favorites" feature discussed previously. In that case, each curator has a favorites file that points to clades of interest, favorite studies, etc. The information would be more scattered (vs. stored in the study), so we'd need otindex to keep it all handy for fast tallies. In this scenario, my vote might appear like this in my personal favorites document favorites-1/jimallman.json:

...
'trees_for_synthesis': [
    ...
    {
        'study_id': 'pg_2606',
        'tree_id': 'tree_6050',
        'voting_to_include': true,
        'date': '2016-07-05',
        'version': '123456789abc',  // SHA of study at this time?
        'comment': 'This looks ready to me!'
    }
    ...
]
...

jimallman commented 8 years ago

Apologies, I jumped the gun a bit with my voting notions above. Here's a mockup that's closer in spirit to our recent discussion:

Alternate phrasing for some of these:

Do not use might be Not suitable or Not suitable for synthesis
Needs work might be Needs additional curation
Include this might be Recommended for inclusion or Recommended for synthesis

I'm not as happy with this UI, mainly because it treats "intent" (whether or not to include a tree in synthesis, and why) as a unilateral decision by an unknown curator. One which anyone else can override for any reason, with no explanation beyond digging through the History tab (possibly git commits) and the curator notes (in Metadata).

jar398 commented 8 years ago

I see, so you select one of four intents, and then having done so, you see one of eight 'badges' - for each intent, the two states have intent-specific names. Yes? So the user who wanted to know the intent and state would have to deconvolve the two dimensions from the single badge value.

I was thinking that the state would have the same name regardless of the choice of intent, so that you would see two 'badges', one of four for intent and one of two for state. Then you wouldn't have to deconvolve. If state were an icon or checkmark, it wouldn't take up much horizontal space.

kcranston commented 8 years ago

This mockup is what I was thinking. To judge the most recent mockup from @jimallman , I would need more information about the potential states of the pulldown and the labels, and how the label gets generated.

I'll point out that whether or not a tree gets included in synthesis is currently up to a unilateral decision by a single curator - i.e. whether or not the tree is in a collection being used for synthesis.

jimallman commented 8 years ago

I'll point out that whether or not a tree gets included in synthesis is currently up to a unilateral decision by a single curator - i.e. whether or not the tree is in a collection being used for synthesis.

Good point, and it keeps things fairly transparent, including the ranking of trees (versus some algorithmic black box). Are these revisions moving in the wrong direction?

jimallman commented 8 years ago

I see, so you select one of four intents, and then having done so, you see one of eight 'badges' - for each intent, the two states have intent-specific names. Yes?

Yes, the indicator ("badge") reflects both what you'll see in the current synthetic tree, and what to expect going forward. E.g., The indicator says 'Delisted', so this tree is reflected in the current synthetic tree, but will soon disappear because someone has deemed it 'Not suitable'.

Less obvious (?) is that you (as a logged-in curator) can change the intent and the indicator will update accordingly, from 'Delisted' to 'In synthesis'.

So the user who wanted to know the intent and state would have to deconvolve the two dimensions from the single badge value.

Hm, the intent is plainly stated in the latest mockups, but you're correct if our "intent" is based on voting rather than explicit. My intention is that they'd need to think less about the inner workings of the system, not more.

jimallman commented 8 years ago

To judge the most recent mockup from @jimallman , I would need more information about the potential states of the pulldown and the labels, and how the label gets generated.

The label (status indicator) would be generated based on the underlying values, as follows:

Tree in current synthesis?	Current "intent" for this tree	Resulting status indicator
No	Not Reviewed	Not included (initial state)
No	Do not use	Not included
No	Needs work	Not included
No	Include this	Listed
Yes	Not Reviewed	Delisted (this is unlikely)
Yes	Do not use	Delisted
Yes	Needs work	Delisted
Yes	Include this	Included

kcranston commented 8 years ago

jimallman commented 8 years ago

A few more mockups, based on the recent conversation and workflow.

First, an attempt at simplicity by offering a single action (hyperlink) that adds or removes the tree from the next synthesis. Note that this would require a popup or other means to gather a free-text reason. Also, note that the third tree in the list is not in the current synthetic tree, but has been queued by another curator; the result is pretty confusing.:

This time I tried a pair of sticky buttons, a toggle that shows the current state (disabled) and just one available action. Again, this would be followed by a prompt for a free-text reason. Once again, the third tree (see above) looks pretty confusing. I've added "?" icons that would give some background on click or mouse-over, e.g.

Removed by @kcranston: "This tree is not suitable for synthesis."

This version attempts to implement the latest workflow by using more explicit text, and by offering common reasons in the available actions:

None of these are particularly strong in my opinion. Maybe if we return to a separate 'Y' / 'N' column for the tree in current synthesis...

kcranston commented 8 years ago

None of these are particularly strong in my opinion. Maybe if we return to a separate 'Y' / 'N' column for the tree in current synthesis...

Yes, I think we should clearly separate the fact about whether a tree is in the current version of the synthetic tree from the action of the curator. When combining them in the same column, it makes it seem that the curator change change that label (particularly because the proposed styling matches the OTU mapping, which is something that the curator CAN change).

jimallman commented 8 years ago

Agreed. This brings us full circle, to something very close to the early mockup, with only minor tweaks to the column headings and options text:

This seems like the least-worst option in terms of clarity and expressiveness. In particular, the third tree (not currently in synthesis, but considered ready) is pretty easy to understand here, and it doesn't over-promise.

kcranston commented 8 years ago

+1

snacktavish commented 8 years ago

+1 also!

kcranston commented 8 years ago

Changed the title, as we are talking more about inclusion / exclusion of trees here as we are whole studies (noting the original concern about study-level flags).

kcranston commented 8 years ago

Adding link to epic gitter conversation about this issue and related PR.

jimallman commented 7 years ago

I believe this issue is completely addressed in the latest master (after merging #1100).

OpenTreeOfLife / opentree

Simpler process for including / excluding trees and studies in/from synthesis #1042