PHI-base / canto-docs

User documentation for the PHI-Canto project
MIT License

Clarify how infective ability terms should be annotated #22

Open jseager7 opened 2 years ago

jseager7 commented 2 years ago

(Originally in #21)

Due to an oversight in PHI-Canto, it's currently possible to select some infective ability terms as primary annotation terms. Even after this problem is fixed, it would help if we provided clear guidance about how these phenotypes are meant to be annotated. New curators may instinctively treat the infective ability phenotypes as the primary phenotype, overlooking our approach of distinguishing between the 'observable' phenotype and the second-order description of the interaction outcome.

This might be important enough to warrant help text in the application itself. Curators currently aren't forced to enter annotation extensions, so if they're blocked from annotating infective ability as a primary phenotype, they may not annotate it at all. It's pretty easy to miss annotation extensions if you're not using the step-based workflow.

CuzickA commented 2 years ago

Both of our new assistant biocurators have encountered this problem. Many publications refer to 'reduced virulence', etc., and a curator naturally wants to make these annotations using this language. However, it is not clear to them that these 'conclusion' terms should not be used as the primary PHIPO annotation (which refers to the observed phenotype).

jseager7 commented 2 years ago

It's also not easy to forbid the use of the terms for primary annotation, because Canto ignores all restrictions on terms when they're entered using the term ID (even the 'do not annotate' subsets). I can see this being a likely outcome because of the following process:

  1. The curator cannot find a term for 'reduced virulence', or similar;
  2. They decide to look at the PHIPO ontology using an ontology browser;
  3. They find the 'reduced virulence' term in the ontology browser;
  4. They enter the term ID in the search field instead, and select the term.

It's either that, or they're going to suggest the term, or email us asking why the term can't be selected.
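To make the term ID loophole described above concrete, here is a minimal sketch of a check that applies the same restriction whether a term is found by name or by ID. This is not Canto's actual code or API: the subset name, the `Term` class, the function names, and the `EX:` term IDs are all hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical subset name, for illustration only; not a real Canto setting.
NOT_PRIMARY_SUBSET = "not_for_primary_annotation"


@dataclass(frozen=True)
class Term:
    term_id: str
    name: str
    subsets: frozenset = frozenset()


# Placeholder ontology; the EX: IDs are made up and are not PHIPO identifiers.
ONTOLOGY = {
    "EX:0000001": Term("EX:0000001", "reduced virulence",
                       frozenset({NOT_PRIMARY_SUBSET})),
    "EX:0000002": Term("EX:0000002", "example observable phenotype"),
}


def lookup(query):
    """Resolve a query given either as a term ID or as an exact term name."""
    if query in ONTOLOGY:
        return ONTOLOGY[query]
    for term in ONTOLOGY.values():
        if term.name == query:
            return term
    raise KeyError(query)


def select_primary_term(query):
    """Return the term only if it may be used as a primary annotation term.

    The subset check runs after the lookup, so it applies identically
    whether the curator typed a name or pasted a term ID.
    """
    term = lookup(query)
    if NOT_PRIMARY_SUBSET in term.subsets:
        raise ValueError(
            f"{term.name} ({term.term_id}) should be recorded as an "
            "annotation extension, not as a primary term"
        )
    return term
```

In this sketch, `select_primary_term("EX:0000001")` and `select_primary_term("reduced virulence")` both raise the same error, and the error message doubles as guidance pointing the curator towards the extension.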

Maybe the simplest reliable solution would be to add help text in the PHI-Canto user interface, both in the step-based workflow and the Quick Add workflow, reminding curators that they should not annotate changes in pathogenicity and virulence as primary terms.

However, if most curators are going to think of the primary phenotype as 'reduced virulence', is it really sensible for us to insist otherwise? It's not nice to have a process that does the opposite of what you'd expect. Maybe we've got the process backwards here, in that the 'reduced virulence' term is a more natural fit for the 'primary' phenotype, and the observed phenotype is the extension? It's not easy to change that now because of all of the existing curation, but I worry that this process is going to make curation unintuitive for every new curator.

I think the root of the problem is that PomBase always regarded annotation extensions as truly optional 'extensions', so the user interface wasn't designed to make extensions obvious (or mandatory). In our case, extensions are much more important, often on par with the primary annotation in terms of relevance to database users.

I can think of plenty of technical solutions (code changes) to make the annotation extensions more obvious, but none of them are going to be easy.
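As a rough illustration of the kind of change I mean, one of the lighter options might be a non-blocking reminder shown when an annotation has no interaction-outcome extension. This is only a sketch under assumptions: the `Annotation` class, the function name, and the relation name below are hypothetical, and the real work would be in wiring something like this into the existing interface.

```python
from dataclasses import dataclass, field

# Hypothetical relation name; the actual extension relations may differ.
OUTCOME_RELATION = "infective_ability"


@dataclass
class Annotation:
    primary_term: str
    # Maps an extension relation name to the term chosen for it.
    extensions: dict = field(default_factory=dict)


def missing_outcome_warning(annotation):
    """Return a reminder if no outcome extension is present, else None."""
    if OUTCOME_RELATION in annotation.extensions:
        return None
    return (
        "No change in pathogenicity or virulence has been recorded. "
        "If the paper reports one, add it as an annotation extension."
    )
```

A reminder like this would not force curators to add an extension, but it would make the omission harder to miss outside the step-based workflow.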

CuzickA commented 2 years ago

Perhaps we could add some text to the initial email sent out once a paper has been entered into PHI-Canto for curation.

We could include a short section that briefly mentions the PHI4 high-level terms and the three methods now used to curate them in PHI-Canto for PHI5.