OpenTreeOfLife / opentree

Opentree browsing and curation web site. For overarching or cross-repo concerns, please see the 'germinator' repo.
http://tree.opentreeoflife.org/
BSD 2-Clause "Simplified" License

strains not mapping in Curator, though in OTT #374

Closed: hdliv closed this issue 8 years ago.

hdliv commented 10 years ago

I added a study from TreeBASE (ot_95), and the first 5-6 strains I checked that were not mapping are in OTT. At least, I found them there, but they don't map when I try to map them through Curator. In phylografter there was a scroll-down bar that let me find the correct strain. Since Curator doesn't have that, how can we get the correct match for each strain when this happens?

[Screenshot: 2014-07-18 at 4:32:53 PM]

jimallman commented 10 years ago

@hdliv, what do you mean when you say "I found them there [in OTT]"? I tested the first three strain names in the taxon search in the header of the synth-tree explorer:

In each case, the binomial (species) name was found, but none of the strains.

In any case, one of my next tasks (see issue #320) is to change the TNRS code used for OTU mapping. Perhaps this will help with strain mapping. I'll let you know as soon as those changes are in place.

jar398 commented 10 years ago

Indeed.

377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank - terminal | ncbi:319225 | | infraspecific |

Looks to me as if taxomachine is filtering out taxa that are infraspecific (have a species as an ancestor).

taxomachine/src/main/java/org/opentree/taxonomy/OTTFlag.java has:

INFRASPECIFIC ("infraspecific", true),

I think we should change this. I wonder what the purpose of that decision was, though, since the answer bears on the fix. Should the change be made in the taxonomy, by introducing a new flag or special processing for non-eukaryotes (though we have strains everywhere in the tree)? Or should we just include all infraspecific taxa by changing the filtering rule?

Jonathan
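
The definition above (a taxon counts as infraspecific when it has a species as an ancestor) can be sketched in a few lines of Python. This is an illustration of the rule only, not OTT or taxomachine code: the `Taxon` class, the `is_infraspecific` helper, and the example lineage are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Taxon:
    """Hypothetical minimal taxon record: a name, a rank, and a parent link."""
    name: str
    rank: str
    parent: Optional["Taxon"] = None


def is_infraspecific(taxon: Taxon) -> bool:
    """Return True if any proper ancestor of `taxon` has rank 'species'."""
    node = taxon.parent
    while node is not None:
        if node.rank == "species":
            return True
        node = node.parent
    return False


# Hypothetical lineage for illustration; ranks mirror the taxonomy row above.
genus = Taxon("Chlorobium", "genus")
species = Taxon("Chlorobium luteolum", "species", genus)
strain = Taxon("Chlorobium luteolum DSM 273", "no rank - terminal", species)

print(is_infraspecific(strain))   # True: its parent is a species
print(is_infraspecific(species))  # False: a species is not below itself
```

Under this rule every strain placed below a species gets the flag, whether it is bacterial, archaeal, or eukaryotic, so any lookup that drops flagged taxa hides strains across the whole tree.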


hdliv commented 10 years ago

I do find the taxa in OTT (see attached screenshot, 2014-07-19 at 1:39:10 AM).

There are strains in Bacteria, Archaea, and Eukarya that all need to be mapped, and we have been having a harder time mapping them now than before.

If the strains are infraspecific to species in the way the taxonomy is built, we have to be able to get to them in order to map them. Thanks!

jar398 commented 10 years ago

I understand, and I hear you that there are strains in Eukarya. In my previous message I agreed with you (the strains are in OTT and in synthesis but not in the TNRS) and said that, as far as a fix goes, we have at least two design options. This is not a decision I want to make unilaterally, so I'm requesting input from Cody, who set the original infraspecific-taxon policy for the TNRS and who ought to clear any changes to the TNRS.


hdliv commented 10 years ago

This study also won't map.

[Screenshot: 2014-07-21 at 10:59:52 AM]

kcranston commented 10 years ago

The same taxomachine problem is affecting both of these studies. Issue here:

https://github.com/OpenTreeOfLife/taxomachine/issues/54

We will discuss this on the software call today.

chinchliff commented 10 years ago

This does not seem to be a taxomachine issue. Taxomachine does not apply special filters for strains or infraspecific taxa.

strain result

curl -X POST http://devapi.opentreeoflife.org/taxomachine/v1/contextQueryForNames -H "content-type:application/json" -d '{"names":["Actinomyces sp. oral strain B19SC"]}'

infraspecific result

curl -X POST http://devapi.opentreeoflife.org/taxomachine/v1/contextQueryForNames -H "content-type:application/json" -d '{"names":["Aegla laevis laevis"]}'
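
For anyone repeating this check outside a shell, here is a sketch of the same request using only the Python standard library. It builds, but does not send, the POST that the curl commands above issue (a JSON body with a `names` list and a JSON content type). The `build_context_query` helper is hypothetical, and the 2014 devapi endpoint may no longer respond.

```python
import json
import urllib.request

# Endpoint copied from the curl examples above (2014 dev server; may be retired).
CONTEXT_QUERY_URL = (
    "http://devapi.opentreeoflife.org/taxomachine/v1/contextQueryForNames"
)


def build_context_query(names):
    """Build (but do not send) a POST request mirroring the curl calls above."""
    body = json.dumps({"names": list(names)}).encode("utf-8")
    return urllib.request.Request(
        CONTEXT_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_context_query(["Actinomyces sp. oral strain B19SC"])
print(req.get_method())          # POST
print(req.data.decode("utf-8"))  # {"names": ["Actinomyces sp. oral strain B19SC"]}

# Sending it requires network access; the response format is not shown here:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Separating request construction from sending makes it easy to confirm that the payload is identical to the curl version before blaming the service.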

kcranston commented 10 years ago

Ok, so what is going on here, then? If the strains are in OTT and taxomachine is not filtering them out, why are they not showing up in the curator app? @jimallman is something strange happening on the curator side?

hdliv commented 10 years ago

Another one:

[Screenshot: 2014-07-21 at 2:51:36 PM]

hdliv commented 10 years ago

What time is the software call? I was looking at the other issue, and it seemed as if you had removed the filter, but the strains in ot_95 still aren't mapping. Did you try this out during the call, and did it work?

jar398 commented 10 years ago

Here's the raw data from taxonomy:

$ grep "Chlorobium luteolum DSM 273" tax/2.6/taxonomy.tsv
377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank | ncbi:319225 | | |
$ grep "Chlorobium luteolum DSM 273" tax/2.8/taxonomy.tsv
377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank - terminal | ncbi:319225 | | infraspecific |
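
Between taxonomy versions 2.6 and 2.8 the only changes to this row are in the rank and flags columns. Below is a sketch of comparing such rows programmatically. It assumes the column order uid, parent_uid, name, rank, sourceinfo, uniqname, flags, `|`-separated fields (with or without surrounding tabs) and comma-separated flags; `parse_taxonomy_row` is a hypothetical helper, not part of any Open Tree tool.

```python
COLUMNS = ["uid", "parent_uid", "name", "rank", "sourceinfo", "uniqname", "flags"]


def parse_taxonomy_row(line):
    """Split one pipe-delimited taxonomy.tsv row into a dict of named fields."""
    # Rows end with a trailing '|', so keep only the first len(COLUMNS) pieces.
    fields = [f.strip() for f in line.rstrip("\n").split("|")][: len(COLUMNS)]
    row = dict(zip(COLUMNS, fields))
    # Assumption: multiple flags are comma-separated within the flags column.
    row["flags"] = {f for f in row.get("flags", "").split(",") if f}
    return row


v26 = "377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank | ncbi:319225 | | |"
v28 = ("377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank - terminal"
       " | ncbi:319225 | | infraspecific |")

old, new = parse_taxonomy_row(v26), parse_taxonomy_row(v28)
print(new["flags"] - old["flags"])  # {'infraspecific'}
print(old["rank"], "->", new["rank"])  # no rank -> no rank - terminal
```

A diff like this narrows the question to which consumer of the taxonomy treats the new `infraspecific` flag differently.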

The strain is being lost somewhere between the taxonomy and Curator (or the taxon completion box in the browser). The most likely culprit is the TNRS, but having looked at the source code, I don't see how that can be the case. The next step in debugging is to isolate a TNRS call using curl that should succeed but doesn't (or does). API documentation here: https://github.com/OpenTreeOfLife/opentree/wiki/Open-Tree-of-Life-APIs

jimallman commented 10 years ago

Update: As I suspected, the problem is that I'm still using the API method autocompleteBoxQuery for both taxon search and OTU mapping. For some reason this doesn't return infraspecific taxa, but the newer contextQueryForNames method behaves as expected, so I'll be switching to it very soon.

hdliv commented 10 years ago

So until this is switched, we can't map strains? How long will it take, so we know when we can add new studies, especially for the next synthesis?

jimallman commented 10 years ago

Hi Dail, I'm working on the code now and will post the results on devtree tonight for review. It's a little fussy about returning matches from partial names, so we might need one more pass from Cody to get the desired behavior.

jimallman commented 10 years ago

How long so we know when we can add new studies..especially for the next synthesis.

I'm hoping maybe end of this week..?

jimallman commented 10 years ago

@hdliv, the new OTU mapping is now available on devtree (so only for test data!) if you want to try it and give feedback.

hdliv commented 10 years ago

@jimallman, much better now!! :)

jar398 commented 8 years ago

I will interpret Jim's tone and Dail's "much better now" as permission to close the issue.