Closed hdliv closed 8 years ago
@hdliv, what do you mean when you say "I found them there [in OTT]"? I tested the first three strain names in the taxon search in the header of the synth-tree explorer:
In each case, the binomial (species) name was found, but none of the strains.
In any case, one of my next tasks (see issue #320) is to change the TNRS code used for OTU mapping. Perhaps this will help with strain mapping. I'll let you know as soon as those changes are in place.
Indeed.
377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank - terminal | ncbi:319225 | | infraspecific |
Looks to me as if taxomachine is filtering out taxa that are infraspecific (have a species as an ancestor).
taxomachine/src/main/java/org/opentree/taxonomy/OTTFlag.java has:
INFRASPECIFIC ("infraspecific", true),
I think we should change this. I wonder what the purpose of that decision was, though, since the answer bears on the fix: should the change be made in the taxonomy, by introducing a new flag or special processing for non-eukaryotes (but we have strains everywhere in the tree), or should we just include all infraspecifics by changing the filtering rule?
Jonathan
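To make the two options concrete, here is a minimal sketch of a flag-based filter. It is not taxomachine code: the names HIDDEN_FLAGS and searchable are hypothetical, and the empty flag list on the species record is an illustrative assumption.

```python
# Hypothetical sketch only; none of these names come from taxomachine.
HIDDEN_FLAGS = {"infraspecific"}  # assumed: flags whose taxa the name search hides


def searchable(taxon, hidden_flags=HIDDEN_FLAGS):
    """Return True if the taxon carries none of the hidden flags."""
    return not (set(taxon["flags"]) & set(hidden_flags))


taxa = [
    # Flags on the species record are assumed empty for illustration.
    {"name": "Chlorobium luteolum", "flags": []},
    {"name": "Chlorobium luteolum DSM 273", "flags": ["infraspecific"]},
]

# With the flag hidden, only the species survives the filter.
print([t["name"] for t in taxa if searchable(t)])

# "Include all infraspecifics" amounts to hiding no flags at all.
print([t["name"] for t in taxa if searchable(t, hidden_flags=set())])
```

A new taxonomy flag would instead change what goes into each record's flag list and leave the filtering rule alone.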
On Fri, Jul 18, 2014 at 4:36 PM, Dail Laughinghouse < notifications@github.com> wrote:
I added a study from treebase (ot_95), and the first 5-6 strains I checked that were not mapping are in OTT... at least I found them there, but they are not mapping when I try to map them through curator. On phylografter there was a scroll-down bar so that I could find the correct strain. As curator doesn't have that... how can we get the correct strain we need for each strain when this happens?
[image: screen shot 2014-07-18 at 4 32 53 pm] https://cloud.githubusercontent.com/assets/6361361/3631806/2b080716-0ebb-11e4-832d-59fdfab6df8d.png
I do find the taxa in OTT (see attached)
There are strains in Bacteria, Archaea, and Eukarya that all need to be mapped, and we have been having a harder time mapping them now than before.
If the strains are infraspecific to species...in the way that the taxonomy is built...we have to get to these to be able to map them. Thanks
I understand. And I hear you that there are strains in Eukarya. In my previous message I agreed with you - the strains are in OTT and in synthesis but not in the TNRS - and said that, as far as a fix goes, we have at least two design options. This is not a decision I want to make unilaterally, so I'm requesting input from Cody, who's the one who set the original infraspecific taxon policy for the TNRS and the one who ought to clear any changes to the TNRS.
On Sat, Jul 19, 2014 at 1:40 AM, Dail Laughinghouse < notifications@github.com> wrote:
I do find the taxa in OTT (see attached) [image: screen shot 2014-07-19 at 1 39 10 am] https://cloud.githubusercontent.com/assets/6361361/3633940/228b6302-0f07-11e4-921d-889ddc44aa3c.png
There are strains in Bacteria, Archaea, and Eukarya that all need to be mapped, and we have been having a harder time mapping them now than before.
This study also won't map.
It is the same problem with taxomachine that is causing problems with both of these studies. Issue here:
https://github.com/OpenTreeOfLife/taxomachine/issues/54
We will discuss this on the software call today.
This does not seem to be a taxomachine issue. Taxomachine does not apply special filters for strains or infraspecific taxa.
curl -X POST http://devapi.opentreeoflife.org/taxomachine/v1/contextQueryForNames -H "content-type:application/json" -d '{"names":["Actinomyces sp. oral strain B19SC"]}'
curl -X POST http://devapi.opentreeoflife.org/taxomachine/v1/contextQueryForNames -H "content-type:application/json" -d '{"names":["Aegla laevis laevis"]}'
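For anyone who would rather script these checks, here is a rough Python equivalent of the curl calls above. The URL and the "names" payload are copied from those commands; the helper names build_request and query are mine, and nothing is assumed about the shape of the response.

```python
import json
import urllib.request

# Endpoint copied from the curl commands above (dev server; may not be reachable).
URL = "http://devapi.opentreeoflife.org/taxomachine/v1/contextQueryForNames"


def build_request(names):
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"names": list(names)}).encode("utf-8")
    return urllib.request.Request(
        URL,
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def query(names, timeout=30):
    """Send the request and return the decoded JSON reply (needs network access)."""
    with urllib.request.urlopen(build_request(names), timeout=timeout) as resp:
        return json.load(resp)


# Inspect the request without sending it.
req = build_request(["Actinomyces sp. oral strain B19SC"])
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Calling query(["Aegla laevis laevis"]) would reproduce the second curl command, provided the server is up.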
Ok, so what is going on here, then? If the strains are in OTT and taxomachine is not filtering them out, why are they not showing up in the curator app? @jimallman is something strange happening on the curator side?
another one...
What time is the software call? I was looking at the other issue and it seemed as if you had removed the filter, but the strains of ot_95 still aren't mapping. Did you try this out during the call, and did it work?
Here's the raw data from taxonomy:
$ grep "Chlorobium luteolum DSM 273" tax/2.6/taxonomy.tsv
377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank | ncbi:319225 | | |
$ grep "Chlorobium luteolum DSM 273" tax/2.8/taxonomy.tsv
377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank - terminal | ncbi:319225 | | infraspecific |
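A small sketch for comparing the two rows field by field. The column names in COLUMNS are my assumption about the taxonomy.tsv layout (they are not shown in the grep output), and parse_row and changed_fields are hypothetical helpers.

```python
# Assumed column order for taxonomy.tsv; not confirmed by the grep output above.
COLUMNS = ["uid", "parent_uid", "name", "rank", "sourceinfo", "uniqname", "flags"]


def parse_row(line):
    """Split a pipe-delimited taxonomy row into a dict keyed by COLUMNS."""
    fields = [f.strip() for f in line.split("|")]
    row = dict(zip(COLUMNS, fields))
    # Flags are treated as a comma-separated list; an empty field means no flags.
    row["flags"] = [f for f in row["flags"].split(",") if f]
    return row


def changed_fields(old, new):
    """Return {column: (old_value, new_value)} for every column that differs."""
    return {k: (old[k], new[k]) for k in COLUMNS if old[k] != new[k]}


row_26 = parse_row(
    "377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank | ncbi:319225 | | |"
)
row_28 = parse_row(
    "377323 | 5260452 | Chlorobium luteolum DSM 273 | no rank - terminal "
    "| ncbi:319225 | | infraspecific |"
)

print(changed_fields(row_26, row_28))
```

For these two rows the only differences are the rank ("no rank" versus "no rank - terminal") and the added infraspecific flag.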
The strain is being lost somewhere between taxonomy and curator (or taxon completion box in browser). Most likely culprit is the TNRS but having looked at the source code, I don't see how this can be the case. Next step in debugging is to isolate a TNRS call using curl that should succeed but doesn't (or does). API documentation here: https://github.com/OpenTreeOfLife/opentree/wiki/Open-Tree-of-Life-APIs
Update: As I suspected, the problem is that I'm still using the API method autocompleteBoxQuery for both taxon search and OTU mapping. This doesn't return infraspecies taxa for some reason, but the newer contextQueryForNames method behaves as expected, so I'll be switching to this very soon.
So until this is switched we can't map strains? How long will that take, so we know when we can add new studies, especially for the next synthesis?
Hi Dail, I'm working on the code now and will post the results on devtree tonight for review. It's a little fussy about returning matches from partial names, so we might need one more pass from Cody to get the desired behavior.
How long will that take, so we know when we can add new studies, especially for the next synthesis?
I'm hoping maybe end of this week..?
@hdliv, the new OTU mapping is available now on devtree (so only for test data!) if you want to try it and give feedback
@jimallman, much better now!! :)
I will interpret Jim's tone and Dail's "much better now" as permission to close the issue.