OpenTreeOfLife / taxomachine

Need flag to return suppressed taxa in TNRS results #40

Closed chinchliff closed 10 years ago

chinchliff commented 10 years ago

I've added this to autocompleteQuery (which is equivalent to, and replaces, autocompleteBoxQuery, although that service name/url will be kept for now for compatibility). The feature should be stable in commit fe5bd1a9a3078ee9302e2765e9271f761cb45c08.

The feature depends on new indexes, so new taxomachine dbs will need to be built before it returns results.

Here is an example. It is currently useless unless you rebuild the taxomachine db, so it just uses localhost. This should work for public domains once the dbs and neo4j plugins have been updated there:

curl -X POST http://localhost:7474/db/data/ext/TNRS/graphdb/autocompleteQuery -H "content-type:application/json" -d '{"queryString":"Plasmid ColA","includeDubious":true}'
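
For anyone scripting against the service instead of using curl, the same request can be sketched in Python. This is a minimal sketch that assumes only what the curl call above shows: the endpoint path and the `queryString` and `includeDubious` parameters. The function name `build_autocomplete_request`, the `base_url` default, and the `False` default for `include_dubious` are hypothetical. The response format is not described in this thread, so the sketch stops at building the request.

```python
import json
import urllib.request

# Endpoint path copied from the curl example; the host is an assumption (local neo4j).
AUTOCOMPLETE_PATH = "/db/data/ext/TNRS/graphdb/autocompleteQuery"


def build_autocomplete_request(query_string, include_dubious=False,
                               base_url="http://localhost:7474"):
    """Build (but do not send) a POST request for autocompleteQuery.

    The False default for include_dubious is this sketch's choice,
    not a documented default of the service.
    """
    payload = {"queryString": query_string, "includeDubious": include_dubious}
    return urllib.request.Request(
        base_url + AUTOCOMPLETE_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_autocomplete_request("Plasmid ColA", include_dubious=True)

# To actually send it (needs a rebuilt taxomachine db on the target host):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to point the same call at another host later by changing `base_url`.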

jar398 commented 10 years ago

Note: the notion of 'hidden' or 'suppressed' used in taxomachine is different from that used in treemachine. Taxomachine has been allowing matches to incertae sedis taxa (just not to their dummy containers). So those are not hidden according to taxomachine. But treemachine does not use incertae sedis taxa in synthesis. So they are hidden according to treemachine.

This caused me a lot of confusion. I had to look at the source code for both programs to figure out what was going on.

Smasher also has a notion of hidden taxa which it uses in generating statistical summaries only. I believe it matches treemachine's.
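
To make the difference concrete, here is a small illustration of the two notions as described in this comment. It is only a sketch: the `Taxon` class, its two boolean fields, and both function names are hypothetical and do not come from either code base, which use their own flags.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    name: str
    incertae_sedis: bool = False   # taxon of uncertain placement
    dummy_container: bool = False  # artificial container holding incertae sedis taxa


def hidden_in_taxomachine(taxon):
    # Per the comment above: matches to incertae sedis taxa are allowed,
    # just not to their dummy containers.
    return taxon.dummy_container


def hidden_in_treemachine(taxon):
    # Per the comment above: incertae sedis taxa are not used in synthesis.
    # How treemachine treats the containers themselves is not stated in the
    # thread, so this sketch does not model it.
    return taxon.incertae_sedis


example = Taxon("placeholder incertae sedis taxon", incertae_sedis=True)
print(hidden_in_taxomachine(example), hidden_in_treemachine(example))
```

The same taxon is visible to one tool and hidden from the other, which is the source of the confusion.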

chinchliff commented 10 years ago

That would mean that so-called "lost children" of incertae sedis taxa would not be showing up in synthesis? That doesn't seem like what we want to be doing...

jar398 commented 10 years ago

Right. Not my department. Talk to Stephen. I think the problem is that studies will place a lot of things in their proper place, and this will cause tons of taxa to become paraphyletic (they will "sink"). Treemachine's assumption is that sibling taxa are disjoint, and incertae sedis taxa break that. This problem could be fixed with smarter representations and algorithms. I would like to do this with smasher as well, which has a very similar problem.

chinchliff commented 10 years ago

Ok. Well this is done. Commit 1b125e0cec909f673072cc96d2d0bd02bb552b83

Example curl call for contextQueryForNames (see another example above for autocompleteQuery). The results in this case include hidden as well as deprecated taxa.

curl -X POST http://localhost:7474/db/data/ext/TNRS/graphdb/contextQueryForNames -H "content-type:application/json" -d '{"queryString":"Metopidae","includeDubious":true,"includeDeprecated":true}'
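
A rough Python equivalent of this call, for scripted use. This is a sketch under stated assumptions: the URL and the three parameter names come from the curl call above, while the function name `build_context_query` is hypothetical. Omitting a flag from the payload to fall back on the service's default is also an assumption; the thread does not say whether the service accepts that.

```python
import json
import urllib.request

# URL copied from the curl example; localhost is the only host shown in the thread.
CONTEXT_QUERY_URL = (
    "http://localhost:7474/db/data/ext/TNRS/graphdb/contextQueryForNames"
)


def build_context_query(query_string, include_dubious=None,
                        include_deprecated=None, url=CONTEXT_QUERY_URL):
    """Build (but do not send) a POST request for contextQueryForNames."""
    payload = {"queryString": query_string}
    # Only include flags that were set explicitly (assumption: the service
    # applies its own defaults when a flag is absent).
    if include_dubious is not None:
        payload["includeDubious"] = include_dubious
    if include_deprecated is not None:
        payload["includeDeprecated"] = include_deprecated
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same payload as the curl call: hidden and deprecated taxa both requested.
with_flags = build_context_query("Metopidae", include_dubious=True,
                                 include_deprecated=True)
# No flags sent at all.
without_flags = build_context_query("Metopidae")
```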

jar398 commented 10 years ago

So does this mean I need to prepare a cumulative list of all deprecated ids, and not just the ones from the previous version of OTT (which is what the 'deprecated.tsv' file in the taxonomy dump contains)?

I was going to do that anyhow, just didn't know it was needed now.

Why should the TNRS tell the user about deprecated ids, anyhow? The hidden ones are real as far as we know, but there are reasons that ids get deprecated: ambiguity, synonymizations, and false alignments being the big ones. That is, it has been discovered that we don't know after all what taxon the id refers to. So what meaning is it going to have to anyone?

chinchliff commented 10 years ago

There is some value in being able to verify that an id is deprecated. I don't imagine that many users will want to include deprecated ids in their searches (for reasons you mentioned above), but I don't think it is bad to allow them to search for them either.

And yes, a cumulative list is what is now needed.
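
One way such a cumulative list could be assembled, as a sketch only. It assumes the deprecated ids for each OTT version have already been read into Python (the layout of 'deprecated.tsv' is not described in this thread, so parsing is left out). The function name `cumulative_deprecated`, the version labels, and the ids in the example are made-up placeholders.

```python
def cumulative_deprecated(per_version_ids):
    """Merge per-version deprecated ids into one cumulative mapping.

    per_version_ids: iterable of (version_label, iterable_of_ids), oldest first.
    Returns a dict mapping each id to the first version that deprecated it.
    """
    first_deprecated_in = {}
    for version, ids in per_version_ids:
        for ott_id in ids:
            # setdefault keeps the earliest version if an id appears again later.
            first_deprecated_in.setdefault(ott_id, version)
    return first_deprecated_in


# Placeholder data, not real OTT ids or version names.
merged = cumulative_deprecated([
    ("older", [101, 102]),
    ("newer", [102, 103]),
])
print(sorted(merged))
```

Recording the version alongside each id keeps the list useful for verifying when an id was deprecated, which is the use case mentioned above.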
