OpenTreeOfLife / oti

indexing service for the OpenTreeOfLife nexson repository

Indexing failures in a few studies #34

Open jimallman opened 9 years ago

jimallman commented 9 years ago

Today I used the deployment script to re-index oti, and a few studies failed (details below).

Study ot_232 reports multiple records(?) found, but there's just one JSON with this id in phylesystem-1. This is very similar to a problem discussed in #26:

Indexing http://api.opentreeoflife.org/phylesystem/v1/../ study ot_232 from http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_232.json
Calling "http://127.0.0.1:7478/db/data/ext/studies/graphdb/index_studies" with data="{'urls': ['http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_232.json']}"

Indexing failed for URL "http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_232.json", study http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_232.json. Message = "More than one hit found for ot:studyId == ot_232. The database is probably corrupt."

Two others failed with no explanation:

Indexing http://api.opentreeoflife.org/phylesystem/v1/../ study ot_31 from http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_31.json
Calling "http://127.0.0.1:7478/db/data/ext/studies/graphdb/index_studies" with data="{'urls': ['http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_31.json']}"

Indexing failed for URL "http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_31.json", study http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_31.json. Message = "null"
Indexing http://api.opentreeoflife.org/phylesystem/v1/../ study ot_32 from http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_32.json
Calling "http://127.0.0.1:7478/db/data/ext/studies/graphdb/index_studies" with data="{'urls': ['http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_32.json']}"

Indexing failed for URL "http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_32.json", study http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_32.json. Message = "null"

snacktavish commented 9 years ago

Indexing is failing for some studies. For example, https://tree.opentreeoflife.org/curator/study/view/pg_2739 does not appear on the curator home page, nor can it be found by search. The same is true for https://tree.opentreeoflife.org/curator/study/view/pg_761, the second study on the list of synthesis inputs.

mtholder commented 9 years ago

The cURL call that fails to return the studies for the list is:

curl 'https://api.opentreeoflife.org/oti/v1/findAllStudies' -H 'Origin: https://tree.opentreeoflife.org' -H 'Accept-Encoding: gzip, deflate' -H 'Accept-Language: en-US,en;q=0.8' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.81 Safari/537.36' -H 'Content-Type: application/x-www-form-urlencoded; charset=UTF-8' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Cache-Control: max-age=0' -H 'Referer: https://tree.opentreeoflife.org/curator' -H 'Connection: keep-alive' --data 'verbose=true' --compressed

(obtained by copying the call from the Chrome dev tools while the curator page loads).

kcranston commented 9 years ago

@chinchliff @blackrim Any idea what is going on here? Is this the same (or similar) problem as discussed in https://github.com/OpenTreeOfLife/oti/pull/26#discussion_r17401670? Currently, not all of the studies in the synthetic tree are showing up in the curator app.

chinchliff commented 9 years ago

I am not sure what is going on. What errors are returned when indexing is attempted for those studies?

jimallman commented 9 years ago

@chinchliff, the three errors described above are repeated below with nicer formatting. That was back in January, though, so it's probably best to attempt a fresh re-indexing and monitor the output, using cURL or index_current_repo.py.


Study ot_232 reports multiple records(?) found, but there's just one JSON with this id in phylesystem-1. This is very similar to a problem discussed in #26:

Indexing http://api.opentreeoflife.org/phylesystem/v1/../ study ot_232 from 
  http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_232.json
Calling "http://127.0.0.1:7478/db/data/ext/studies/graphdb/index_studies" with 
  data="{'urls': ['http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_232.json']}"

Indexing failed for 
  URL "http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_232.json", 
  study http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_232.json. 
Message = "More than one hit found for ot:studyId == ot_232. The database is probably corrupt."

Two others failed with no explanation:

Indexing http://api.opentreeoflife.org/phylesystem/v1/../ study ot_31 from 
  http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_31.json
Calling "http://127.0.0.1:7478/db/data/ext/studies/graphdb/index_studies" with
  data="{'urls': ['http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_31.json']}"

Indexing failed for 
  URL "http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_31.json", 
  study http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_31.json. 
Message = "null"
Indexing http://api.opentreeoflife.org/phylesystem/v1/../ study ot_32 from
  http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_32.json
Calling "http://127.0.0.1:7478/db/data/ext/studies/graphdb/index_studies" with 
  data="{'urls': ['http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_32.json']}"

Indexing failed for 
  URL "http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_32.json", 
  study http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_32.json. 
Message = "null"
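When monitoring a fresh re-indexing run, it can help to reduce the output to one line per failed study. The following is a minimal sketch in Python, assuming the log keeps the `Indexing failed for URL "...", study ... Message = "..."` shape shown above. `summarize_failures` is a hypothetical helper, not part of oti or the deployment scripts.

```python
import re

# Matches one failure report; tolerates the report being wrapped over several lines.
FAILURE_RE = re.compile(
    r'Indexing failed for\s+URL\s+"(?P<url>[^"]+)".*?Message\s*=\s*"(?P<message>[^"]*)"',
    re.DOTALL,
)


def summarize_failures(log_text):
    """Return a list of (study_id, message) pairs for every failure in log_text."""
    failures = []
    for match in FAILURE_RE.finditer(log_text):
        url = match.group("url")
        # The study id is taken to be the file name of the NexSON URL without ".json".
        study_id = url.rsplit("/", 1)[-1]
        if study_id.endswith(".json"):
            study_id = study_id[: -len(".json")]
        failures.append((study_id, match.group("message")))
    return failures
```

Feeding the captured script output to `summarize_failures` gives pairs such as a study id with the message "null", which makes the studies that failed without explanation easy to spot.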
jar398 commented 9 years ago

This is a major problem, top priority. I'll take a look at it; any help is welcome.

chinchliff commented 9 years ago

There are multiple issues here. For study ot_232, I have no idea why it would return that error unless there really was already a study indexed with the same id. Maybe for some reason that study is getting indexed twice? Maybe one of the other studies has an internal identifier labeling it as ot_232 even though it is named something else?
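One way to test the second guess is to compare each file name with the id stored inside the file. This is only a sketch under stated assumptions: it assumes the study id lives at `nexml` → `^ot:studyId` in each JSON file and that the files sit directly in one directory. `find_id_conflicts` is a hypothetical helper, not an existing oti or peyotl function.

```python
import json
from collections import defaultdict
from pathlib import Path


def find_id_conflicts(study_dir):
    """Map each internal study id to the files that claim it, keeping only conflicts.

    Assumption: the id is stored at data["nexml"]["^ot:studyId"]; adjust the
    lookup if the NexSON version in use stores it elsewhere.
    """
    files_by_id = defaultdict(list)
    for path in sorted(Path(study_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        internal_id = data.get("nexml", {}).get("^ot:studyId")
        files_by_id[internal_id].append(path.name)

    conflicts = {}
    for internal_id, names in files_by_id.items():
        # A conflict is an id claimed by several files, or a file named differently from its id.
        mismatched = any(Path(name).stem != internal_id for name in names)
        if len(names) > 1 or mismatched:
            conflicts[internal_id] = names
    return conflicts
```

An empty result would suggest that the duplicate hit comes from the index itself (for example, the study being indexed twice) rather than from the files.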

The other errors can be retrieved using curl against the devapi OTI. They are below. Both ot_31 and ot_32 cause OTI to complain that there are leaf nodes which are not assigned to OTUs. Presuming that this is correct, it would seem that the solution here would be to fix the nexsons.

I am not sure why those errors are not showing up in the output of the index_nexsons script. I fixed some of the error reporting code earlier (neo4j uses the getMessage() method on the exception, if I remember correctly, so error messages are lost if they are returned any other way), but maybe I missed something, or maybe those changes haven't been propagated to api from devapi yet.


curl -X POST http://devapi.opentreeoflife.org/v2/studies/index_studies -H 'content-type:application/json' -d '{"urls": ["http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_31.json"]}'

{
  "message" : "The node Tn295230 is identified as a leaf but has not been assigned an OTU.",
  "exception" : "NexsonParseException",
  "fullname" : "org.opentree.nexson.io.NexsonParseException",
  "stacktrace" : [ "org.opentree.nexson.io.NexsonNode.parseNexson(NexsonNode.java:235)", "org.opentree.nexson.io.NexsonNode.<init>(NexsonNode.java:48)", "org.opentree.nexson.io.NexsonTree.parseNexson(NexsonTree.java:204)", "org.opentree.nexson.io.NexsonTree.<init>(NexsonTree.java:85)", "org.opentree.nexson.io.NexsonSource.parseNexson(NexsonSource.java:191)", "org.opentree.nexson.io.NexsonSource.<init>(NexsonSource.java:52)", "org.opentree.oti.plugins.studies.readRemoteNexson(studies.java:286)", "org.opentree.oti.plugins.studies.index_studies(studies.java:189)", "java.lang.reflect.Method.invoke(Method.java:497)", "org.neo4j.server.plugins.PluginMethod.invoke(PluginMethod.java:57)", "org.neo4j.server.plugins.PluginManager.invoke(PluginManager.java:168)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:300)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:122)", "java.lang.reflect.Method.invoke(Method.java:497)", "org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)" ]
}

curl -X POST http://devapi.opentreeoflife.org/v2/studies/index_studies -H 'content-type:application/json' -d '{"urls": ["http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_32.json"]}'

{
  "message" : "The node Tn295230 is identified as a leaf but has not been assigned an OTU.",
  "exception" : "NexsonParseException",
  "fullname" : "org.opentree.nexson.io.NexsonParseException",
  "stacktrace" : [ "org.opentree.nexson.io.NexsonNode.parseNexson(NexsonNode.java:235)", "org.opentree.nexson.io.NexsonNode.<init>(NexsonNode.java:48)", "org.opentree.nexson.io.NexsonTree.parseNexson(NexsonTree.java:204)", "org.opentree.nexson.io.NexsonTree.<init>(NexsonTree.java:85)", "org.opentree.nexson.io.NexsonSource.parseNexson(NexsonSource.java:191)", "org.opentree.nexson.io.NexsonSource.<init>(NexsonSource.java:52)", "org.opentree.oti.plugins.studies.readRemoteNexson(studies.java:286)", "org.opentree.oti.plugins.studies.index_studies(studies.java:189)", "java.lang.reflect.Method.invoke(Method.java:497)", "org.neo4j.server.plugins.PluginMethod.invoke(PluginMethod.java:57)", "org.neo4j.server.plugins.PluginManager.invoke(PluginManager.java:168)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:300)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:122)", "java.lang.reflect.Method.invoke(Method.java:497)", "org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)" ]
}
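The same request can be issued from Python, which makes it easier to print a useful line even when the message is missing. This is a rough sketch, not the code used by the indexing scripts: `build_index_request` and `describe_error` are hypothetical names, and the response fields (`message`, `exception`) are taken from the JSON shown above.

```python
import json
import urllib.request


def build_index_request(endpoint, study_url):
    """Build (but do not send) a POST request asking OTI to index one study."""
    payload = json.dumps({"urls": [study_url]}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_error(body_text):
    """Turn an error response body into a one-line description.

    When the message is missing, empty, or the literal string "null" (what the
    indexing script printed for ot_31 and ot_32), the exception name is still
    reported so the failure is not completely silent.
    """
    body = json.loads(body_text)
    message = body.get("message")
    if message in (None, "", "null"):
        message = "(no message)"
    return "{}: {}".format(body.get("exception", "UnknownException"), message)
```

To send the request, pass it to `urllib.request.urlopen`; an HTTP error status raises `urllib.error.HTTPError`, whose body can be read with `.read()` and handed to `describe_error`.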

jar398 commented 9 years ago

Thanks. ot_31 seems to fail the validator. This comes from the treebase converter, so the converter probably needs to change. If it fails to validate I don't know how it got into the repo. I'll copy this to a new peyotl issue.

{
  "errors": {
    "MISSING_MANDATORY_KEY": {
      "data": [ "@otu" ],
      "refersTo": { "@idref": "Tn295377", "@nodeID": "Tn295377", "@top": "trees", "@treeID": "Tr4206", "@treesID": "Tb6619" }
    }
  }, ...
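The condition behind both the validator error and the OTI message (a leaf node with no `@otu`) can be checked in a few lines. This is a sketch only, assuming a simplified tree shape with a `node` list and an `edge` list; it is not the peyotl validator, and `leaves_without_otu` is a hypothetical name.

```python
def leaves_without_otu(tree):
    """Return ids of leaf nodes that carry no "@otu" reference.

    Assumed (simplified) tree shape:
        {"node": [{"@id": ..., "@otu": ...}, ...],
         "edge": [{"@source": ..., "@target": ...}, ...]}
    A leaf is any node that never appears as the "@source" of an edge.
    """
    parents = {edge["@source"] for edge in tree.get("edge", [])}
    return [
        node["@id"]
        for node in tree.get("node", [])
        if node["@id"] not in parents and "@otu" not in node
    ]
```

Running this over each tree of a study before indexing would list the offending node ids up front instead of stopping at the first one.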

jar398 commented 9 years ago

The indexing script is using the ext/studies/graphdb/index_study call, which probably didn't get updated.

jar398 commented 9 years ago

I wonder if this can be done with a browser proxy configuration instead of changing /etc/hosts ...

jar398 commented 9 years ago

Last message "I wonder ..." pertains to a different issue. Please ignore.

jar398 commented 9 years ago

For ot_31 and ot_32, it looks to me as if we can just replace the throw with a return null to get out of our immediate predicament, pending resolution of https://github.com/OpenTreeOfLife/peyotl/issues/126. Sound right?

jar398 commented 9 years ago

Re ot_232, here's what I get:

curl -X POST http://localhost:7478/db/data/ext/studies/graphdb/index_studies -H 'content-type:application/json' -d '{"urls": ["http://api.opentreeoflife.org/phylesystem/v1/../default/v1/study/ot_232.json"]}'

{
  "message" : "The specified root node Tr49052_ROOT is different from the observed root of the tree in the NexSON object hierarchy. This is nonsensical.",
  "exception" : "NexsonParseException",
  "fullname" : "org.opentree.nexson.io.NexsonParseException",
  "stacktrace" : [ "org.opentree.nexson.io.NexsonTree.parseNexson(NexsonTree.java:249)", "org.opentree.nexson.io.NexsonTree.<init>(NexsonTree.java:85)", "org.opentree.nexson.io.NexsonSource.parseNexson(NexsonSource.java:191)", "org.opentree.nexson.io.NexsonSource.<init>(NexsonSource.java:52)", "org.opentree.oti.plugins.studies.readRemoteNexson(studies.java:286)", "org.opentree.oti.plugins.studies.index_studies(studies.java:189)", "java.lang.reflect.Method.invoke(Method.java:483)", "org.neo4j.server.plugins.PluginMethod.invoke(PluginMethod.java:57)", "org.neo4j.server.plugins.PluginManager.invoke(PluginManager.java:168)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:300)", "org.neo4j.server.rest.web.ExtensionService.invokeGraphDatabaseExtension(ExtensionService.java:122)", "java.lang.reflect.Method.invoke(Method.java:483)", "org.neo4j.server.rest.security.SecurityFilter.doFilter(SecurityFilter.java:112)" ]
}
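The root mismatch reported above can be illustrated with a small check that compares the node flagged as root with the root implied by the edges. This is a sketch under assumptions: the simplified `node`/`edge` shape and the `@root` flag are assumed, and how NexsonTree.java actually performs its check is not confirmed here. `check_root` is a hypothetical helper.

```python
def check_root(tree):
    """Compare the node flagged as root with the root implied by the edges.

    Assumed (simplified) tree shape: nodes are dicts with "@id" and an optional
    "@root" flag; edges are dicts with "@source" and "@target".
    Returns (declared_ids, observed_ids); the tree is consistent when both
    lists hold the same single id.
    """
    nodes = tree.get("node", [])
    targets = {edge["@target"] for edge in tree.get("edge", [])}
    declared = [n["@id"] for n in nodes if n.get("@root") in (True, "true")]
    # The observed root is the node that no edge points to.
    observed = [n["@id"] for n in nodes if n["@id"] not in targets]
    return declared, observed
```

If the two lists differ for a tree in ot_232, the NexSON itself is inconsistent, which would point away from a corrupt database as the cause.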

jar398 commented 8 years ago

I indexed production phylesystem a few days ago, and there were no failures. So this issue is no longer critical. But I'm not sure that means we should close the issue. The situations detected (and currently ignored) are of some importance. I think the right thing is to show them as warnings as indexing proceeds.
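A rough sketch of what "warnings as indexing proceeds" could look like on the script side, written in Python. It assumes a caller-supplied `index_one` function (hypothetical) that indexes one URL and raises when the study has a problem; it does not describe the current behavior of oti or its scripts.

```python
def index_with_warnings(study_urls, index_one):
    """Index every study, recording problems as warnings instead of stopping.

    index_one is a caller-supplied function (hypothetical here) that indexes a
    single URL and raises an exception when the study cannot be indexed.
    """
    indexed, warnings = [], []
    for url in study_urls:
        try:
            index_one(url)
        except Exception as err:  # keep going; the problem is reported, not fatal
            warning = "{}: {}".format(type(err).__name__, err)
            print("WARNING {} -> {}".format(url, warning))
            warnings.append((url, warning))
        else:
            indexed.append(url)
    return indexed, warnings
```

The returned `warnings` list gives a summary at the end of the run, while the printed lines show each problem at the point it occurs.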