PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

A neighborhood query throws exception #219

Closed IgorRodchenkov closed 9 years ago

IgorRodchenkov commented 9 years ago

http://www.pathwaycommons.org/pc2/graph?source=P01112&kind=neighborhood&limit=2

Internal Server Error; Internal Server Error - org.biopax.paxtools.util.IllegalBioPAXArgumentException: I already have an object with the same ID: http://purl.org/pc2/7/RelationshipXref_protein+genbank+identifier_193786913. Try removing it first; [org.biopax.paxtools.impl.ModelImpl.add(ModelImpl.java:154), org.biopax.paxtools.impl.ModelImpl.addNew(ModelImpl.java:119), org.biopax.paxtools.controller.Cloner.clone(Cloner.java:49), cpath.service.CPathServiceImpl.getNeighborhood(CPathServiceImpl.java:277), cpath.webservice.BiopaxModelController.graphQuery(BiopaxModelController.java:242), sun.reflect.GeneratedMethodAccessor338.invoke(Unknown Source), sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43), java.lang.reflect.Method.invoke(Method.java:606), org.springframework.web.method.support.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:215), org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:132 ... etc.

What happens: Cloner.clone(Cloner.java:49) fails because the 'elements' set (its second parameter), for some unknown reason, contains more than one BioPAX object with rdf:ID="RelationshipXref_protein+genbank+identifier_193786913".

It's weird to see such an exception; it was probably caused by some server failure, an intermediate state, or a previous OutOfMemory error.
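For readers unfamiliar with this failure mode, here is a minimal sketch of how a set can hold two distinct objects that share one ID, and how copying them into an ID-indexed model then fails. The `Element` and `Model` classes are hypothetical stand-ins, not the paxtools classes, and the sketch does not claim to show the actual cause of this bug.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class DuplicateIdDemo {
    // Hypothetical stand-in for a BioPAX element: equality is by object identity, not by ID.
    static final class Element {
        final String id;
        Element(String id) { this.id = id; }
    }

    // Hypothetical stand-in for a model that indexes elements by ID and rejects duplicates.
    static final class Model {
        private final Map<String, Element> byId = new HashMap<>();

        void add(Element e) {
            if (byId.containsKey(e.id)) {
                throw new IllegalArgumentException("I already have an object with the same ID: " + e.id);
            }
            byId.put(e.id, e);
        }

        int size() { return byId.size(); }
    }

    // Copies every element into a new model; fails if two distinct objects share an ID.
    static Model cloneAll(Set<Element> elements) {
        Model target = new Model();
        for (Element e : elements) {
            target.add(new Element(e.id));
        }
        return target;
    }

    // Defensive variant: keep only the first element seen for each ID before copying.
    static Model cloneDistinctIds(Set<Element> elements) {
        Map<String, Element> firstById = new LinkedHashMap<>();
        for (Element e : elements) {
            firstById.putIfAbsent(e.id, e);
        }
        Model target = new Model();
        for (Element e : firstById.values()) {
            target.add(new Element(e.id));
        }
        return target;
    }

    public static void main(String[] args) {
        Set<Element> elements = new HashSet<>();
        elements.add(new Element("RelationshipXref_X"));
        elements.add(new Element("RelationshipXref_X")); // distinct object, same ID

        try {
            cloneAll(elements);
        } catch (IllegalArgumentException ex) {
            System.out.println("cloneAll failed: " + ex.getMessage());
        }
        System.out.println("cloneDistinctIds kept " + cloneDistinctIds(elements).size() + " element(s)");
    }
}
```

Deduplicating by ID only hides the symptom; it does not explain why two objects with the same ID ended up in the set.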

ozgunbabur commented 9 years ago

I tried the same query using paxtools on the largest v7 owl, and it worked without errors. We'll need to track this bug in cpath2.

IgorRodchenkov commented 9 years ago

Ok, thanks Ozgun..

Using the test PC2 server, on another run of the same query, I got: Exception: org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError: Requested array size exceeds VM limit...

It looks like the query returns a result (BioPAX), but cpath2 then fails to write it to RDF/XML using SimpleIOHandler.convertToOWL.

(The console log prints:)

org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet [cpath2] in context with path [] threw exception [Handler processing failed; nested exception is java.lang.OutOfMemoryError: Requested array size exceeds VM limit] with root cause
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
    at java.util.Arrays.copyOf(Arrays.java:2271)
    at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
    at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
    at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
    at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282)
    at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125)
    at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207)
    at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:129)
    at java.io.BufferedWriter.write(BufferedWriter.java:230)
    at java.io.Writer.write(Writer.java:157)
    at org.biopax.paxtools.io.SimpleIOHandler.writeStatementFor(SimpleIOHandler.java:672)
    at org.biopax.paxtools.io.SimpleIOHandler.writeObject(SimpleIOHandler.java:615)
    at org.biopax.paxtools.io.SimpleIOHandler.writeObjects(SimpleIOHandler.java:630)
    at org.biopax.paxtools.io.SimpleIOHandler.convertToOWL(SimpleIOHandler.java:576)
    at cpath.service.BiopaxConverter.convert(BiopaxConverter.java:94)
    at cpath.service.BiopaxConverter.convert(BiopaxConverter.java:150)
    at cpath.service.CPathServiceImpl.convert(CPathServiceImpl.java:375)
    at cpath.service.CPathServiceImpl.getNeighborhood(CPathServiceImpl.java:271)
    at cpath.webservice.BiopaxModelController.graphQuery(BiopaxModelController.java:242)
    ...

Increased the JVM heap size from 32 to 48 GB: same OutOfMemory error.
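A likely reason the larger heap made no difference: the trace fails in ByteArrayOutputStream.grow, and a Java array is indexed by int, so a single byte[] cannot exceed roughly 2^31 - 1 bytes no matter how much heap is available. The sketch below models the buffer's doubling policy without allocating anything. It is a simplified approximation of the JDK 7 behaviour (an assumption based on the grow() frame in the trace), and `capacityNeededFor` is a hypothetical helper.

```java
public class BufferGrowthDemo {
    // Simplified model of how a ByteArrayOutputStream-style buffer picks its next capacity.
    // Assumption: the buffer doubles, and falls back to Integer.MAX_VALUE when doubling overflows int.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity << 1;          // double the buffer
        if (newCapacity - minCapacity < 0) {
            newCapacity = minCapacity;
        }
        if (newCapacity < 0) {                       // doubling overflowed the int range
            if (minCapacity < 0) {
                throw new OutOfMemoryError();
            }
            newCapacity = Integer.MAX_VALUE;         // the largest length an int can express
        }
        return newCapacity;
    }

    // Hypothetical helper: the capacity the buffer would request to hold totalBytes,
    // starting from the default initial size of 32 bytes.
    static int capacityNeededFor(int totalBytes) {
        int capacity = 32;
        while (capacity < totalBytes) {
            capacity = nextCapacity(capacity, capacity + 1);
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println("70 MB result  -> capacity " + capacityNeededFor(70_000_000));
        System.out.println("1.3 GB result -> capacity " + capacityNeededFor(1_300_000_000));
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
    }
}
```

Once the buffer holds more than 2^30 bytes (1 GiB), the next doubling asks for an array of Integer.MAX_VALUE elements, which HotSpot typically refuses with "Requested array size exceeds VM limit" regardless of the -Xmx setting.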

Igor

(In reply to Özgün Babur's message of Jul 24, 2015, 3:26 PM.)

IgorRodchenkov commented 9 years ago

See this comment in the cpath2 source code.

I think we simply should never allow users to execute PC2 neighborhood graph queries with limit>1 (or at least never return the result as BioPAX).

Does it make sense to return more than 1 GB of RDF/XML data from a simple query like /graph?source=P01112&kind=neighborhood&limit=2, which also takes several minutes to complete?

Ozgun successfully ran the same query on the same data outside PC2 (barring blacklist.txt) and got a 1.3 GB BioPAX output file. So it works with a file output stream (handler.convertToOWL(ex, new FileOutputStream("temp.owl"));), but not in PC2, which writes the resulting BioPAX model to a byte array in the web app/servlet context in order to return the OWL data to the web user.

Alternatively, we have to redesign and refactor how PC2 returns results, e.g., use temporary files, queues, etc.
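The temporary-file idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the cpath2 implementation: `ResultWriter` is a hypothetical stand-in for a serializer call such as handler.convertToOWL(model, out), and `sendViaTempFile` is a hypothetical helper name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TempFileResponseDemo {
    // Hypothetical stand-in for a serializer that writes a query result to a stream.
    interface ResultWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Serializes the result to a temporary file, then streams that file to the response.
    // The full result is never held in one in-memory byte[].
    static long sendViaTempFile(ResultWriter writer, OutputStream response) throws IOException {
        Path tmp = Files.createTempFile("query-result", ".owl");
        try {
            try (OutputStream fileOut = Files.newOutputStream(tmp)) {
                writer.writeTo(fileOut);
            }
            // Files.copy reads the file in chunks and returns the number of bytes copied.
            return Files.copy(tmp, response);
        } finally {
            Files.deleteIfExists(tmp); // always clean up the temporary file
        }
    }

    public static void main(String[] args) throws IOException {
        // A small in-memory sink stands in for the servlet response stream in this demo.
        ByteArrayOutputStream fakeResponse = new ByteArrayOutputStream();
        long sent = sendViaTempFile(
                out -> out.write("<rdf:RDF/>".getBytes(StandardCharsets.UTF_8)),
                fakeResponse);
        System.out.println(sent + " bytes sent");
    }
}
```

In a servlet, the `response` argument would be the response's output stream, so memory use stays bounded by the copy buffer rather than by the result size. The cost is extra disk I/O and the need to clean up files reliably.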

IR.

ozgunbabur commented 9 years ago

If we do cap the limit, it may make sense to do so only when the direction is "undirected" (currently the default for neighborhood). If we make the query directed by setting the direction to "bothstream", the result size drops to 69.9 MB, almost half the size.

While the 2-neighborhood of HRAS is too big, that is not the case for every protein. Someone working on a less-studied protein whose neighborhood is small will immediately query for a bigger neighborhood. Not being able to do that may be annoying.
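The proposal above, capping the limit only for undirected neighborhood queries, could look roughly like this. It is a sketch of the proposed rule, not cpath2 code: the cap value, the helper names, and the UPSTREAM and DOWNSTREAM enum constants are assumptions (only "undirected" and "bothstream" are named in this thread).

```java
import java.util.Locale;

public class NeighborhoodLimitDemo {
    // "undirected" and "bothstream" come from the thread; the other two values are assumed.
    enum Direction { UPSTREAM, DOWNSTREAM, BOTHSTREAM, UNDIRECTED }

    // Hypothetical cap applied to undirected neighborhood queries only.
    static final int MAX_UNDIRECTED_LIMIT = 1;

    // Missing direction falls back to UNDIRECTED, the stated default for neighborhood queries.
    static Direction parseDirection(String value) {
        if (value == null || value.isEmpty()) {
            return Direction.UNDIRECTED;
        }
        return Direction.valueOf(value.toUpperCase(Locale.ROOT));
    }

    // Returns the limit that would actually be used for the query.
    static int effectiveLimit(String kind, Direction direction, int requestedLimit) {
        boolean undirectedNeighborhood =
                "neighborhood".equalsIgnoreCase(kind) && direction == Direction.UNDIRECTED;
        return undirectedNeighborhood
                ? Math.min(requestedLimit, MAX_UNDIRECTED_LIMIT)
                : requestedLimit;
    }

    public static void main(String[] args) {
        System.out.println(effectiveLimit("neighborhood", parseDirection(null), 2));
        System.out.println(effectiveLimit("neighborhood", parseDirection("bothstream"), 2));
    }
}
```

A fixed cap is blunt: it also blocks limit=2 for proteins whose undirected neighborhood is small, which is exactly the concern raised above.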

(In reply to Igor Rodchenkov's message of Fri, Jul 24, 2015, 5:21 PM.)

IgorRodchenkov commented 9 years ago

I could both a) cap the 'limit' when the 'direction' is "undirected" (the default) and b) slightly refactor cpath2 to pass temporary files internally instead of byte[]. By the way, ~70 MB is not half the size of 1.3 GB ;)

(In reply to Özgün Babur's message of Mon, Jul 27, 2015, 4:40 PM.)

ozgunbabur commented 9 years ago

Oh, yes, it is not half; it is about 19 times smaller :)

(In reply to Igor Rodchenkov's message of Mon, Jul 27, 2015, 4:46 PM.)

IgorRodchenkov commented 9 years ago

This is now fixed in the sources and on the test PC2 server. I did not cap the 'limit' parameter at 1 when the 'direction' is "undirected"; instead, cpath2 now uses temporary files internally to carry query result data.