PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

Cannot access the JSON-LD version of a pathway #319

Closed jvwong closed 3 months ago

jvwong commented 3 months ago

In the current v14 beta instance, the pathway with URI netpath:Pathway_TGF_beta_Receptor won't return JSON-LD (HTTP Status 500), but will return other formats OK.

To reproduce this error, call: https://beta.pathwaycommons.org/pc2/get?format=jsonld&uri=netpath:Pathway_TGF_beta_Receptor.

IgorRodchenkov commented 3 months ago

Digging... Confirmed: there is an odd Jena error. If it is not a bug in Jena or in our code, then I am afraid the data integration/merge went wrong somewhere, leaving e.g. spaces in URIs, and we will have to fix the paxtools normalizer/merger and rebuild the pc14 data/model and index:

$ curl -X 'GET' 'https://beta.pathwaycommons.org/pc2/get?format=jsonld&uri=netpath:Pathway_TGF_beta_Receptor&message=true'
{"timestamp":1718374871173,"status":500,"error":"Internal Server Error","message":"500; Internal Server Error - org.apache.jena.riot.RiotException: Bad character in IRI (space): <netpath:S[space]...>; [org.apache.jena.riot.system.ErrorHandlerFactory$ErrorHandlerStd.error(ErrorHandlerFactory.java:146), org.apache.jena.riot.lang.ReaderRIOTRDFXML$HandlerSink.convert(ReaderRIOTRDFXML.java:256), org.apache.jena.riot.lang.ReaderRIOTRDFXML$HandlerSink.convert(ReaderRIOTRDFXML.java:273), org.apache.jena.riot.lang.ReaderRIOTRDFXML$HandlerSink.statement(ReaderRIOTRDFXML.java:225), org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.triple(XMLHandler.java:72), org.apache.jena.rdfxml.xmlinput.impl.ParserSupport.triple(ParserSupport.java:233), org.apache.jena.rdfxml.xmlinput.states.WantDescription.startElement(WantDescription.java:92), org.apache.jena.rdfxml.xmlinput.impl.XMLHandler.startElement(XMLHandler.java:111), java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.startElement(AbstractSAXParser.java:518), java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(XMLNSDocumentScannerImpl.java:376), java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2726), java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605), java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:114), java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:542), java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:889), 
java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:825), java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141), java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1224), java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:637), org.apache.jena.rdfxml.xmlinput.impl.RDFXMLParser.parse(RDFXMLParser.java:101), org.apache.jena.rdfxml.xmlinput.ARP.load(ARP.java:118), org.apache.jena.riot.lang.ReaderRIOTRDFXML.parse(ReaderRIOTRDFXML.java:188), org.apache.jena.riot.lang.ReaderRIOTRDFXML.read(ReaderRIOTRDFXML.java:86), org.apache.jena.riot.RDFParser.read(RDFParser.java:353), org.apache.jena.riot.RDFParser.parseNotUri(RDFParser.java:343), org.apache.jena.riot.RDFParser.parse(RDFParser.java:292), org.apache.jena.riot.RDFParserBuilder.parse(RDFParserBuilder.java:540), org.apache.jena.riot.RDFDataMgr.parseFromInputStream(RDFDataMgr.java:901), org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:299), org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:273), org.apache.jena.riot.RDFDataMgr.read(RDFDataMgr.java:263), org.biopax.paxtools.io.jsonld.JsonldBiopaxConverter.convertToJsonld(JsonldBiopaxConverter.java:31), cpath.service.BiopaxConverter.convertToJsonLd(BiopaxConverter.java:106), cpath.service.BiopaxConverter.convert(BiopaxConverter.java:89), cpath.service.BiopaxConverter.convert(BiopaxConverter.java:135), cpath.service.ServiceImpl.convert(ServiceImpl.java:373), cpath.service.ServiceImpl.fetch(ServiceImpl.java:196), cpath.web.ApiControllerV1.fetchQuery(ApiControllerV1.java:68), cpath.web.ApiControllerV1.fetchQueryGet(ApiControllerV1.java:44), java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103), java.base/java.lang.reflect.Method.invoke(Method.java:580), 
org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:255), org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188), org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118), org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:925), org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:830), org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87), org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089), org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979), org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014), org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:903), jakarta.servlet.http.HttpServlet.service(HttpServlet.java:527), org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885), jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614), org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:205), org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149), org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:51), org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174), org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149), org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109), org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116), 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174), org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149), org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201), org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116), org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174), org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149), org.springframework.web.filter.ForwardedHeaderFilter.doFilterInternal(ForwardedHeaderFilter.java:173), org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116), org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:174), org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:149), org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:167), org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:90), org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:482), org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:115), org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:93), org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:74), org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:344), org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:391), org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:63), org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:896), org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1744), org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:52), 
org.apache.tomcat.util.threads.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1191), org.apache.tomcat.util.threads.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:659), org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:63), java.base/java.lang.Thread.run(Thread.java:1583)]","path":"/pc2/get"}
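
The useful part of that response is the exception text in front of the stack frames. A minimal sketch for pulling it out, assuming the error body keeps the shape shown above (a JSON object whose `message` field holds the exception text followed by `; [frame, frame, ...]`); the function name `summarize_error` is hypothetical and not part of cpath2:

```python
import json


def summarize_error(body: str) -> dict:
    """Split an error body like the one above into its cause and frame count."""
    payload = json.loads(body)
    message = payload.get("message", "")
    # Assumption: the stack frames follow the exception text as "; [f1, f2, ...]".
    cause, _, frames = message.partition("; [")
    frame_list = frames.rstrip("]").split(", ") if frames else []
    return {
        "status": payload.get("status"),
        "path": payload.get("path"),
        "cause": cause,
        "frame_count": len(frame_list),
    }
```

Applied to the response above, `cause` would be the part ending in `Bad character in IRI (space): <netpath:S[space]...>`, which is the line that matters for this issue.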

IgorRodchenkov commented 3 months ago

The root cause is that there are elements like <bp:ModificationFeature rdf:about="netpath:S 312"> (with a space in the URI)...
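
A quick way to see whether a given BioPAX RDF/XML document would trip the same Jena check is to look at its `rdf:about` and `rdf:resource` values before converting. This is only a sketch using the Python standard library, not how cpath2 or paxtools validate anything; the function name `uris_with_whitespace` is made up for illustration:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# ElementTree exposes namespaced attributes as "{namespace}local".
URI_ATTRS = {f"{{{RDF_NS}}}about", f"{{{RDF_NS}}}resource"}


def uris_with_whitespace(rdfxml: str) -> list:
    """Return (element, attribute, value) for every URI attribute containing whitespace."""
    hits = []
    for elem in ET.fromstring(rdfxml).iter():
        for attr, value in elem.attrib.items():
            if attr in URI_ATTRS and any(ch.isspace() for ch in value):
                # Drop the "{namespace}" prefix to keep the report readable.
                hits.append((elem.tag.rsplit("}", 1)[-1], attr.rsplit("}", 1)[-1], value))
    return hits


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bp="http://www.biopax.org/release/biopax-level3.owl#">
  <bp:ModificationFeature rdf:about="netpath:S 312"/>
  <bp:Protein rdf:about="TAB2__9606__"/>
</rdf:RDF>"""

print(uris_with_whitespace(SAMPLE))
```

For the sample, only the `ModificationFeature` with `netpath:S 312` is reported; the clean `TAB2__9606__` URI passes.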

It turns out there are several "bad" URIs in the original NetPath data (old, from 2011), e.g., <bp:Protein rdf:about=" TAB2__9606__"> (with a leading space) vs. <bp:Protein rdf:about="TAB2__9606__">, which refer to the same protein type but likely represent different states (they participate in different Complexes, etc.). We did not notice any bug like this in the previous PC2 web service data (v12), because most of the corresponding URIs were encoded/replaced with a hash...
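
This is also why simply trimming the whitespace is not a safe fix: ` TAB2__9606__` and `TAB2__9606__` would collapse into one URI and merge two possibly different states. A small sketch (hypothetical helper name `trim_collisions`, not paxtools code) that lists which URIs would collide after trimming:

```python
from collections import defaultdict


def trim_collisions(uris):
    """Group URIs by their stripped form and keep groups with more than one member."""
    groups = defaultdict(set)
    for uri in uris:
        groups[uri.strip()].add(uri)
    return {trimmed: sorted(members) for trimmed, members in groups.items() if len(members) > 1}


uris = [" TAB2__9606__", "TAB2__9606__", " HDAC1__9606__Nucleus"]
print(trim_collisions(uris))
```

Here only the TAB2 pair is reported; ` HDAC1__9606__Nucleus` has no trimmed twin in this made-up input list, so trimming it would not merge anything.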

Other such examples (paxtools does not fail on these; it partially ignores them, so some BioPAX properties become dangling):

NetPath_13.owl:<bp:Protein rdf:about=" TAB2__9606__">
NetPath_13.owl: <bp:component rdf:resource=" TAB2__9606__" />
NetPath_2.owl:<bp:Protein rdf:about=" HSPA1A__9606__Nucleus">
NetPath_2.owl: <bp:component rdf:resource=" HSPA1A__9606__Nucleus" />
NetPath_6.owl:<bp:Protein rdf:about=" PIK3R1__9606__Cytoplasm">
NetPath_6.owl: <bp:component rdf:resource=" PIK3R1__9606__Cytoplasm" />
NetPath_7.owl: <bp:component rdf:resource=" HDAC1__9606__Nucleus" />
NetPath_7.owl:<bp:Protein rdf:about=" HDAC1__9606__Nucleus">
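
A listing like the one above can be produced with a plain line scan, much like grep, without parsing the XML at all. A rough sketch, assuming the NetPath files sit in one directory as `*.owl` and that each `rdf:about`/`rdf:resource` attribute is written on a single line with double quotes; `scan_text` and `scan_dir` are illustrative names:

```python
import re
from pathlib import Path

# Matches rdf:about="..." or rdf:resource="..." whose value contains whitespace.
BAD_URI = re.compile(r'rdf:(?:about|resource)="[^"]*\s[^"]*"')


def scan_text(name, text):
    """Yield 'name:line' for each line holding a URI attribute with whitespace in it."""
    for line in text.splitlines():
        if BAD_URI.search(line):
            yield f"{name}:{line.strip()}"


def scan_dir(directory):
    """Run scan_text over every *.owl file in a directory, in file-name order."""
    for path in sorted(Path(directory).glob("*.owl")):
        yield from scan_text(path.name, path.read_text(encoding="utf-8", errors="replace"))
```

Calling `scan_dir` on the directory holding the NetPath OWL files would print one line per offending attribute; whether it finds exactly the entries listed above depends on the actual files.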