PathwayCommons / cpath2

Biological pathway data integration and access platform (Pathway Commons)
http://www.pathwaycommons.org/pc2/
MIT License

A graph query is too slow when *SIF format is requested. #231

Closed IgorRodchenkov closed 8 years ago

IgorRodchenkov commented 8 years ago

Example:

http://pathwaycommons.baderlab.org/graph?source=http://identifiers.org/uniprot/P20908&kind=neighborhood

But the same query with either the EXTENDED_BINARY_SIF or the BINARY_SIF format requested takes too long: http://pathwaycommons.baderlab.org/graph?source=http://identifiers.org/uniprot/P20908&kind=neighborhood&format=BINARY_SIF

Whereas "&format=GSEA" works fast.
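To make the comparison between formats reproducible, here is a minimal sketch that builds the same graph query for each format and measures the wall-clock time of one request. The endpoint and parameters are copied from the URLs above; the class and method names (`GraphQueryTimer`, `buildUrl`, `timeQueryMillis`) are made up for illustration, the 10-minute timeout is an arbitrary assumption, and the host from this issue may no longer respond. It needs Java 11 or later for `java.net.http`.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class GraphQueryTimer {
    // Endpoint copied from the example URLs in this issue; it may no longer be online.
    static final String ENDPOINT = "http://pathwaycommons.baderlab.org/graph";

    /** Builds the neighborhood query URL; a null format leaves the parameter out. */
    static String buildUrl(String source, String format) {
        String url = ENDPOINT + "?source=" + source + "&kind=neighborhood";
        return format == null ? url : url + "&format=" + format;
    }

    /** Sends one GET request, discards the body, and returns the elapsed milliseconds. */
    static long timeQueryMillis(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofMinutes(10)) // assumption: generous limit for slow SIF queries
                .GET()
                .build();
        long start = System.nanoTime();
        client.send(request, HttpResponse.BodyHandlers.discarding());
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        String source = "http://identifiers.org/uniprot/P20908";
        // null means "no format parameter", as in the first example URL.
        String[] formats = {null, "GSEA", "BINARY_SIF", "EXTENDED_BINARY_SIF"};
        for (String format : formats) {
            String url = buildUrl(source, format);
            System.out.println(url + " -> " + timeQueryMillis(url) + " ms");
        }
    }
}
```

A single request per format is only a rough indication; repeating each query a few times would separate server-side caching effects from the conversion cost.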

IgorRodchenkov commented 8 years ago

I've just tried converting that BioPAX model (the neighborhood of P20908, saved as a file) to SIF with paxtools.jar:

    java -Xmx8g -jar paxtools.jar toSIFnx pc2v8_P20908_neighborhood.owl.gz P20908.sifnx.txt seqDb=uniprot

It took about 4 minutes.

PS: It takes about 4:30 minutes to get the same result using the PC2 web service, which seems too slow to me...

ozgunbabur commented 8 years ago

I tried converting using the code below, and it took 2 minutes for me.

    SIFSearcher searcher = new SIFSearcher(new CommonIDFetcher(), SIFEnum.values());
    SimpleIOHandler io = new SimpleIOHandler(BioPAXLevel.L3);
    String dir = "C:\\Users\\babur\\Downloads\\";
    Model model = io.convertFromOWL(new FileInputStream(dir + "temp.owl"));
    searcher.searchSIF(model, new FileOutputStream(dir + "temp.sif"));

That is kind of OK considering it produces 43K interactions.

IgorRodchenkov commented 8 years ago

Why not use the code in PaxtoolsMain (toSIF), together with our blacklist.txt, which can make a difference? Anyway, the result must be very similar, if not the same, and 2 min versus 4 min can be due to differences between our machines.

I still think 2 minutes is not very fast, though... Igor

(In reply to the comment by Özgün Babur above, sent by email on Feb 1, 2016, at 5:51 PM.)

IgorRodchenkov commented 8 years ago

We've made the SIF pattern search (Paxtools pattern) partially support multi-threading and excluded the 'NEIGHBOR_OF' type, etc., which has resulted in approx. 4x faster query execution. It looks like it's hard to achieve more without deep and serious rethinking and refactoring of our SIF framework.

So, I am closing this issue for now. Let's re-open if needed.
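The two ideas mentioned in the closing comment, running the searches for different interaction types on separate threads and skipping an expensive type such as NEIGHBOR_OF, can be illustrated with a generic sketch. This is not the actual Paxtools change: the `SifType` enum, the `mine` method, and the string-based "model" are hypothetical stand-ins, and only standard `java.util.concurrent` classes are used.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelSifSketch {
    // Hypothetical stand-in for the SIF interaction types; not the Paxtools SIFEnum.
    enum SifType { CONTROLS_STATE_CHANGE_OF, IN_COMPLEX_WITH, INTERACTS_WITH, NEIGHBOR_OF }

    // Hypothetical placeholder for one pattern search over the whole model.
    static List<String> mine(SifType type, List<String> model) {
        List<String> rows = new ArrayList<>();
        for (String entity : model) {
            rows.add(entity + "\t" + type + "\t" + entity + "_partner");
        }
        return rows;
    }

    /** Runs one search task per non-excluded type on a fixed-size thread pool. */
    static List<String> searchInParallel(List<String> model, Set<SifType> excluded, int threads)
            throws Exception {
        Set<SifType> types = EnumSet.allOf(SifType.class);
        types.removeAll(excluded); // e.g. drop NEIGHBOR_OF before any work is scheduled
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (SifType type : types) {
                Callable<List<String>> task = () -> mine(type, model);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so the output is deterministic.
            List<String> all = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                all.addAll(future.get());
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> model = List.of("P20908", "P02452");
        List<String> rows = searchInParallel(model, EnumSet.of(SifType.NEIGHBOR_OF), 4);
        rows.forEach(System.out::println);
    }
}
```

Splitting by interaction type only helps when the per-type searches are independent and the model is safe to read concurrently; whether that holds for a real BioPAX model is an assumption this sketch does not check.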