man4ish closed this issue 3 years ago
That's not easy to do. We have plans to develop a small Ondex plugin which would be able to invoke Groovy scripts to run automated tests against the OXL graph.
For the moment, we do a number of things:
The problem is a taxid mismatch between what you have set in your workflow.xml (fastagff parser --> taxid:4113) and what is defined in the knetminer poplar dataset (taxid:3694).
I think the poplar taxid is 3694, so it would be best to correct your workflow.xml and rebuild the KG. Also note that the poplar KnetMiner dataset is fairly old, so we may need to review the semantic motifs to ensure correctness, but it should be OK as a proof of concept.
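A quick way to catch this kind of mismatch before rebuilding is to scan workflow.xml for taxid values and compare them with the one the dataset expects. The following is only a rough sketch: the `find_taxids` helper and the sample XML fragment are hypothetical, and the real workflow.xml layout may differ, so it simply collects every integer on lines that mention "taxid".

```python
import re


def find_taxids(workflow_text):
    """Return the set of integers found on lines that mention 'taxid'."""
    taxids = set()
    for line in workflow_text.splitlines():
        if re.search(r"taxid", line, flags=re.IGNORECASE):
            taxids.update(int(n) for n in re.findall(r"\b\d+\b", line))
    return taxids


# Hypothetical fragment for illustration; not copied from a real workflow.xml.
sample = '<Arg name="TaxId">4113</Arg>\n<Arg name="Other">7</Arg>'

expected = 3694  # poplar taxid used by the KnetMiner dataset, per this thread
found = find_taxids(sample)
mismatches = sorted(t for t in found if t != expected)
print(mismatches)
```

To check a real file, read it with `open("workflow.xml").read()` and pass the text to `find_taxids`; any value reported in `mismatches` is worth inspecting by hand.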
Can you please explain the fields in compara.txt, as there is no header info?
First line of tutorial_data/compara.txt (https://knetminer.com/tutorial/knetbuilder/tutorial-data.zip):
ATMG00030 ATMG00030.1 arabidopsis_thaliana 58.8785 ortholog_one2one PGSC0003DMG400019855 PGSC0003DMT400051118 solanum_tuberosum 42.2819 NULL NULL NULL 0.00 0 114308117
Also, I found that the above format is different from what is explained here: https://github.com/Rothamsted/knetbuilder/wiki/Building-Knowledge-Networks#ensembl-compara-data
I have updated the wiki to match the tutorial data. The compara-config.xml file describes the columns that are parsed and transformed into a (Protein)-[:ortho]->(Protein) graph. Which species are you interested in building a knowledge graph for? We may be able to help.
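To make the column layout concrete, here is a minimal sketch that parses the tutorial row quoted in this thread into named fields and derives one ortho edge from it. The column names are an assumption based on the usual Ensembl Compara homology dump layout; they are not taken from compara-config.xml. Using the protein IDs in columns 2 and 7 as the edge endpoints is also an assumption.

```python
from collections import namedtuple

# Assumed column names (Ensembl Compara homology dump layout); the file itself
# has no header, so treat these as labels for illustration only.
COLUMNS = [
    "gene_stable_id", "protein_stable_id", "species", "identity",
    "homology_type", "homology_gene_stable_id", "homology_protein_stable_id",
    "homology_species", "homology_identity", "dn", "ds", "goc_score",
    "wga_coverage", "is_high_confidence", "homology_id",
]
Homology = namedtuple("Homology", COLUMNS)


def parse_compara_line(line):
    """Split one whitespace-separated compara row into named fields."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    return Homology(*fields)


row = ("ATMG00030 ATMG00030.1 arabidopsis_thaliana 58.8785 ortholog_one2one "
       "PGSC0003DMG400019855 PGSC0003DMT400051118 solanum_tuberosum 42.2819 "
       "NULL NULL NULL 0.00 0 114308117")

h = parse_compara_line(row)
# Assumed mapping: protein IDs (columns 2 and 7) become the two Protein nodes.
edge = (h.protein_stable_id, "ortho", h.homology_protein_stable_id)
print(edge, float(h.identity), float(h.homology_identity))
```

Columns 4 and 9 are the two percent-identity values mentioned elsewhere in this thread, one for each direction of the alignment.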
I am interested in building a knowledge graph for Populus trichocarpa v3.1 from Phytozome. Right now we are preparing the data.
May I know what the criterion is for choosing orthologs? Is it based on % identity (col 4 or col 9)?
Above is a plot of % identity (column 4) for compara.txt (potato data), but the distribution shows no obvious cutoff for predicting orthologs.
The compara data comes from a sophisticated Ensembl pipeline which is beyond my expertise. But my understanding is that all relations in the compara file are predicted orthologs (i.e. there is no need for further filtering). The sequence identity is something we show as additional evidence to the user, but we don't use it as a filter or in our KnetScore.
Were you able to retrieve similar homology data for poplar from Phytozome? We could have a call to discuss your project requirements if you like.
It would be great to have a Zoom meeting to discuss our project goals and the bottlenecks we are facing. Please let us know what time works for you and where to send the Zoom meeting details. My email is mkumar10@utk.edu
I'm closing this; let's use the other channels to keep discussing the developments mentioned.
I have created poplar data based on the potato tutorial data format. I am able to create the OXL file using:
./ondex-mini/runme.sh /var/www/html/knet/poplar_data/workflow.xml "baseDir=/var/www/html/knet/poplar_data"
All inputs and output (kg_final.oxl) are hosted at http://ec2-18-225-37-206.us-east-2.compute.amazonaws.com/knet/ temporarily.
How can I verify that kg_final.oxl is OK?
I tried to query the network using KnetMiner and it did not work. Attached is the log file (ws.log) for the run.
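As a first sanity check on an OXL file, one option is to confirm that it parses as XML and to count its elements by tag name. This is only a rough sketch, not an Ondex tool: it assumes the file is plain or gzip-compressed XML, and the `concept` and `relation` tag names in the in-memory demo are assumptions about the OXL schema. The helper names `open_maybe_gzip` and `count_tags` are hypothetical.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter


def open_maybe_gzip(path):
    """Open a file as binary, transparently handling gzip compression."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    return gzip.open(path, "rb") if magic == b"\x1f\x8b" else open(path, "rb")


def count_tags(stream):
    """Stream through an XML document and count elements by local tag name.

    Raises xml.etree.ElementTree.ParseError if the document is malformed.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        counts[elem.tag.rsplit("}", 1)[-1]] += 1  # drop any namespace prefix
        elem.clear()  # keep memory use low on large graphs
    return counts


# In-memory demo document; the tag names are assumed, not taken from real OXL.
demo = io.BytesIO(
    b"<data><concepts><concept/><concept/></concepts>"
    b"<relations><relation/></relations></data>"
)
counts = count_tags(demo)
print(counts["concept"], counts["relation"])

# For a real file (path is illustrative):
# with open_maybe_gzip("kg_final.oxl") as fh:
#     print(count_tags(fh).most_common(10))
```

A file that parses cleanly and contains a plausible number of node and edge elements is at least structurally intact. It does not prove that the graph matches the KnetMiner dataset configuration, for example the taxid.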