import/node_id question

joseluisdiaz commented 8 years ago

I'm trying to import http://genemania.org/data/current/Homo_sapiens/ using this pipeline. I have a question about the network definitions/identifiers: they seem to be Strings (for example http://genemania.org/data/current/Homo_sapiens/Predicted.I2D-Ptacek-Snyder-2005-Yeast2Human.txt), but Generic2LuceneExporter expects a Long. What am I doing wrong?

kzuberi commented 8 years ago

Hi, the node id is an internal genemania id. It should be generated in an earlier pipeline step and is not something you need to provide in your input files. You may have hit a bug or some file format issue.
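
To illustrate the idea of an internally generated id (this is only a sketch, not the pipeline's actual code; `assign_node_ids` and the example symbols are hypothetical), a step like this could map string identifiers to sequential integers before anything that expects a Long sees them:

```python
from typing import Dict, Iterable


def assign_node_ids(symbols: Iterable[str], start: int = 1) -> Dict[str, int]:
    """Give each distinct symbol a sequential integer id, in first-seen order.

    Hypothetical helper for illustration only; the real pipeline may
    generate its ids differently.
    """
    ids: Dict[str, int] = {}
    for symbol in symbols:
        key = symbol.strip()
        # Skip blanks and symbols that already have an id.
        if key and key not in ids:
            ids[key] = start + len(ids)
    return ids


if __name__ == "__main__":
    mapping = assign_node_ids(["TP53", "BRCA1", "TP53", "EGFR"])
    for symbol, node_id in mapping.items():
        print(node_id, symbol)
```

The point is only that the numeric ids are derived from the input symbols, so the input files themselves carry strings.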

Those files you see in the genemania.org/data/ folder are a distant output of the pipeline (actually probably an earlier version), and the formats there may not be suitable to drop back in as inputs again.

But it's not clear to me why you want to round-trip the data through the pipeline. Are you just looking for a set of sample input files to the pipeline, or do you really want a binary genemania dataset for this particular set of networks? If the latter, you can probably retrieve the binaries directly via cytoscape.

joseluisdiaz commented 8 years ago

My goal is to import all the Homo sapiens networks into a local copy of genemania that I have running. Since all the Homo sapiens information is in genemania.org/data, I tried to convert that data to the input format of the pipeline with a few scripts:

- Manually generated organism.cfg based on Homo sapiens.
- Extracted the networks' conf files from http://genemania.org/data/current/Homo_sapiens/networks.txt and converted them into data/network/direct/{human}.
- Used http://genemania.org/data/current/Homo_sapiens/identifier_mappings.txt for data/identifiers/symbols/identifier_mappings.txt, and generated data/identifiers/descriptions/identifier_mappings.txt from its first column, with the value 'n/a' as the description.
- Downloaded GoCategories for data/functions.
- For gene-attribute-list, used the gmt files to generate a txt without the second column, a .desc based on the first and second columns of the gmt file, and a simple cfg based mostly on the name of the file.
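
A minimal sketch of the gmt conversion described in the last item, assuming the standard gmt layout (tab-separated: set name, description, then genes). The function name `split_gmt` and the exact row layout of the txt and .desc outputs are assumptions for illustration, not formats confirmed by the pipeline:

```python
from typing import Iterable, List, Tuple


def split_gmt(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split gmt rows into (txt rows, desc rows).

    txt rows drop the second (description) column; desc rows keep only
    the first and second columns. Output layouts are assumed.
    """
    txt_rows: List[str] = []
    desc_rows: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"not a gmt row: {line!r}")
        name, description, genes = fields[0], fields[1], fields[2:]
        txt_rows.append("\t".join([name, *genes]))
        desc_rows.append(f"{name}\t{description}")
    return txt_rows, desc_rows


if __name__ == "__main__":
    sample = ["SET_A\tfirst set\tTP53\tBRCA1\n", "SET_B\tsecond set\tEGFR\n"]
    txt, desc = split_gmt(sample)
    print("\n".join(txt))
    print("\n".join(desc))
```

Whether the pipeline accepts these exact layouts is worth checking against its sample input files.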

If you point me to the snakemake task that should generate the ids, I'll try to fix it, but if you recommend that I get the binaries using Cytoscape, I'll do that. It would be great if you could give me a few directions.

kzuberi commented 8 years ago

Great that you were able to set up all this input data, it's a fair bit of work to sort through! I'll be happy to help you troubleshoot, but let me make sure you know all your options first:

- If you just want to use genemania locally, the preferred route is the genemania plugin for cytoscape. It won't give you the web interface, but it has the equivalent functionality from within the cytoscape desktop app. You also gain access to the command line tools, which can help you run more complex tasks. Finally, and importantly, the cytoscape plugin has a UI for easily selecting and downloading datasets, as well as a way to incorporate your own networks to augment the core data.
- If you really want to run the website with existing data, someone on the genemania team should be able to set you up with a vm that contains all the website components as well as the data. The disadvantage here compared to the cytoscape plugin is that you cannot set up your own organism, and any custom networks you upload to the existing organisms are transient and only last as long as your web session.
- Knowing all the above, if you really want to use the website (not the plugin) with custom data, you probably need to get the pipeline running as you are doing now. To help with this, I suggest you first try quickly building a test dataset. That way we can rule out any problems with your toolchain setup, and you'll be able to examine the sample input data files to see examples of the formats. I notice I didn't commit test data to this pipeline project, but I was working on a test dataset a while back in my fork here, it's in the test_data branch. Check out that branch and try running:
```
snakemake --config test=1
```
It should run to completion without error. Look in the test/data/ subdirectory for input files and test/result/ for the processed outputs.
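
For a quick look at what the test run consumed and produced, something like this sketch can list both directories. The helper `list_files` is hypothetical; only the test/data and test/result paths come from the note above, and the script assumes it is run from the pipeline checkout:

```python
from pathlib import Path
from typing import List


def list_files(root: str) -> List[str]:
    """Return the sorted relative paths of all files under root.

    Returns an empty list if root does not exist, e.g. before the
    first run has produced any results.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Paths taken from the comment above; run from the repository root.
    for folder in ("test/data", "test/result"):
        files = list_files(folder)
        print(f"{folder}: {len(files)} file(s)")
        for name in files:
            print("  " + name)
```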

joseluisdiaz commented 8 years ago

Thank you so much, I'll try to download the data using the GeneMANIA plugin. That will be my first option.

joseluisdiaz commented 8 years ago

Everything works just fine, thanks! :-)

GeneMANIA / pipeline

import/node_id question #29