cannin / enhance_nlp_interaction_network_gsoc2020

Commit R Code for Reactome Analysis #18

Open cannin opened 4 years ago

cannin commented 4 years ago
PritiShaw commented 4 years ago

Some files mentioned in the Rmd need to be downloaded from Reactome (https://reactome.org/download/current/). Will the person running the task upload these files manually, or should I download them programmatically with wget in the notebook?

cannin commented 4 years ago

Please download with wget. And let me know when I can test.
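
For the Python notebooks in this repository, the same download step could be sketched without shelling out to wget. This is only a minimal sketch: the base URL is the one quoted above, while the helper names (`reactome_url`, `download_if_missing`) and the `fetch` parameter are hypothetical and not taken from the repository.

```python
import os
import urllib.request

# Base URL quoted in the discussion above.
REACTOME_BASE_URL = "https://reactome.org/download/current/"


def reactome_url(filename, base_url=REACTOME_BASE_URL):
    """Build the download URL for one file in the Reactome download directory."""
    return base_url.rstrip("/") + "/" + filename


def download_if_missing(filename, dest_dir=".", fetch=urllib.request.urlretrieve):
    """Download `filename` into `dest_dir` unless a local copy already exists.

    `fetch` is a hypothetical injection point (url, path) so the function can be
    exercised without network access; it defaults to urllib's urlretrieve.
    """
    os.makedirs(dest_dir, exist_ok=True)
    dest_path = os.path.join(dest_dir, filename)
    if not os.path.exists(dest_path):
        fetch(reactome_url(filename), dest_path)
    return dest_path
```

A call such as `download_if_missing("ReactomePathwaysRelation.txt", dest_dir="data")` would fetch the file once and reuse it on later runs. That file name is an assumption; the names the Rmd actually expects should be checked against the Reactome directory listing.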

PritiShaw commented 4 years ago

Hi Mentor, I have updated the Rmd files with the wget commands. The build is done, and you can test it on MyBinder. I have tested it, and you can find the HTML output here. For the test I ran it on a very small number of terms, so the output won't be accurate.

I came across a missing parameter at the line below and filled in reactome_pathway_hierarchy_file: https://github.com/cannin/enhance_nlp_interaction_network_gsoc2020/blob/0af442b360143e96f81b9a58cb2d1d9fa91e31fc/Reactome_Analysis.Rmd#L133 This parameter was also missing in the file you sent to Guanming by mail. Can you please confirm that I made the right change on this line? Thanks

cannin commented 4 years ago

I changed a few things, but it is still not running. I do not know where the file with "INDRA_QUERY_TERM_STATEMENT_COUNT" exists online. It is concerning that the files are scattered across too many repositories, which makes things confusing. Can you just have one gist with multiple files? I thought it was here: https://gist.github.com/PritiShaw/6732c69bfbd4a169f7cdae448351d06e but it is not.

cannin commented 4 years ago

The file you had an issue with should be okay now.

PritiShaw commented 4 years ago

Sorry for the confusion. The output file you are asking for is present here. Initially I was using a gist to update the output file programmatically, but then I switched to the repository (PritiShaw/Reactome-Failed-Queries-Processing) so that it is easy to maintain.

The code that generates this file is in the repository PritiShaw/Reactome-Failed-Queries; the same code is used in Reactome_PMID_Metadata_Extraction.ipynb in this repository.

I understand that this duplication of the same code is creating confusion, so I have asked a question at https://github.com/cannin/enhance_nlp_interaction_network_gsoc2020/issues/17#issuecomment-673118436 . The tests are ready, and I will start pushing as soon as I receive a green signal.

cannin commented 4 years ago

I made some changes that should help make the Python code work. The next step is to address #14 and put the configuration in a .env file for both Python and R.
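
As one possible shape for the Python side of that .env step, here is a minimal sketch of a dependency-free reader for `KEY=VALUE` files. The function names and the example keys in the note below are hypothetical; nothing here reflects the configuration actually used in the repository.

```python
import os


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blank lines, # comments, and lines without '='."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        settings[key.strip()] = value
    return settings


def load_env_file(path=".env", environ=os.environ):
    """Read `path` and copy its settings into `environ`, keeping values already set."""
    with open(path, encoding="utf-8") as handle:
        settings = parse_env_lines(handle)
    for key, value in settings.items():
        environ.setdefault(key, value)
    return settings
```

With a file containing, say, `REACTOME_DIR=data` (an invented key), `load_env_file()` would make the value available through `os.environ["REACTOME_DIR"]`. On the R side, base R's `readRenviron(".env")` followed by `Sys.getenv("REACTOME_DIR")` should be able to read the same file, so both languages could share one set of settings.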

PritiShaw commented 4 years ago

Hi Mentor, I have updated the code. The following outputs were generated by MyBinder: Rmd Output
indra_output.html

Sample parameter yml file: parameters_sample.yml

I have also tested the Python notebooks on a small dataset. Requesting feedback.

Thanks