Error with analysis graph when using taxon_id

KDeaton commented 2 years ago

Hi! I am running m2m with a singularity on a cluster. I get the following error when running m2m_analysis graph (below). I get the same error with other genomes too. Have you encountered this error before?

Edit: Actually I have an issue when running without using the taxon function as well. Enumeration runs fine. The html was created fine (as I've shared with you before, I was able to simplify it quite a bit by separating out individual pathways). But the svg was not created.

I think these are likely two separate issues:

1. Maybe an issue with the taxon id version?
2. Maybe an issue with the file path of the Oog.jar or powergraph programs with the singularity?

Error log for running analysis with --taxon:

Traceback (most recent call last):
  File "/usr/local/bin/m2m_analysis", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/__main_analysis__.py", line 288, in main
    main_analysis_workflow(network_dir, args.targets, args.seeds, args.out, args.taxon,
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/__main_analysis__.py", line 303, in main_analysis_workflow
    run_analysis_workflow(*allargs)
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/m2m_analysis_workflow.py", line 46, in run_analysis_workflow
    gml_output = graph_analysis(json_file_folder, target_folder_file, output_dir, taxon_file, taxonomy_level)
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/solution_graph.py", line 58, in graph_analysis
    create_gml(json_paths, target_paths, output_dir, taxonomy_output_file)
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/solution_graph.py", line 124, in create_gml
    key_species_data[target_category]['essential_symbionts'][taxon] = [organism for organism in key_species_types if key_species_types[organism] == 'ES' and taxon_named_species[organism].split('__')[0] == taxon]
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/solution_graph.py", line 124, in <listcomp>
    key_species_data[target_category]['essential_symbionts'][taxon] = [organism for organism in key_species_types if key_species_types[organism] == 'ES' and taxon_named_species[organism].split('__')[0] == taxon]
KeyError: 'GCA_900475215'

.................. Here is the end of the analysis workflow log for graph and powergraph sections:

######### Graph of targets_tolp1_sty #########
Number of nodes: 17
Number of edges: 70
--- Graph runtime 0.03 seconds ---

######### Graph compression: targets_tolp1_sty #########
Number of powernodes: 2
Number of poweredges: 1
--- Compression runtime 0.94 seconds ---

######### PowerGraph visualization: targets_tolp1_sty #########
######### Creation of the powergraph website accessible at analysis/09analysis/html/targets_tolp1_sty #########
--- Powergraph runtime 1.05 seconds ---

ArnaudBelcour commented 2 years ago

Hi @KDeaton,

For 1. I think the issue comes from the reading of the taxon file. The KeyError indicates that the organism name (found by metage2metabo) is not found in the taxon_named_species dictionary created from the taxon file. Can you share some lines of the taxon file to see if there is a possible problem in this file (especially in the order of the columns)?

For 2. can you share the command used (with the singularity call)? The log you shared shows that the SVG creation has been bypassed as it was not given as input which is really strange.
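The lookup failure described for point 1 can be checked before running the workflow. The following is a minimal sketch, not M2M's own code: it assumes the taxon file is tab-separated with a header row and the organism ID in the first column, and the helper names `load_taxon_ids` and `missing_organisms` are hypothetical.

```python
import csv
import io


def load_taxon_ids(handle):
    """Return the set of organism IDs found in the first column of a taxon TSV."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row (species, taxon_id)
    return {row[0] for row in reader if row}


def missing_organisms(organisms, taxon_ids):
    """List organisms that have no exact match in the taxon file."""
    return sorted(org for org in organisms if org not in taxon_ids)


# Illustrative data only: one ID is present in the file, the other is not.
example_tsv = "species\ttaxon_id\nGCA_001630725\t303\n"
taxon_ids = load_taxon_ids(io.StringIO(example_tsv))
missing = missing_organisms(["GCA_001630725", "GCA_900475215"], taxon_ids)
print(missing)
```

Any ID reported as missing would raise a KeyError in a plain dictionary lookup. The match is exact, so stray whitespace or a different spelling in the file counts as missing.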

KDeaton commented 2 years ago

Hi Arnaud, thanks for the prompt response as always!

For 1. I used the same column headers as the taxon_id.tsv file in the tutorial. The taxonomy looks like it was read correctly in the taxonomy_species.tsv created during the m2m_analysis enumeration.

a. The taxon_id.tsv (converted to csv for upload): taxon_id.csv

b. The taxonomy_species.tsv from m2m_analysis: taxonomy_species.csv

For 2. I did fix one thing: I now point to my original Oog.jar file in the Setup directory with the singularity. After this fix, an svg directory was created, but the process stopped after enumeration. Here is my updated sbatch script:

singularity exec -B /hpc/group/deshusseslab:/hpc/group/deshusseslab/ Setup/metage2metabo-metacom_singularity_latest.sif m2m_analysis workflow -n genomes/taxcurated/ -s seeds/seeds_tol_sty3.sbml -t targets/final2 -o analysis/19analysis --taxon analysis/taxon_id.tsv --oog Setup/Oog.jar --level genus

The error message was the same:

Traceback (most recent call last):
  File "/usr/local/bin/m2m_analysis", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/__main_analysis__.py", line 288, in main
    main_analysis_workflow(network_dir, args.targets, args.seeds, args.out, args.taxon,
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/__main_analysis__.py", line 303, in main_analysis_workflow
    run_analysis_workflow(*allargs)
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/m2m_analysis_workflow.py", line 46, in run_analysis_workflow
    gml_output = graph_analysis(json_file_folder, target_folder_file, output_dir, taxon_file, taxonomy_level)
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/solution_graph.py", line 58, in graph_analysis
    create_gml(json_paths, target_paths, output_dir, taxonomy_output_file)
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/solution_graph.py", line 125, in create_gml
    key_species_data[target_category]['alternative_symbionts'][taxon] = [organism for organism in key_species_types if key_species_types[organism] == 'AS' and taxon_named_species[organism].split('__')[0] == taxon]
  File "/usr/local/lib/python3.8/dist-packages/metage2metabo/m2m_analysis/solution_graph.py", line 125, in <listcomp>
    key_species_data[target_category]['alternative_symbionts'][taxon] = [organism for organism in key_species_types if key_species_types[organism] == 'AS' and taxon_named_species[organism].split('__')[0] == taxon]
KeyError: 'GCA_900009125'

out file: 19analysisout.txt

ArnaudBelcour commented 2 years ago

Hi @KDeaton,

After testing the taxon_id file, I have found your issue: there was a space at the beginning of each organism ID in each row, such as:

species	taxon_id
' GCA_001630725'	303
' GCA_900009125'	85698

So the ID of an organism was ' GCA_000494915' and not 'GCA_000494915'. And when M2M tried to map the results of the enumeration with the dictionary from the taxonomy file, it failed.

I have removed the space from the file, can you try it? taxon_id_no_space.csv
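For anyone hitting the same problem, the cleanup can also be scripted. This is only a sketch of one way to do it, assuming a tab-separated file; the function name `strip_tsv_cells` is hypothetical and is not part of M2M.

```python
import csv
import io


def strip_tsv_cells(src, dst):
    """Copy a TSV from src to dst, trimming whitespace around every cell."""
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow([cell.strip() for cell in row])


# Illustrative input reproducing the leading space in the organism IDs.
raw = "species\ttaxon_id\n GCA_001630725\t303\n GCA_900009125\t85698\n"
cleaned = io.StringIO()
strip_tsv_cells(io.StringIO(raw), cleaned)
print(repr(cleaned.getvalue()))
```

With real files, the same function can be called on handles opened with `open(path, newline="")` for reading and `open(path, "w", newline="")` for writing.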

For error 2, what path did you give for the Oog.jar before? It could indeed be a conflict with the path known to the Singularity, which was then unable to find the Oog.jar file.

KDeaton commented 2 years ago

That should be it! Thank you. I thought I had checked for that before, but I missed it. The initial rerun looks good, but I now need to add java to the PATH and rerun, so I don't have final confirmation that this was the issue, but it seems likely.

KDeaton commented 2 years ago

Everything is working for me now! The taxon_id issue was resolved by eliminating the extra space. I resolved the java path issue by omitting "--oog filepathname/Oog.jar" in the m2m_analysis workflow command since that path is specified in the singularity recipe now.

AuReMe / metage2metabo

Error with analysis graph when using taxon_id #45