B-UMMI / chewBBACA

BSR-Based Allele Calling Algorithm
GNU General Public License v3.0

AlleleCallEvaluator error: cgMLST is composed of 0 loci #198

Closed jgugliel closed 5 months ago

jgugliel commented 5 months ago

Hello!

I am getting a weird error with AlleleCallEvaluator (chewBBACA version 3.3.5). I performed an AlleleCall as usual: no errors, and the output files look perfectly normal. Then I ran AlleleCallEvaluator and got this:

Number of samples: 792
Number of loci: 1889
Computing sample statistics...done.
Computing loci statistics...done.
Provided annotations for 0 loci in the schema.
Reading profile matrix...done.
Masking profile matrix...done.
Computing Presence-Absence matrix...done.
Determining cgMLST loci...
Computed for...792 genomes.
cgMLST is composed of 0 loci

And then this error

/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/utils/distance_matrix.py:69: UserWarning:

genfromtxt: Empty input file: "<generator object tsv_to_nparray.. at 0x7fa77ce41c80>"

Creating distance matrix...
Traceback (most recent call last):
  File "/opt/gensoft/exe/chewBBACA/3.3.5/bin/chewie", line 33, in <module>
    sys.exit(load_entry_point('chewBBACA==3.3.5', 'console_scripts', 'chewie')())
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/chewBBACA.py", line 1469, in main
    functions_info[process][1]()
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/utils/process_datetime.py", line 146, in wrapper
    func(*args, kwargs)
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/chewBBACA.py", line 693, in run_evaluate_calls
    evaluate_calls.main(vars(args))
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/AlleleCallEvaluator/evaluate_calls.py", line 373, in main
    dm_file = dm.main(cgMLST_matrix_outfile, output_directory,
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/utils/distance_matrix.py", line 419, in main
    results = write_matrices(merged, genome_ids, output_pairwise, col_ids)
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/utils/distance_matrix.py", line 189, in write_matrices
    current_file = pickled_results[g]
KeyError: 'isolate_100'

I have never seen this before, and I do not understand how chewie concludes that the cgMLST is composed of 0 loci. I previously had issues with this analysis that were related to file names and contig names, so I renamed everything to be sure, but the error persists.

Do you know what might cause this?

jgugliel commented 5 months ago

OK, if I understand correctly, the distance matrix and NJ tree are based on the "cgMLST100" set of loci. In my case there are none, hence the message "cgMLST is composed of 0 loci".
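To illustrate why a "cgMLST100" set can end up empty, here is a minimal sketch that selects loci present in at least a given fraction of samples from a presence-absence matrix. It is not chewBBACA's implementation: the function name `cgmlst_loci`, the toy matrix, and the locus names are hypothetical and only show the idea that a single sample missing every locus is enough to empty the 100% set.

```python
def cgmlst_loci(presence_absence, loci_ids, threshold=1.0):
    """Return loci present in at least `threshold` (fraction) of samples.

    `presence_absence` is a list of rows (one per sample) of 0/1 values,
    with columns in the same order as `loci_ids`. Hypothetical helper.
    """
    n_samples = len(presence_absence)
    if n_samples == 0:
        return []
    core = []
    for column, locus in enumerate(loci_ids):
        present = sum(row[column] for row in presence_absence)
        if present / n_samples >= threshold:
            core.append(locus)
    return core


# Toy data: three samples, four loci (all names are made up).
loci = ["locus_A", "locus_B", "locus_C", "locus_D"]
pa = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
]

print(cgmlst_loci(pa, loci))                   # only locus_A is in every sample
print(cgmlst_loci(pa + [[0, 0, 0, 0]], loci))  # one empty sample leaves no loci
```

Lowering `threshold` (for example to 0.75) tolerates some missing data, which is why a looser core-genome definition can still be non-empty when the 100% set is not.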

So I tried with the --light option and got a new error.

Traceback (most recent call last):
  File "/opt/gensoft/exe/chewBBACA/3.3.5/bin/chewie", line 33, in <module>
    sys.exit(load_entry_point('chewBBACA==3.3.5', 'console_scripts', 'chewie')())
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/chewBBACA.py", line 1469, in main
    functions_info[process][1]()
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/utils/process_datetime.py", line 146, in wrapper
    func(*args, kwargs)
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/chewBBACA.py", line 693, in run_evaluate_calls
    evaluate_calls.main(vars(args))
  File "/opt/gensoft/exe/chewBBACA/3.3.5/venv/lib/python3.8/site-packages/chewBBACA-3.3.5-py3.8.egg/CHEWBBACA/AlleleCallEvaluator/evaluate_calls.py", line 492, in main
    "presence_absence": pa_data,
UnboundLocalError: local variable 'pa_data' referenced before assignment
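For readers unfamiliar with this kind of error, the following is a minimal sketch of the general pattern that produces an UnboundLocalError: a local variable assigned only inside a conditional branch and then used unconditionally. The functions `build_report_buggy` and `build_report_fixed` and their `light` parameter are hypothetical and are not taken from the chewBBACA source.

```python
def build_report_buggy(light=False):
    # pa_data is assigned only when the optional step runs...
    if not light:
        pa_data = {"loci": 3}
    # ...but it is referenced unconditionally here.
    return {"presence_absence": pa_data}


def build_report_fixed(light=False):
    # Giving the variable a default value makes every code path valid.
    pa_data = None
    if not light:
        pa_data = {"loci": 3}
    return {"presence_absence": pa_data}


try:
    build_report_buggy(light=True)
except UnboundLocalError as error:
    print("buggy version failed:", error)

print("fixed version:", build_report_fixed(light=True))
```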

rfm-targa commented 5 months ago

Hello @jgugliel,

Thank you for your interest in chewBBACA. The traceback you shared in your second comment is caused by a bug in the AlleleCallEvaluator module that is triggered when using the --light, --no-pa, and --no-dm options. I've added some changes that should fix this issue. We will release chewBBACA v3.3.6 soon and expect it to solve the problem so that you can proceed with the analysis.

Regarding the cgMLST being composed of 0 loci, the most common cause is one or more low-quality samples in the dataset used to perform allele calling (e.g. highly fragmented genome assemblies, contamination). I advise checking the quality of the 792 genome assemblies if you haven't already. We will let you know when chewBBACA v3.3.6 is available.
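One rough way to look for samples that might be emptying the cgMLST is to count, per sample, how many loci have no allele identifier in the profile matrix. The sketch below assumes a tab-separated matrix with the sample identifier in the first column and one locus per remaining column, where a cell counts as a valid call if it is a positive integer, optionally prefixed with "INF-"; anything else (for example LNF or NIPH) is counted as missing. The helper names and the toy data are hypothetical, and this is not a substitute for proper assembly quality control.

```python
import csv
import io


def is_allele_call(cell):
    """True if the cell holds an allele identifier (e.g. '12' or 'INF-12')."""
    value = cell[4:] if cell.startswith("INF-") else cell
    return value.isdigit() and int(value) > 0


def missing_per_sample(tsv_handle):
    """Map each sample identifier to its number of loci without an allele call."""
    reader = csv.reader(tsv_handle, delimiter="\t")
    next(reader)  # skip the header row (sample column + locus identifiers)
    counts = {}
    for row in reader:
        sample, calls = row[0], row[1:]
        counts[sample] = sum(1 for cell in calls if not is_allele_call(cell))
    return counts


# Toy profile matrix (made-up values).
example = io.StringIO(
    "FILE\tlocus_1\tlocus_2\tlocus_3\n"
    "isolate_1\t1\t2\tINF-5\n"
    "isolate_2\tLNF\tLNF\t3\n"
    "isolate_3\t1\tNIPH\t4\n"
)

counts = missing_per_sample(example)
# List the samples with the most missing loci first.
for sample, missing in sorted(counts.items(), key=lambda item: -item[1]):
    print(sample, missing)
```

Samples that appear at the top of this list with far more missing loci than the rest are natural candidates for a closer look at assembly quality.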

Best regards,

Rafael

rfm-targa commented 5 months ago

Greetings @jgugliel,

We've released chewBBACA v3.3.6. It should fix the issue when using the --light option. If the cgMLST is composed of 0 loci, it should also skip the steps that compute the distance matrix, the MSA, and the tree. This way, it will still create the report but will not display the components that depend on a non-empty cgMLST. Please let us know if you can retry and if it works.
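The skipping behaviour described above can be pictured with a small sketch. It is only an illustration under assumed names: `plan_report_components` and the component labels are hypothetical and do not reflect the actual code in chewBBACA v3.3.6.

```python
def plan_report_components(cgmlst_loci):
    """Return the report components to compute for a given cgMLST.

    Components that need at least one cgMLST locus are left out
    when the cgMLST is empty, so the report can still be created.
    """
    components = ["sample_statistics", "loci_statistics", "presence_absence"]
    if len(cgmlst_loci) == 0:
        # Nothing to compare between samples: skip cgMLST-dependent steps.
        return components
    components.extend(["distance_matrix", "msa", "tree"])
    return components


print(plan_report_components([]))
print(plan_report_components(["locus_A", "locus_B"]))
```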

Best regards,

Rafael

jgugliel commented 5 months ago

Thanks a lot Rafael! I will try this new version as soon as possible and let you know if it worked.

jgugliel commented 5 months ago

The distance matrix, MSA and tree computations are indeed skipped, so everything is working as expected.

Thanks a lot!

Best