edgardomortiz / Captus

Assembly of Phylogenomic Datasets from High-Throughput Sequencing data
https://edgardomortiz.github.io/captus.docs/
GNU General Public License v3.0

Extracting Target Sequences error: ZeroDivisionError: division by zero #13

Open neolycus23 opened 1 month ago

neolycus23 commented 1 month ago

Hello,

First of all, congratulations on such an amazing pipeline. I am really enjoying it and look forward to using it in my future papers.

My problem right now: I am trying to extract UCEs from my assemblies, but I keep hitting the error below (full log attached). Scipio only finishes some of my samples (22 out of 28), and the samples that fail happen to be the largest assembly files (> 1Gb each). Any idea what could be causing this? A couple of days ago Scipio finished 24 out of 28 samples from the same dataset, but it crashed after that. Thank you for your help!

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/concurrent/futures/process.py", line 263, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/site-packages/captus/extract.py", line 1339, in scipio_coding
    final_models = scipio_yaml_to_dict(yaml_final_file, min_score, min_identity,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/site-packages/captus/bioformats.py", line 2321, in scipio_yaml_to_dict
    model = parse_model(yaml[prot][yaml_model],
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/site-packages/captus/bioformats.py", line 2304, in parse_model
    mismatch_rate      = len(set(mod["mismatches"])) / prot_len_matched
                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~
"""
[nohup.txt](https://github.com/user-attachments/files/16034555/nohup.txt)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/dmz/home/vferreira/.conda/envs/captus/bin/captus_assembly", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/site-packages/captus/captus_assembly.py", line 1424, in main
    CaptusAssembly()
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/site-packages/captus/captus_assembly.py", line 90, in __init__
    getattr(self, args.command)()
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/site-packages/captus/captus_assembly.py", line 1074, in extract
    extract(full_command, args)
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/site-packages/captus/extract.py", line 505, in extract
    tqdm_parallel_nested_run(scipio_coding, scipio_params, d_msg, f_msg,
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/site-packages/captus/misc.py", line 158, in tqdm_parallel_nested_run
    result = future.result()
             ^^^^^^^^^^^^^^^
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/dmz/home/vferreira/.conda/envs/captus/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
ZeroDivisionError: division by zero

neolycus23 commented 1 month ago

A quick update: I ran into the same issue with a single sample, a large genome assembly of 3.78 GB...
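
For reference, the traceback above fails on the line `mismatch_rate = len(set(mod["mismatches"])) / prot_len_matched`, which means `prot_len_matched` was 0 for at least one Scipio model. The following is only a minimal sketch of how such a division could be guarded, not the actual Captus code or fix: `safe_mismatch_rate` is a hypothetical helper, the sample dictionaries are made up for illustration, and returning `None` for a model with no matched residues is an assumption about how such models might be handled.

```python
from typing import Optional, Sequence


def safe_mismatch_rate(mismatches: Sequence[int], prot_len_matched: int) -> Optional[float]:
    """Return unique mismatch positions divided by matched protein length.

    Returns None when nothing was matched, instead of raising ZeroDivisionError.
    (Hypothetical helper; skipping such models is an assumption.)
    """
    if prot_len_matched <= 0:
        return None
    return len(set(mismatches)) / prot_len_matched


# Made-up model records that mimic the mod["mismatches"] access in the traceback
models = [
    {"mismatches": [3, 7, 7], "prot_len_matched": 100},
    {"mismatches": [], "prot_len_matched": 0},  # the case that would crash
]
rates = [safe_mismatch_rate(m["mismatches"], m["prot_len_matched"]) for m in models]
print(rates)  # [0.02, None]
```

Whether a model with zero matched length should be skipped, scored as 0, or reported as an error depends on what produced it, which is why the logs and reference targets are still needed to diagnose the root cause.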

edgardomortiz commented 3 weeks ago

Hi @neolycus23,

Sorry for the late reply, I was in the chaos of moving countries. If I understand correctly, UCEs are not necessarily translatable to protein, right? In that case I would suggest providing your references as miscellaneous DNA (the -d option).

Otherwise, please upload your exact command, the logs produced by Captus, and, if possible, your reference targets so I can start diagnosing the problem.

Thanks!

Edgardo

neolycus23 commented 2 weeks ago

Hi Edgar,

Now it is my turn to apologize for the delay in responding to you. UCEs are not necessarily translatable to proteins, but ours are.

I ran Captus with the command "captus_assembly extract -a 02_assemblies -n probes.fasta" (please see the log file) on several samples, and it worked for almost all of them. Captus crashed during the extraction stage for some of my larger genome assembly files (>1 GB). I kept rerunning the same command, and it worked for some of the failed samples, but others are persistently failing. I am attaching the log with the error mentioned above, and the final NUC_scipio log of one of the samples that failed to complete.

I am looking forward to hearing from you! Thanks for your help!

Vinicius

captus_log.txt NUC_scipio_final.log