edgardomortiz / Captus

Assembly of Phylogenomic Datasets from High-Throughput Sequencing data
https://edgardomortiz.github.io/captus.docs/
GNU General Public License v3.0

MMseqs2 error: TypeError: object of type 'NoneType' has no len() #10

Open kroeve opened 6 months ago

kroeve commented 6 months ago

Hi Edgardo,

When running captus_assembly extract -a 02_assemblies -o 03_extractions -n target_file.fasta -p SeedPlantsPTD -m SeedPlantsMIT -c on an HPC with Captus v1.0.1, the run terminates at step 1 of the clustering with the following error message:

100%|████████████████████████████████| 816/816 [00:00<00:00, 884.17extraction/s]
100%|███████████████████████████████████| 272/272 [00:00<00:00, 3643.09sample/s]
Cannot close file /home/kro037/captus_mmseqs_tmp/123592248136754191/all_seqs.fasta
Traceback (most recent call last):
  File "/scratch3/kro037/.conda/envs/Captus/bin/captus_assembly", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/scratch3/kro037/.conda/envs/Captus/lib/python3.12/site-packages/captus/captus_assembly.py", line 1424, in main
    CaptusAssembly()
  File "/scratch3/kro037/.conda/envs/Captus/lib/python3.12/site-packages/captus/captus_assembly.py", line 90, in __init__
    getattr(self, args.command)()
  File "/scratch3/kro037/.conda/envs/Captus/lib/python3.12/site-packages/captus/captus_assembly.py", line 1074, in extract
    extract(full_command, args)
  File "/scratch3/kro037/.conda/envs/Captus/lib/python3.12/site-packages/captus/extract.py", line 636, in extract
    captus_cluster_refs = cluster_and_select_refs(num_samples, cl_min_samples,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch3/kro037/.conda/envs/Captus/lib/python3.12/site-packages/captus/extract.py", line 2137, in cluster_and_select_refs
    with tqdm(total=len(clust1_clusters), ncols=tqdm_cols, unit="cluster") as pbar:
                    ^^^^^^^^^^^^^^^^^^^^
  TypeError: object of type 'NoneType' has no len()

The end of the log file looks like this:

► STEP 1 OF 3: Clustering contigs across samples with MMseqs2

       MMseqs2 method: easy-linclust
         cluster_mode: 2
          sensitivity: 7.5
           min_seq_id: 79.2
          seq_id_mode: 1
                  cov: 80
             cov_mode: 1
             gap_open: 3
           gap_extend: 1
          max_seq_len: 20000
              tmp_dir: /home/kro037/captus_mmseqs_tmp

    Min. locus length: 500
 Min. samples per cluster: 81
  Max. copies per cluster: 5
      Overwrite files: False
       Keep all files: False

   Using leftover contigs: 271
Using entire assembly: 1
 Total samples to cluster: 272

 Clustering directory: /scratch3/kro037/cookie/output/301_captus_default/03_extractions/02_clustering_data
                       Output directory already exists and files may be overwritten

WARNING: The input FASTA file for clustering was found in 
'/scratch3/kro037/cookie/output/301_captus_default/03_extractions/02_clustering_data/clustering_input.fasta' and it will 
be used, to recreate it enable '--overwrite'

Initial clustering of contigs at 79.2% identity:
 └─→ Clustering completed: [3m 4.3s (184.256s)]

Filtering clusters with fewer than 81 samples, more than 5 copies in average, and with centroids shorter than 500 bp:

I re-ran it a few times and got the same error every time, although in one run the log file ended a few lines earlier (at "Initial clustering..."). Any help appreciated! Thanks!

Cheers, Evelin

edgardomortiz commented 6 months ago

Hi Evelin,

The key is here:

Cannot close file /home/kro037/captus_mmseqs_tmp/123592248136754191/all_seqs.fasta

My guess is that you either don't have enough space in your home directory or you don't have the right permissions to write files there. You can always use --cl_tmp_dir to redirect the temporary folder for clustering to a location on your HPC where you have enough space and write permissions.
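Before rerunning with --cl_tmp_dir, it can help to confirm that the candidate directory really has free space and is writable. The following is only a minimal sketch and is not part of Captus: the function name check_tmp_dir and the min_free_gb threshold are hypothetical and chosen for illustration.

```python
import os
import shutil
import tempfile


def check_tmp_dir(path, min_free_gb=1.0):
    """Return (ok, message) for a candidate temporary directory.

    Hypothetical helper for illustration; not part of Captus or MMseqs2.
    """
    if not os.path.isdir(path):
        return False, f"{path} is not an existing directory"
    # Free space on the filesystem that holds 'path', in GiB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < min_free_gb:
        return False, f"only {free_gb:.1f} GiB free in {path}"
    try:
        # Creating and flushing a small file shows we can actually write here.
        with tempfile.NamedTemporaryFile(dir=path) as handle:
            handle.write(b"test")
            handle.flush()
    except OSError as error:
        return False, f"cannot write to {path}: {error}"
    return True, f"{free_gb:.1f} GiB free and writable"
```

Calling check_tmp_dir on the intended directory (for example a scratch location) returns a flag and a short message. The threshold should be adjusted to the size of the clustering input, since the space MMseqs2 needs depends on the dataset.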

I hope this helps...

Edgardo

kroeve commented 6 months ago

Hi Edgardo,

Thank you so much, I actually ran out of space! I set a temporary work directory, and it worked now. I do currently struggle a lot with unintentionally filling my home directory... :)

Cheers, Evelin

DavCP commented 3 months ago

Hello, I'm having a similar issue/output when running the -c flag at the step Selecting final cluster representatives:

Traceback (most recent call last):
  File "/Users/davidcr/miniconda3/envs/captus/bin/captus", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/davidcr/miniconda3/envs/captus/lib/python3.12/site-packages/captus/captus_assembly.py", line 1424, in main
    CaptusAssembly()
  File "/Users/davidcr/miniconda3/envs/captus/lib/python3.12/site-packages/captus/captus_assembly.py", line 90, in __init__
    getattr(self, args.command)()
  File "/Users/davidcr/miniconda3/envs/captus/lib/python3.12/site-packages/captus/captus_assembly.py", line 1074, in extract
    extract(full_command, args)
  File "/Users/davidcr/miniconda3/envs/captus/lib/python3.12/site-packages/captus/extract.py", line 636, in extract
    captus_cluster_refs = cluster_and_select_refs(num_samples, cl_min_samples,
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/davidcr/miniconda3/envs/captus/lib/python3.12/site-packages/captus/extract.py", line 2208, in cluster_and_select_refs
    with tqdm(total=len(clust2_clusters), ncols=tqdm_cols, unit="cluster") as pbar:
                    ^^^^^^^^^^^^^^^^^^^^
TypeError: object of type 'NoneType' has no len()

Same problem occurs on all 3 of my workstations. I'm pretty sure there is no problem with permissions or storage.

I really appreciate the time you put into developing the software.

edgardomortiz commented 3 months ago

Hi @DavCP,

My initial guess is that no clusters were found, so Captus fails when processing the output from MMseqs2. To be sure, I would need the MMseqs2 log. Could you upload the log file(s) found in the clustering folder inside your extraction folder?
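The TypeError in both tracebacks arises because len() is called on a value that is None instead of a collection of clusters. As a rough illustration of the failure mode, and not the actual Captus code, a hypothetical helper such as count_clusters below could turn the missing result into a clearer error message:

```python
def count_clusters(clusters):
    """Return the number of clusters, failing early with a clear message.

    Hypothetical helper for illustration; not taken from Captus.
    """
    if clusters is None:
        # len(None) would raise "object of type 'NoneType' has no len()".
        raise RuntimeError(
            "clustering produced no usable output; check the MMseqs2 log and "
            "the free space and permissions of the temporary directory"
        )
    return len(clusters)
```

An empty collection still passes through and reports zero clusters, so only a missing result is treated as an error.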

Thanks!

Edgardo