edgardomortiz / Captus

Assembly of Phylogenomic Datasets from High-Throughput Sequencing data
https://edgardomortiz.github.io/captus.docs/
GNU General Public License v3.0

captus generating empty files on HPC #4

Open korjent opened 9 months ago

korjent commented 9 months ago

Hi Team,

I am trying to run Captus on an HPC (one node, multiple cores) rather than on my own computer. Everything seems to be installed and working normally. The clean step runs fine, but when I run assemble I get all empty folders, even when trying the tutorial dataset. Any advice?

(CAPTUS) [a1638383@p2-log-2 captus_test]$ captus_assembly assemble -r 01_clean_reads

Starting Captus-assembly: ASSEMBLE (2023-11-24 17:05:43)

Welcome to the de novo assembly step of Captus-assembly. In this step, Captus will use MEGAHIT to assemble your input reads. It is also possible to subsample a number of reads using reformat.sh from BBTools prior to assembly, this is useful while performing tests or when including samples with considerably higher sequencing depth in a dataset.

Since you provided a directory name, Captus will look in that location for all the FASTQ files that contain the string '_R1' in their names and match them with their respective '_R2' pairs. If the '_R2' can not be found, the sample is treated as single-end. Sample names are derived from the text found before the '_R1' string. The full set of reads per sample will be assembled, no subsampling will be performed.

For more information, please see https://github.com/edgardomortiz/Captus

   Captus version: v1.0.0
          Command: /home/a1638383/mambaforge/envs/CAPTUS/bin/captus_assembly assemble -r 01_clean_reads
         Max. RAM: 248.7GB (out of 251.2GB)
     Max. Threads: 72 (out of 72)

     Dependencies:
          MEGAHIT: v1.2.9 OK
  megahit_toolkit: v1.2.9 OK
          BBTools: not used

 Python libraries:
            numpy: v1.26.0 OK
           pandas: v2.1.3 OK
           plotly: v5.18.0 OK

 Output directory: /hpcfs/users/a1638383/DeMux_Trim_runs/captus_test/02_assemblies
                   Output directory successfully created

Subsampling Reads with reformat.sh (2023-11-24 17:05:44)

Now Captus will randomly subsample 0 read pairs (or single-end reads) from each sample prior to de novo assembly with MEGAHIT.

Skipping read subsampling step... (to enable provide a number of reads to subsample with '--sample_reads_target')

De Novo Assembly with MEGAHIT (2023-11-24 17:05:44)

Now Captus will perform de novo assembly with your input reads using MEGAHIT. Both '--min_contig_len' (when set to 'auto') and '--k_list' will be adjusted sample-wise according to the mean read length of the sample's FASTQ files.

Concurrent assemblies: 4
     RAM per assembly: 62.2GB
 Threads per assembly: 18

           preset: CAPSKIM
           k_list: 31,39,47,63,79,95,111,127,143,159,175
        min_count: 2
      prune_level: 2
      merge_level: 20,0.95
   min_contig_len: auto
    max_contig_gc: 100.0%
    extra_options: None
          tmp_dir: /home/a1638383/captus_megahit_tmp

  Overwrite files: False
   Keep all files: False

Samples to assemble: 4

Output directories: /hpcfs/users/a1638383/DeMux_Trim_runs/captus_test/02_assemblies/[Sample_name]__captus-asm/01_assembly
                    A directory will be created for each sample

De novo assembling with MEGAHIT: 0%| | 0/4 [00:03<?, ?sample/s]
└─→ De novo assembly completed for 4 sample(s) [3.546s]

Skipping summarization step... (no assembly statistics files were produced)

MEGAHIT temporary directory '/home/a1638383/captus_megahit_tmp' deleted

Captus-assembly: ASSEMBLE -> successfully completed [4.899s] (2023-11-24 17:05:48)
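For anyone checking their own input folder, the '_R1'/'_R2' matching rule described in the log above can be sketched roughly as follows. This is only an illustration of the rule as the log words it, not Captus's actual code: the function name `pair_fastqs` and the list of accepted extensions are assumptions. Note that in the run above 4 samples were detected, so the failure happens after this stage.

```python
from pathlib import Path

# Assumed set of FASTQ extensions; the log only says "FASTQ files".
FASTQ_SUFFIXES = (".fq", ".fastq", ".fq.gz", ".fastq.gz")


def pair_fastqs(names):
    """Map sample name -> (R1 file, R2 file or None) from a list of file names.

    Hypothetical helper: the sample name is the text before '_R1', and the
    mate is the same file name with '_R2' in place of '_R1'.
    """
    fastqs = [n for n in names if n.endswith(FASTQ_SUFFIXES)]
    available = set(fastqs)
    samples = {}
    for r1 in sorted(n for n in fastqs if "_R1" in n):
        sample, _, rest = r1.partition("_R1")
        r2 = f"{sample}_R2{rest}"
        # A missing mate means the sample would be treated as single-end.
        samples[sample] = (r1, r2 if r2 in available else None)
    return samples


if __name__ == "__main__":
    reads_dir = Path("01_clean_reads")  # directory name taken from the command above
    names = [p.name for p in reads_dir.iterdir()] if reads_dir.is_dir() else []
    for sample, (r1, r2) in pair_fastqs(names).items():
        layout = "paired-end" if r2 else "single-end"
        print(f"{sample}: {layout} ({r1}, {r2})")
```

If this lists the expected samples, the input naming is probably fine and the MEGAHIT step is the more likely place to look.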

edgardomortiz commented 9 months ago

Well, it seems like a MEGAHIT issue:

First, is that workstation a Mac? If so, please check the note in the README about the MEGAHIT version for Mac.

Second, if you still have the assemblies folder, could you attach the MEGAHIT logs for one sample here? (You can find them inside any sample's folder: megahit_brief.log and megahit_full.log.)

Third, if you already erased the folder, could you run Captus again using --debug to see more extensive error messages? (But please also send me any sample's MEGAHIT logs.)

Finally, it could also be that you don't have enough free space in the HOME folder (but this is unlikely for the test data).

Thanks,

Edgardo
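The checks suggested above (free space in HOME and the per-sample MEGAHIT logs) could be gathered with a small script along these lines. It is a sketch under assumptions: the helper names are made up for illustration, and the only details taken from the thread are the log file names `megahit_brief.log` and `megahit_full.log` and the `02_assemblies` output folder.

```python
import shutil
from pathlib import Path

LOG_NAMES = {"megahit_brief.log", "megahit_full.log"}  # names given in the comment above


def free_gb(path):
    """Free space, in GB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def find_megahit_logs(assemblies_dir):
    """Return every MEGAHIT log found anywhere under `assemblies_dir`."""
    root = Path(assemblies_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.log") if p.name in LOG_NAMES)


def tail(path, n=20):
    """Last `n` lines of a text file, tolerating odd bytes in the log."""
    return Path(path).read_text(errors="replace").splitlines()[-n:]


if __name__ == "__main__":
    print(f"Free space in HOME: {free_gb(Path.home()):.1f} GB")
    logs = find_megahit_logs("02_assemblies")
    if not logs:
        print("No MEGAHIT logs found under 02_assemblies")
    for log in logs:
        print(f"--- {log}")
        print("\n".join(tail(log)))
```

Running it from the directory that contains `02_assemblies` prints the end of each log, which is usually where MEGAHIT reports why it stopped.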

korjent commented 9 months ago

Hi Edgar,

Thank you for your prompt reply! I ran it again with --debug. I also ran the commands for Mac you suggested, although the HPC is not a Mac; I just use the terminal on my Mac to access it.

Attached is the folder I got for the tutorial.

Hope this helps…

cheers

Kor

02_assemblies.zip

edgardomortiz commented 9 months ago

I see, so you reinstalled MEGAHIT and now it works? What version did you have before? I would like to replicate the issue to warn other users.

I guess with the new MEGAHIT you can now remove --debug and everything should work normally.

Edgardo

korjent commented 9 months ago

Hi Edgardo. I tried it with my own data and got this:

Starting Captus-assembly: ASSEMBLE (2023-12-05 15:48:23)

Welcome to the de novo assembly step of Captus-assembly. In this step, Captus will use MEGAHIT to assemble your input reads. It is also possible to subsample a number of reads using reformat.sh from BBTools prior to assembly, this is useful while performing tests or when including samples with considerably higher sequencing depth in a dataset.

Since you provided a directory name, Captus will look in that location for all the FASTQ files that contain the string '_R1' in their names and match them with their respective '_R2' pairs. If the '_R2' can not be found, the sample is treated as single-end. Sample names are derived from the text found before the '_R1' string. The full set of reads per sample will be assembled, no subsampling will be performed.

For more information, please see https://github.com/edgardomortiz/Captus

   Captus version: v1.0.0
          Command: /home/a1638383/mambaforge/envs/CAPTUS/bin/captus_assembly assemble -r 01_clean_reads
         Max. RAM: 186.7GB (out of 188.6GB)
     Max. Threads: 80 (out of 80)

     Dependencies:
          MEGAHIT: v1.2.9 OK
  megahit_toolkit: v1.2.9 OK
          BBTools: not used

 Python libraries:
            numpy: v1.26.0 OK
           pandas: v2.1.3 OK
           plotly: v5.18.0 OK

 Output directory: /hpcfs/users/a1638383/DeMux_Trim_runs/HybCap113/02_assemblies
                   Output directory successfully created

Subsampling Reads with reformat.sh (2023-12-05 15:48:24)

Now Captus will randomly subsample 0 read pairs (or single-end reads) from each sample prior to de novo assembly with MEGAHIT.

Skipping read subsampling step... (to enable provide a number of reads to subsample with '--sample_reads_target')

De Novo Assembly with MEGAHIT (2023-12-05 15:48:24)

Now Captus will perform de novo assembly with your input reads using MEGAHIT. Both '--min_contig_len' (when set to 'auto') and '--k_list' will be adjusted sample-wise according to the mean read length of the sample's FASTQ files.

Concurrent assemblies: 20
     RAM per assembly: 9.3GB
 Threads per assembly: 4

           preset: CAPSKIM
           k_list: 31,39,47,63,79,95,111,127,143,159,175
        min_count: 2
      prune_level: 2
      merge_level: 20,0.95
   min_contig_len: auto
    max_contig_gc: 100.0%
    extra_options: None
          tmp_dir: /home/a1638383/captus_megahit_tmp

  Overwrite files: False
   Keep all files: False

Samples to assemble: 48

Output directories: /hpcfs/users/a1638383/DeMux_Trim_runs/HybCap113/02_assemblies/[Sample_name]__captus-asm/01_assembly
                    A directory will be created for each sample

De novo assembling with MEGAHIT:
0%| | 0/48 [00:00<?, ?sample/s]
0%| | 0/48 [00:04<?, ?sample/s]
└─→ De novo assembly completed for 48 sample(s) [4.560s]

Skipping summarization step... (no assembly statistics files were produced)

MEGAHIT temporary directory '/home/a1638383/captus_megahit_tmp' deleted

Captus-assembly: ASSEMBLE -> successfully completed [5.982s] (2023-12-05 15:48:29)
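As a side note on the two logs: the "RAM per assembly" and "Threads per assembly" figures in both runs are consistent with an even split of the maximum RAM and threads across the concurrent assemblies. The sketch below only reproduces that arithmetic. How Captus picks the number of concurrent assemblies is not shown in this thread, so it is taken as an input here, and the function name is hypothetical.

```python
def per_assembly_resources(max_ram_gb, max_threads, concurrent):
    """Evenly split RAM (GB, one decimal) and threads across concurrent jobs.

    Hypothetical helper mirroring the numbers printed in the logs; it is not
    taken from the Captus source code.
    """
    ram_each = round(max_ram_gb / concurrent, 1)
    threads_each = max_threads // concurrent
    return ram_each, threads_each


if __name__ == "__main__":
    # First run: 248.7GB, 72 threads, 4 concurrent assemblies
    print(per_assembly_resources(248.7, 72, 4))
    # Second run: 186.7GB, 80 threads, 20 concurrent assemblies
    print(per_assembly_resources(186.7, 80, 20))
```

In the second run each assembly gets about 9.3GB and 4 threads, which should still be ample for a job that exits after a few seconds, so the resource split is unlikely to be the cause of the empty folders.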


korjent commented 9 months ago

Not sure if I actually updated megahit?

cheers

Kor
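One way to settle whether MEGAHIT was actually updated is to ask the executable on the PATH of the active environment directly. The sketch below is a generic helper, not part of Captus: `tool_version` is a made-up name, and it assumes the tool accepts a `--version` flag (MEGAHIT does).

```python
import shutil
import subprocess


def tool_version(executable, flag="--version"):
    """Return (resolved path, version text) for a tool, or None if not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run(
        [path, flag], capture_output=True, text=True, check=False, timeout=30
    )
    # Some tools print their version to stderr instead of stdout.
    text = (result.stdout or result.stderr).strip()
    return path, text


if __name__ == "__main__":
    info = tool_version("megahit")
    if info is None:
        print("megahit not found on PATH (is the CAPTUS environment active?)")
    else:
        path, text = info
        print(f"{path}\n{text}")
```

The printed path also shows which installation is being picked up, which helps when more than one environment contains MEGAHIT.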