edgardomortiz / Captus

Assembly of Phylogenomic Datasets from High-Throughput Sequencing data
https://edgardomortiz.github.io/captus.docs/
GNU General Public License v3.0
'captus_assembly assemble' does not produce the assembled result file #9

Closed huang-baisha closed 6 months ago

huang-baisha commented 6 months ago

Hi Team, I am trying to run captus_assembly assemble. No error is reported, but the assembled result file is empty and only some intermediate files are produced.

captus_assembly assemble -r fastp -o assemble --threads 30 --ram 20 --overwrite

ls -lh fastp
total 4.7G
-rw-rw-r-- 1 hbs hbs 2.3G Apr 15 17:19 Niponia_nodulosa_R1.fq.gz
-rw-rw-r-- 1 hbs hbs 2.5G Apr 15 17:19 Niponia_nodulosa_R2.fq.gz

Then I ran MEGAHIT directly, and it did produce an assembled file:

megahit -1 fastp/Niponia_nodulosa_R1.fq.gz -2 fastp/Niponia_nodulosa_R2.fq.gz -o megahit_out -t 80

ls -lh megahit_out
total 400M
-rw-rw-r-- 1 hbs hbs 262 Apr 16 10:37 checkpoints.txt
-rw-rw-r-- 1 hbs hbs 0 Apr 16 10:37 done
-rw-rw-r-- 1 hbs hbs 400M Apr 16 10:37 final.contigs.fa
drwxrwxr-x 2 hbs hbs 4.0K Apr 16 10:37 intermediate_contigs
-rw-rw-r-- 1 hbs hbs 166K Apr 16 10:37 log
-rw-rw-r-- 1 hbs hbs 953 Apr 16 08:56 options.json

Please give me some help, thank you very much!

edgardomortiz commented 6 months ago

I am not sure what happened; I would have to see the MEGAHIT logs from the Captus run. From what I can see, you are trying to assemble a sample with high coverage, right? If so, use the WGS preset:

captus assemble -r fastp -o assemble --concurrent 1 --preset WGS --overwrite

However, if the data is genome skimming or target capture, or a combination of both, then the default preset is OK:

captus assemble -r fastp -o assemble --concurrent 1 --overwrite

The command you used for MEGAHIT is very different from what Captus would use, but my guess is that the Captus run failed because you restricted the RAM for Captus and did not for your MEGAHIT command. Also, you gave MEGAHIT 80 threads and Captus only 30.
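To compare the resources passed to each tool against what the machine actually has, something like the following may help. It is only a sketch and assumes a Linux host where nproc and /proc/meminfo are available; the variable names are illustrative and not part of Captus.

```shell
# Logical CPU cores visible to this shell (nproc is part of GNU coreutils).
threads=$(nproc)

# Total physical RAM in whole GB. Assumes Linux: MemTotal in /proc/meminfo is in kB.
ram_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024}' /proc/meminfo)

echo "cores available: ${threads}, total RAM: ${ram_gb} GB"
```

If the values given to --threads and --ram are far below these numbers, the Captus run is working with much less than the unrestricted MEGAHIT run had.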

Let me know if my suggestion works...

Edgardo

huang-baisha commented 6 months ago

I also ran the following, with RAM left on "auto", but it still doesn't work:

captus_assembly assemble -r fastp -o assemble --threads 80 --overwrite

[Uploading megahit.brief.log…]()

edgardomortiz commented 6 months ago

Sorry, your attachment link didn't work. Can you also upload the log from Captus itself? By the way, you can abbreviate captus_assembly to simply captus. Be sure to use the latest version (v1.0.1).

Edgardo

huang-baisha commented 6 months ago

megahit.brief.log megahit.full.log

edgardomortiz commented 6 months ago

Please also upload the Captus log, captus-assembly_assemble.log, which is in the main assembly folder.
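One way to collect all three logs at once is sketched below. It assumes the layout seen in this thread: captus-assembly_assemble.log somewhere under the output folder, and the MEGAHIT logs in each sample's 01_assembly subfolder. The function name list_captus_logs is hypothetical, not a Captus command.

```shell
# Print every Captus assemble log and MEGAHIT log found under the given output directory.
list_captus_logs() {
    find "$1" -type f \( \
        -name 'captus-assembly_assemble.log' \
        -o -name 'megahit.brief.log' \
        -o -name 'megahit.full.log' \
    \) | sort
}

# Example call, using the output directory name from the commands in this thread:
# list_captus_logs assemble
```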

huang-baisha commented 6 months ago

captus-assembly_assemble.log megahit.brief.log megahit.full.log

Here are the results from the same set of data.

edgardomortiz commented 6 months ago

RAM was restricted on every Captus run (except the first one, where you gave only 4 threads). You also gave only 8 threads, which is OK but extremely slow. Please don't restrict the RAM, or at least give it 32 GB. On such a computer I would try:

captus assemble -r fastp -o assemble --concurrent 1 --preset WGS --overwrite

or if you absolutely have to limit resources:

captus assemble -r fastp -o assemble --concurrent 1 --preset WGS --overwrite --threads 32 --ram 32

or better:

captus assemble -r fastp -o assemble --concurrent 1 --preset WGS --overwrite --threads 32 --ram 64

Let me know if this helps

Edgardo

huang-baisha commented 6 months ago

I will try it. Thanks for your help!

edgardomortiz commented 6 months ago

No problem, I would only use 4 threads and fewer than 8GB RAM with pretty small data (up to a few hundred MB, typical of target capture data).
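To see which side of that rule of thumb a dataset falls on, the total size of the input reads can be summed first. This is a rough sketch: the function name input_size_mb is hypothetical, and it assumes the reads are named *.fq.gz as in the fastp folder shown earlier in this thread.

```shell
# Sum the sizes of all *.fq.gz files in a directory and print the total in whole MB.
input_size_mb() {
    total=0
    for f in "$1"/*.fq.gz; do
        [ -e "$f" ] || continue      # skip the unexpanded pattern when nothing matches
        bytes=$(wc -c < "$f")        # file size in bytes
        total=$((total + bytes))
    done
    echo $((total / 1024 / 1024))
}

# Example call, using the reads folder name from this thread:
# input_size_mb fastp
```

A result in the thousands of MB, as with the 4.7G of reads listed above, is well beyond the small-data case.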

Edgardo

huang-baisha commented 6 months ago

captus_assembly assemble -r fastp -o assemble --threads 80 --overwrite --tmp_dir /data/hbs/Niponia_nodulosa_DAH1/captus_assembly_tmp

This ran successfully, thank you very much!

edgardomortiz commented 6 months ago

Excellent, glad to help! Please don't hesitate to report new issues...

Edgardo

huang-baisha commented 6 months ago

Hi, I have a new problem. captus_assembly ran successfully, but assembly.fasta is only 18M.

ll Papilio_dialis_CAPTUSout/assemble/Papilio_dialis__captus-asm/01_assembly
total 56M
-rw-rw-r-- 1 hbs hbs 18M Apr 25 11:16 assembly.fasta
-rw-rw-r-- 1 hbs hbs 37M Apr 25 11:16 assembly_graph.fastg
-rw-rw-r-- 1 hbs hbs 755 Apr 25 11:16 assembly.stats.tsv
-rw-rw-r-- 1 hbs hbs 131 Apr 25 11:16 assembly.stats.t.tsv
-rw-rw-r-- 1 hbs hbs 4.1K Apr 25 11:16 megahit.brief.log
-rw-rw-r-- 1 hbs hbs 188K Apr 25 11:16 megahit.full.log

But the next step, extract, does not produce a result file:

captus_assembly extract -a Papilio_dialis_CAPTUSout/assemble -n 07_mergeOG_cdsfasta/OG0006807.txt -o Papilio_dialis_CAPTUSout/extract/OG0006807 --threads 10

Please give me some help, thank you very much!

captus-assembly_extract.log NUC_scipio_initial.log OG0006807.txt

edgardomortiz commented 6 months ago

Hi there,

Last time you showed me a different species. Maybe the data for this Papilio is not good. Have you checked the quality filtering report? How many reads were assembled? Maybe you could show me the MEGAHIT logs...
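For a quick, independent look at how much was actually assembled, the contigs and bases in assembly.fasta can be counted directly. This is only a sketch: fasta_summary is a hypothetical helper, not part of Captus, and the assembly.stats.tsv file written next to the assembly may already report similar figures.

```shell
# Count sequences (header lines starting with ">") and total bases in a FASTA file.
fasta_summary() {
    awk '/^>/ {n++; next}
         {bp += length($0)}
         END {printf "contigs=%d total_bp=%d\n", n, bp}' "$1"
}

# Example call, using the path from the listing in this thread:
# fasta_summary Papilio_dialis_CAPTUSout/assemble/Papilio_dialis__captus-asm/01_assembly/assembly.fasta
```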

Edgardo

huang-baisha commented 6 months ago

captus-assembly_assemble.log megahit.brief.log megahit.full.log

edgardomortiz commented 6 months ago

I don't see any anomaly, so I'm afraid the data is to blame. If this is not capture data, then 37 million reads of WGS is not enough coverage to assemble anything meaningful for this species...