AppliedBioinformatics / runBNG

An easy way to run BioNano genomic analysis
MIT License

ERROR: Contig count 0 <= Assembly minimum 0, exiting. When running denovo #31

Open cdmmoeller opened 3 years ago

cdmmoeller commented 3 years ago

Hi,

When trying to run the following command with version 2.0.1

$BNG_DIR/runBNG denovo \
    -t      /BiO/Access/cdmm92/resource/bionano/tools/pipeline/Solve3.6.1_11162020/RefAligner/11442.11643rel \
    -s      /BiO/Access/cdmm92/resource/bionano/tools/pipeline/Solve3.6.1_11162020 \
    -b      $ASCABER_DIR/Resources/Raw/Bionano/20200616_A_scaber_chamchui__RawMolecules.bnx \
    -T      80 \
    -j      78 \
    -z      6000 \
    -o      $ASCABER_DIR/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo

I get the following error at the assembly stage:

Executing stage number 3 : Assembly

 Starting Multi-Threaded Process: Assembly
  Running 1 jobs with 80 threads, sleepTime=0.01
   START    1:                       Assembly,  77 Thr,    1 R,    1 T,    0 F,    0 Q
   STOP     1:                       Assembly,  77 Thr,    0 R,    1 T,    1 F,    0 Q  TotalTime= 0h 23.99m  RunTime= 0h 24.00m  CPUload=2002% host=NA Command exited with non-zero status 1
 Finished Multi-Threaded Process: Assembly

Thu Jul  8 03:06:31 2021: calling os.listdir(/BiO/Research/Project1/UNIST-Doellingeria_scabra-Genome-2020-12/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo/contigs/exp_unrefined)
Thu Jul  8 03:06:31 2021: loaded 1 filenames from /BiO/Research/Project1/UNIST-Doellingeria_scabra-Genome-2020-12/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo/contigs/exp_unrefined
Thu Jul  8 03:06:31 2021: Found 0 CMAP files in /BiO/Research/Project1/UNIST-Doellingeria_scabra-Genome-2020-12/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo/contigs/exp_unrefined : writing list to /BiO/Research/Project1/UNIST-Doellingeria_scabra-Genome-2020-12/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo/contigs/curContigs
Thu Jul  8 03:06:31 2021: wrote list to /BiO/Research/Project1/UNIST-Doellingeria_scabra-Genome-2020-12/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo/contigs/curContigs
 Starting Multi-Threaded Process: Cmap_Merge_exp_unrefined
  Running 1 jobs with 1 threads, sleepTime=0.01
   START    1:            mrg_exp_unrefined_0,  16 Thr,    1 R,    1 T,    0 F,    0 Q
   STOP     1:            mrg_exp_unrefined_0,  16 Thr,    0 R,    1 T,    1 F,    0 Q  TotalTime= 0h 0.00m  RunTime= 0h 0.00m  CPUload=60% host=NA
 Finished Multi-Threaded Process: Cmap_Merge_exp_unrefined

ERROR: Contig count 0 <= Assembly minimum 0, exiting

Warning/Error messages:
('warning', 'Missing end marker in \\"/BiO/Research/Project1/UNIST-Doellingeria_scabra-Genome-2020-12/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo/contigs/exp_unrefined/exp_unrefined.stdout\\" (found \\"y unavailable\\\\n\\" while expecting \\"END of output\\")\\n')
('error', 'job has not completed, see stdout=\\"/BiO/Research/Project1/UNIST-Doellingeria_scabra-Genome-2020-12/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo/contigs/exp_unrefined/exp_unrefined.stdout\\"')
('critical', 'stage Assembly did not produce minimum number of contigs')

Warning/Error summary:
        1 warning(s)
        1 critical(s)
        1 error(s)

WARNING: missing xmap file: /BiO/Research/Project1/UNIST-Doellingeria_scabra-Genome-2020-12/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo/contigs/exp_refineFinal1_sv/merged_smaps/exp_refineFinal1_merged.xmap

  Pipeline end time: Thu Jul  8 03:06:32 2021
  Elapsed time: 2288.23m; 38.14h; 1.59d

Pipeline has failed

I am attaching the full stdout.

Could you please help me solve this issue?

Many thanks, Christian

bionano_denovo.o238785.txt
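
The warning in the log says the Assembly job's stdout file ended without the expected "END of output" marker. A minimal sketch for checking that marker and viewing the last lines of such a file, assuming a bash shell; the function name `check_end_marker` is hypothetical and is not part of runBNG or Bionano Solve:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of runBNG or Bionano Solve).
# Prints the last lines of a job's stdout file and reports whether the
# "END of output" marker mentioned in the pipeline warning is present.
check_end_marker() {
    local stdout_file="$1"
    local n_lines="${2:-20}"   # how many trailing lines to show

    if [ ! -r "$stdout_file" ]; then
        echo "cannot read: $stdout_file" >&2
        return 2
    fi

    # Show the tail; an aborted job often prints its reason near the end.
    tail -n "$n_lines" "$stdout_file"

    # grep -q exits 0 when the marker occurs anywhere in the file.
    if grep -q "END of output" "$stdout_file"; then
        echo "end marker: present"
    else
        echo "end marker: MISSING"
        return 1
    fi
}
```

It could be called on the `exp_unrefined.stdout` path quoted in the warning, for example `check_end_marker path/to/exp_unrefined.stdout 40`.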

yyx8671 commented 3 years ago

Hi @cdmmoeller,

From the log file, it seems that the molecules in your OM data are short. You could try lowering the minimum molecule length setting so that enough data is retained for sufficient coverage. If you have a draft assembly, you can also use it to tune the parameters by running 'runBNG MQR'.
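
One way to see how much data survives a given minimum length is to tabulate the raw molecules directly. The following is a rough sketch only, not a runBNG feature. It assumes the usual BNX layout, in which each molecule record begins with a line whose first tab-separated field is `0` and whose third field is the molecule length in bp. The function name `bnx_coverage` and the default thresholds are hypothetical. The coverage it reports is raw coverage (total molecule length divided by genome size), not aligned coverage.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of runBNG).
# Assumption: molecule records in the BNX file are lines whose first
# tab-separated field is 0 and whose third field is the length in bp;
# header lines start with '#'.
# Usage: bnx_coverage FILE.bnx GENOME_SIZE_BP [MIN_LENGTH_KB ...]
bnx_coverage() {
    local bnx="$1" genome_bp="$2"
    shift 2
    # Illustrative default thresholds (kb) when none are given.
    if [ "$#" -eq 0 ]; then set -- 150 100 40; fi
    local thresholds="$*"

    awk -F '\t' -v genome="$genome_bp" -v thr="$thresholds" '
        BEGIN { n = split(thr, t, " ") }
        /^#/ { next }                      # skip header lines
        $1 == "0" {                        # molecule record line
            for (i = 1; i <= n; i++)
                if ($3 >= t[i] * 1000) { count[i]++; bases[i] += $3 }
        }
        END {
            print "min_kb\tmolecules\ttotal_Gb\tcoverage_x"
            for (i = 1; i <= n; i++)
                printf "%d\t%d\t%.3f\t%.2f\n", t[i], count[i], bases[i] / 1e9, bases[i] / genome
        }
    ' "$bnx"
}
```

For example, `bnx_coverage RawMolecules.bnx 6000000000 150 100 40` would print one row per threshold, which makes it easier to judge whether lowering the minimum length adds a meaningful amount of data.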

Cheers, Andy

cdmmoeller commented 3 years ago

Hi Andy,

Thanks for your reply.

I tried decreasing the minimum molecule length (-l) to 40kb and minimum number of labels per molecule (-m) to 6, as well as using the parameters output by runBNG MQR. This is my command:

$BNG_DIR/runBNG denovo \
-P saphyr \
-H no \
-e yes \
-t /BiO/Access/cdmm92/resource/bionano/tools/pipeline/Solve3.6.1_11162020/RefAligner/11442.11643rel \
-s /BiO/Access/cdmm92/resource/bionano/tools/pipeline/Solve3.6.1_11162020 \
-b $ASCABER_DIR/Resources/Raw/Bionano/20200616_A_scaber_chamchui__RawMolecules.bnx \
-T 80 \
-l 40 \
-m 6 \
-j 60 \
-z 6000 \
-r $ASCABER_DIR/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/fa2cmap/smartdenovo.nextpolish.curated.0_37_175_DLE1_20kb_5labels.cmap \
-a 17 \
-p 1.88 \
-n 0.15 \
-d -0.05 \
-f 0.12 \
-R 0.03 \
-L 640 \
-S 8 \
-o $ASCABER_DIR/Analysis/bionano/smartdenovo.nextpolish.curated.0_37_175/denovo

However, the same error appears. I've attached the stdout: bionano_denovo.o238892.txt

MQR stdout: bionano_MQR.o238775.txt

Do you have any other potential solutions to this issue?

Thanks again, Christian

yuxuanyuan commented 3 years ago

Hi @cdmmoeller,

From the MQR output, the mapping rate is pretty low. Do you know the reason? Could you also decrease the molecule length in the MQR test and see whether the parameters change?

Cheers, Andy

cdmmoeller commented 3 years ago

Hi @yuxuanyuan ,

Thanks for taking the time to help.

In MQR, I've tried changing the molecule length from the default 150 to 50, 90 and 200. It seems that the shorter the molecule length, the lower the mapping rate (4.32% at -s 50 versus 12.3% at -s 90). The other parameters are practically unchanged.

MQR_s90.stdout.txt MQR_s150.stdout.txt MQR_s200.stdout.txt MQR_s50.stdout.txt

Could the low mapping rate be caused by the quality of the reference?

The reference is a long-read assembly that has been polished and haplotig-purged. Some basic stats: G: 4.59 Gb, Contigs: 34,571, N50: 206.8 kb. The BUSCO score is C:86.4%[S:75.5%,D:10.9%],F:2.2%,M:11.4%,n:1614.

The haploid genome size has previously been estimated to be ~6 Gb by flow cytometry and k-mer analysis.
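
As a quick back-of-the-envelope check on these figures, the sketch below computes the mean contig length and the share of the estimated genome that the assembly covers. It uses only the numbers quoted here, and the function name `assembly_summary` is hypothetical.

```shell
#!/usr/bin/env bash
# Figures quoted in this comment.
assembly_bp=4590000000      # 4.59 Gb assembly
contigs=34571
genome_bp=6000000000        # ~6 Gb estimate (flow cytometry, k-mer analysis)

# Hypothetical helper: prints mean contig length and assembly/genome ratio.
assembly_summary() {
    awk -v a="$1" -v c="$2" -v g="$3" 'BEGIN {
        printf "mean contig length: %.1f kb\n", a / c / 1000
        printf "assembly / estimated genome size: %.1f%%\n", 100 * a / g
    }'
}

assembly_summary "$assembly_bp" "$contigs" "$genome_bp"
```

This works out to a mean contig length of roughly 133 kb and an assembly spanning about 76.5% of the estimated genome size.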

Thanks, Christian

yyx8671 commented 3 years ago

Thanks @cdmmoeller for the info.

I guess one reason is that the assembly is fragmented and the contigs are short. Another might be that the quality of the OM data is poor. We recently ran an OM experiment in which some elements bound to the DNA were not fully removed, so the nanochannel was partially blocked while the data were being produced. Little DNA could pass through the channel, which led to poor data quality. In our case, the mapping rate was as low as yours.

EugeneKim76 commented 3 years ago

Hi @yyx8671

Hope your work is going well. We are running into problems similar to those in this issue. I suspect ours also originate from elements blocking the nanochannel, as you describe in your comment. If possible, could you recommend some methods (or related papers) for removing such elements bound to DNA?

Best,