Nextomics / NextDenovo

Fast and accurate de novo assembler for long reads
GNU General Public License v3.0

[blocksize] and [pa_correction] in run.cfg #109

Closed: dexon9109 closed this issue 3 years ago

dexon9109 commented 3 years ago

Hi, I have the following problem:

I set parameters such as "blocksize=1g" and "pa_correction=200" in run.cfg, but neither of them takes effect. According to pid46007.log.info, the values actually used are "pa_correction=8" and "blocksize=15996475184" (P.S.: parallel_jobs = 8). So I'm confused about how and at what level these parameters are applied, and I hope you can help.

run.cfg:

[General]
job_type = local
job_prefix = test
task = all # 'all', 'correct', 'assemble'
rewrite = yes # yes/no
deltmp = yes
rerun = 3
parallel_jobs = 8
input_type = raw
read_type = clr
input_fofn = data3.fofn
workdir = workdir
#cluster_options = -l vf=10g,p=6 -q fat.q -S {bash} -w n #for sge

[correct_option]
read_cutoff = 1k
genome_size = 4620000000
blocksize = 1g
pa_correction = 200
seed_cutfiles = 200
sort_options = -m 12g -t 8 -k 40
minimap2_options_raw = -x ava-ont -t 8
correction_options = -p 8

[assemble_option]
random_round = 20
minimap2_options_cns = -x ava-ont -t 8 -k17 -w17
nextgraph_options = -a 1

pid46007.log.info:

rerun:                        3
task:                         all
deltmp:                       1
rewrite:                      1
read_type:                    clr
job_type:                     local
read_cutoff:                  1k
input_type:                   raw
parallel_jobs:                8
pa_correction:                8  # Sometimes this parameter equals "parallel_jobs". I guess the reason is that
    # "pa_correction" must be smaller than "parallel_jobs". But the docs describe it as 'overwrite "parallel_jobs"
    # only for this step' in <https://nextdenovo.readthedocs.io/en>.
random_round:                 20
seed_depth:                   45.0
seed_cutoff:                  22630
seed_cutfiles:                200
job_prefix:                   test
blocksize:                    15996475184    # How is this calculated?
ctg_cns_options:              -p 8
genome_size:                  4620000000
nextgraph_options:            -a 1
minimap2_options_map:         -x map-ont
minimap2_options_raw:         -x ava-ont -t 8
sort_options:                 -m 12g -t 8 -k 40 -k 40
correction_options:           -p 8 -max_lq_length 10000 -min_len_seed 11315
minimap2_options_cns:         -x ava-ont -t 8 -k17 -w17 -k 17 -w 17 --minlen 2000 --maxhan1 5000

Thank you for reading; I look forward to your reply :)

moold commented 3 years ago

Hi, if you do not set seed_cutoff, NextDenovo will calculate it and update the options that depend on it (such as blocksize). Besides, pa_correction should always be much smaller than parallel_jobs, because each pa_correction task requires much more memory than a parallel_jobs task. For some new users, illogical settings can crash the server, so NextDenovo automatically adjusts some parameters.
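A minimal sketch of the two kinds of adjustment described in this reply, under stated assumptions. `estimate_seed_cutoff` shows one common way to derive a length cutoff from `genome_size` and `seed_depth`: take the longest reads until their total reaches seed_depth × genome_size. `clamp_pa_correction` caps `pa_correction` at `parallel_jobs`, which matches the 200 → 8 change seen in the log. Both function names are hypothetical and neither is taken from NextDenovo's source; the real seed_cutoff and blocksize calculations may differ.

```python
from typing import List


def estimate_seed_cutoff(read_lengths: List[int], genome_size: int, seed_depth: float) -> int:
    """Return the largest length L such that reads of length >= L supply
    at least seed_depth * genome_size bases.

    Assumption: illustrates the general idea of an automatic seed cutoff,
    not NextDenovo's exact algorithm.
    """
    target = seed_depth * genome_size
    total = 0
    # Walk from the longest read down, accumulating bases.
    for length in sorted(read_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    # Not enough data to reach the requested depth: keep every read.
    return min(read_lengths) if read_lengths else 0


def clamp_pa_correction(pa_correction: int, parallel_jobs: int) -> int:
    """Hypothetical clamp: never run more correction tasks than parallel_jobs."""
    return max(1, min(pa_correction, parallel_jobs))


if __name__ == "__main__":
    print(estimate_seed_cutoff([150, 60, 40, 30], genome_size=100, seed_depth=2))
    print(clamp_pa_correction(200, 8))
```

For example, with 150, 60, 40 and 30 bp reads, a 100 bp genome and seed_depth = 2, the target is 200 bp; the two longest reads already give 210 bp, so the cutoff is 60. Clamping a requested pa_correction of 200 against parallel_jobs = 8 yields 8.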

moold commented 3 years ago

I just noticed that your config file has some errors: read_type = clr means your reads are PacBio data, while ava-ont and map-ont are only meant for Nanopore reads. So, if you are not familiar with these options, just set the required ones and let NextDenovo set the omitted parameters automatically.

dexon9109 commented 3 years ago

Sorry, I didn't notice that error. Thank you for the correction: since read_type = clr, I will change ava-ont and map-ont to ava-pb and map-pb. However, I don't think this should have much impact on the overall results, since the ONT presets should be more lenient.