Open hermeseduardo opened 6 years ago
Hi,
Yes. Although it depends on the nature (read length and quality) and quantity of your input data, a decrease in N50 from raw reads to corrected preads is typical, especially when coverage is limited. In low-coverage situations, long reads are often broken during the correction process, which lowers the overall N50.
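To illustrate why breaking long reads lowers N50, here is a minimal sketch. The read lengths are made up for illustration and are not taken from this dataset; the `n50` helper is a plain textbook definition, not FALCON's own code.

```python
def n50(lengths):
    """Return the length L such that reads of length >= L
    contain at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


# Hypothetical raw read lengths (bp).
raw = [20000, 15000, 12000, 8000, 5000]

# Hypothetical result after correction: the 20 kb and 15 kb reads were
# each split in two at a poorly covered region and lost a few bases.
corrected = [11000, 8000, 9000, 5000, 12000, 8000, 5000]

print(n50(raw))        # 15000
print(n50(corrected))  # 9000
```

Only two reads were split here, yet N50 drops from 15000 to 9000 because N50 is dominated by the longest reads.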
OK, thanks. Do you know if there is anything that can be done to help? For example, reducing -e.70 to -e.60, or would that be 'bad' for the final assembly? I am also currently trying the -b option for daligner; apparently it helps when there is compositional bias.
pa_HPCdaligner_option = -vb ..........
ovlp_HPCdaligner_option = -vb ........
Hi there,
Is it normal to lose considerable N50 length after the correction step? In my case it went from N50 13000 to N50 7000. I have about 40X coverage. Below are my pre_assembly_stats.json and fc_run.cfg. I suspect that the GC content (30%) may affect DALIGNER. Any clue regarding this?
thanks
{
  "genome_length": 550000000,
  "length_cutoff": 1000,
  "preassembled_bases": 13073338353,
  "preassembled_coverage": 23.77,
  "preassembled_esize": 8290.033,
  "preassembled_mean": 4874.457,
  "preassembled_n50": 7271,
  "preassembled_p95": 12989,
  "preassembled_reads": 2682009,
  "preassembled_seed_fragmentation": 1.443,
  "preassembled_seed_truncation": 3720.872,
  "preassembled_yield": 0.583,
  "raw_bases": 22478005278,
  "raw_coverage": 40.869,
  "raw_esize": 14585.894,
  "raw_mean": 9948.45,
  "raw_n50": 13241,
  "raw_p95": 22760,
  "raw_reads": 2259448,
  "seed_bases": 22442989421,
  "seed_coverage": 40.805,
  "seed_esize": 14607.423,
  "seed_mean": 10139.719,
  "seed_n50": 13254,
  "seed_p95": 22877,
  "seed_reads": 2213374
}
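The figures in this stats file can be cross-checked with a few ratios. The sketch below copies the relevant values from the JSON above; the `summarize` helper is hypothetical. The formulas are an assumption: they reproduce the reported `preassembled_yield` and `preassembled_coverage`, but they are not taken from FALCON's source.

```python
# Values copied from the pre_assembly_stats.json above.
stats = {
    "genome_length": 550000000,
    "preassembled_bases": 13073338353,
    "preassembled_n50": 7271,
    "raw_n50": 13241,
    "seed_bases": 22442989421,
}


def summarize(s):
    """Derive a few ratios from the pre-assembly stats (assumed formulas)."""
    return {
        # Fraction of seed bases that survive correction.
        "yield": s["preassembled_bases"] / s["seed_bases"],
        # Corrected bases per base of the estimated genome.
        "preassembled_coverage": s["preassembled_bases"] / s["genome_length"],
        # Fraction of the raw N50 retained in the preads.
        "n50_retained": s["preassembled_n50"] / s["raw_n50"],
    }


for name, value in summarize(stats).items():
    print(f"{name}: {value:.3f}")
```

With these numbers, a bit over 40% of the seed bases are lost during correction, and the pread N50 is roughly 55% of the raw N50.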
[General]
input_fofn = input.fofn
input_type = raw
length_cutoff = 1000
genome_size = 550000000
length_cutoff_pr = 10000

sge_option_da = --ntasks 1 --nodes 1 --cpus-per-task 8 --mem 30gb --time 5:30:00
sge_option_la = --ntasks 1 --nodes 1 --cpus-per-task 4 --mem 32gb --time 4:56:00
sge_option_cns = --ntasks 1 --nodes 1 --cpus-per-task 5 --mem 32gb --time 3:00:00
sge_option_pda = --ntasks 1 --nodes 1 --cpus-per-task 8 --mem 30gb --time 3:30:00
sge_option_pla = --ntasks 1 --nodes 1 --cpus-per-task 4 --mem 35gb --time 3:58:00
sge_option_fc = --ntasks 1 --nodes 1 --cpus-per-task 8 --mem 20gb --time 59:00

da_concurrent_jobs = 396
la_concurrent_jobs = 396
cns_concurrent_jobs = 396
pda_concurrent_jobs = 396
pla_concurrent_jobs = 396

pa_HPCdaligner_option = -v -B70 -t16 -e.70 -l1000 -s1000
ovlp_HPCdaligner_option = -v -B70 -t32 -h60 -e.96 -l500 -s1000

pa_DBsplit_option = -x500 -s120
ovlp_DBsplit_option = -x500 -s120

falcon_sense_option = --output_multi --min_idt 0.70 --min_cov 2 --max_n_read 200

overlap_filtering_setting = --max_diff 100 --max_cov 200 --min_cov 1 --bestn 1

skip_checks = true