TGS-GapCloser is a gap-closing tool that uses error-prone long reads generated by third-generation sequencing (TGS) technologies (PacBio, Oxford Nanopore, etc.) or preassembled contigs to fill N-gaps in a genome assembly.
Both raw reads and pre-error-corrected reads are acceptable as input.
If only raw long reads are provided, it polishes raw TGS reads by calling Racon.
If additional NGS short reads are available, it polishes raw TGS reads by calling Pilon.
Note: TGS reads are accepted in FASTA format only.
If you use TGS-GapCloser in your work, please cite:
TGS-GapCloser: A fast and accurate gap closer for large genomes with low coverage of error-prone long reads.
Mengyang Xu, Lidong Guo, Shengqiang Gu, Ou Wang, Rui Zhang, Brock A Peters, Guangyi Fan, Xin Liu, Xun Xu, Li Deng, Yongwei Zhang.
GigaScience, Volume 9, Issue 9, 1 September 2020, giaa094, https://doi.org/10.1093/gigascience/giaa094
Download:
git clone https://github.com/BGI-Qingdao/TGS-GapCloser.git YOUR-INSTALL-DIR
Then either link your own minimap2:
rm -rf YOUR-INSTALL-DIR/minimap2
ln -s MINIMAP2-PATH YOUR-INSTALL-DIR/
or fetch the bundled submodule:
cd YOUR-INSTALL-DIR
git submodule init
git submodule update
Compile:
cd YOUR-INSTALL-DIR
make
Alternatively, install via conda:
conda install -c bioconda tgsgapcloser
If you install via conda, install minimap2 first and make sure it is available in your environment.
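Before running a conda-installed tgsgapcloser, it can help to confirm that minimap2 really is on the PATH. The following is a minimal sketch using the POSIX `command -v` builtin; the helper name `have_tool` is made up for illustration and is not part of TGS-GapCloser.

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME resolves to a command on PATH.
# Hypothetical helper for illustration; not shipped with TGS-GapCloser.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool minimap2; then
    echo "minimap2 found at: $(command -v minimap2)"
else
    echo "minimap2 not found; install it before running tgsgapcloser" >&2
fi
```

The same check can be repeated for racon, samtools, or java if you rely on them being on the PATH rather than passing explicit paths.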
Usage:
tgsgapcloser --scaff SCAFF_FILE --reads TGS_READS_FILE --output OUT_PREFIX [options...]
required:
--scaff <draft scaffolds> input draft scaffolds.
--reads <TGS reads> input TGS reads.
--output <output prefix> output prefix.
## error correction module
--ne do not execute error correction.
or
--racon <racon> installed racon. Can be installed following https://github.com/isovic/racon
or
--pilon <pilon> pilon jar package. Can be downloaded from https://github.com/broadinstitute/pilon/releases/download/v1.23/pilon-1.23.jar
--java <java> installed java.
--ngs <ngs_reads> input NGS reads used for pilon.
--samtools <samtools> installed samtools.
optional:
--minmap_arg <minimap2 args> like --minmap_arg ' -x ava-ont'
the argument must be wrapped in single quotes ' '
--tgstype <pb/ont> TGS type. ont by default.
--min_idy <float> minimum identity for filtering candidate sequences.
0.3 for ont by default.
0.2 for pb by default.
--min_match <int> minimum matched length for filtering candidate sequences.
300 for ont by default.
200 for pb by default.
--thread <int> number of threads used. 16 by default.
--pilon_mem <int> memory used for pilon, passed to -Xmx. Use "m" or "M" for MB, or "g" or "G" for GB. 300G by default.
--chunk <int> split candidates into # of chunks to separately correct errors. 3 by default.
--p_round <int> iteration number for pilon error-correction. 3 by default.
--r_round <int> iteration number for racon error-correction. 1 by default.
--g_check gap size difference check. Off by default.
--min_nread <int> minimum number of reads that can bridge this gap. 1 by default.
--max_nread <int> maximum number of reads that can bridge this gap. -1 by default.
--max_candidate <int> maximum number of candidate alignments used for error correction and gap filling. 10 by default.
WARNING: only FASTA-format TGS reads are supported; FASTQ input will crash the program!
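If your long reads are only available as FASTQ, they need converting first. The sketch below is one common way to do this with awk, assuming every record occupies exactly four lines (header, sequence, `+`, quality); multi-line FASTQ would need a dedicated tool. The function name `fastq_to_fasta` and the file names in the usage comment are placeholders.

```shell
#!/bin/sh
# fastq_to_fasta: read 4-line FASTQ records on stdin, write FASTA to stdout.
# ASSUMPTION: each record is exactly four lines; wrapped sequences are not handled.
fastq_to_fasta() {
    awk 'NR % 4 == 1 { print ">" substr($0, 2) }   # replace leading @ with >
         NR % 4 == 2 { print }'                    # keep the sequence line
}

# Usage (placeholder file names):
#   gzip -dc tgs.reads.fastq.gz | fastq_to_fasta > tgs.reads.fasta
```

Quality values are discarded, which is fine here because only the FASTA sequences are read.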
YOUR-INSTALL-DIR/tgsgapcloser \
--scaff scaffold-path/scaffold.fasta \
--reads tgs-reads-path/tgs.reads.fasta \
--output test_ne \
--ne \
>pipe.log 2>pipe.err
YOUR-INSTALL-DIR/tgsgapcloser \
--scaff scaffold-path/scaffold.fasta \
--reads tgs-reads-path/tgs.reads.fasta \
--output test_racon \
--racon racon-path/bin/racon \
>pipe.log 2>pipe.err
YOUR-INSTALL-DIR/tgsgapcloser \
--scaff scaffold-path/scaffold.fasta \
--reads tgs-reads-path/tgs.reads.fasta \
--output test_pilon \
--pilon pilon-path/pilon-1.23.jar \
--ngs ngs-reads-path/ngs.reads.fastq.gz \
--samtools samtools-path/bin/samtools \
--java java-path/bin/java \
>pipe.log 2>pipe.err
TGS reads are treated as ont by default. Use --tgstype to change this, for example from --tgstype ont to --tgstype pb:
YOUR-INSTALL-DIR/tgsgapcloser \
--scaff scaffold-path/scaffold.fasta \
--reads tgs-reads-path/tgs.reads.fasta \
--output test_racon \
--racon racon-path/bin/racon \
--tgstype pb \
>pipe.log 2>pipe.err
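According to the option list, switching --tgstype also changes the default filtering thresholds: 0.3 and 300 for ont, 0.2 and 200 for pb. The sketch below makes those defaults explicit so they can be logged or overridden deliberately. `filter_defaults` is a hypothetical helper, not part of the tool, and the values are copied from the usage text.

```shell
#!/bin/sh
# filter_defaults TYPE: print the documented default --min_idy and
# --min_match for TYPE (ont or pb), separated by one space.
# Hypothetical helper for illustration; values copied from the option list.
filter_defaults() {
    case "$1" in
        ont) echo "0.3 300" ;;
        pb)  echo "0.2 200" ;;
        *)   echo "unknown TGS type: $1" >&2; return 1 ;;
    esac
}

# Dry run: print the flags that spell out the pb defaults explicitly.
defaults=$(filter_defaults pb)
min_idy=${defaults% *}      # text before the space
min_match=${defaults#* }    # text after the space
echo "--tgstype pb --min_idy $min_idy --min_match $min_match"
```

The printed flags can be appended to a tgsgapcloser command line as a starting point when tuning --min_idy or --min_match.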
Use --minmap_arg ' your-own minimap2 args' to pass your own arguments to minimap2.
This is useful when you want to avoid a huge PAF file.
For example, if you use HiFi reads, you may try --minmap_arg '-x asm20':
YOUR-INSTALL-DIR/tgsgapcloser \
--scaff scaffold-path/scaffold.fasta \
--reads tgs-reads-path/tgs.reads.fasta \
--output test_racon \
--minmap_arg '-x asm20' \
--racon racon-path/bin/racon \
--tgstype pb \
>pipe.log 2>pipe.err
>scaffold_1
1 1000 S 1000 2000
1001 1010 N
1011 1100 S 2201 2290
1101 1110 F
1111 1200 S 2301 2390
>scaffold_2
......
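The listing above appears to record, for each scaffold, a series of segments with a type letter in the third column. Assuming (the text does not state this) that `F` marks a gap that was filled and `N` a gap that remains, a sketch like the following tallies both per scaffold. The function name `gap_summary` is a placeholder, and rows with any other type are ignored.

```shell
#!/bin/sh
# gap_summary: read the per-scaffold listing on stdin and print, for each
# ">name" header, how many F rows and N rows follow it.
# ASSUMPTION: F = filled gap, N = unfilled gap; all other rows are ignored.
gap_summary() {
    awk '
        function report() {
            if (name != "") printf "%s filled=%d unfilled=%d\n", name, f, n
        }
        /^>/      { report(); name = substr($1, 2); f = 0; n = 0; next }
        $3 == "F" { f++ }
        $3 == "N" { n++ }
        END       { report() }
    '
}
```

Pipe the listing into `gap_summary` to get one summary line per scaffold.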
If you have any questions, please feel free to ask guolidong@genomics.cn or xumengyang@genomics.cn.