Input NGS data.

123chenshixin commented 2 years ago

Hi, I am using Illumina paired-end data and Nanopore data to fill the gaps in my scaffold assembly with TGS-GapCloser. I have two Illumina files, named R1_fq.gz and R2_fq.gz. However, the program seems to accept only one Illumina file (the --ngs option). Is it correct to merge the two files into one? Also, which do you think is the better choice for gap filling: Nanopore data alone, or both Illumina and Nanopore data? I would sincerely appreciate any advice.

123chenshixin commented 2 years ago

Also, both my Illumina data (~97x) and my Nanopore data (~174x) have high depth, whereas the data in your paper have low depth. Is that a problem?

adonis316 commented 2 years ago

Hi,

TGS-GapCloser uses only long reads to fill the gaps. However, you can use either additional short reads or the long reads themselves to correct long-read sequencing errors and improve accuracy. How well the error correction works depends on the algorithm as well as on dataset quality (e.g. sequencing depth and raw sequencing error rate).
TGS-GapCloser accepts only one short-read file. If you would like to use both, please merge the two FASTQ files into one. In addition, the Illumina depth (~97x) would not affect performance much.
We recommend using accurate short reads to correct the long reads, but racon mode can also give good results if the long-read sequencing depth is sufficiently high (>20X).
The very high coverage of your Nanopore long reads (~174x) would improve accuracy but drastically increase memory usage and CPU time, especially for a complex genome (e.g. high heterozygosity or high repeat content). Please consider reducing the depth (to ~20x) and closing the gaps iteratively, or using stricter parameters as mentioned in #43.
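The two preparation steps suggested above (merging R1 and R2 into a single short-read file, and downsampling ~174x Nanopore reads to ~20x) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of TGS-GapCloser: all function names (`open_text`, `read_fastq`, `merge_fastq`, `target_fraction`, `subsample_fastq`), the seed, and the file paths are hypothetical. It writes uncompressed FASTQ on the assumption that plain FASTQ is the safest input for `--ngs`; whether the tool reads gzipped files directly is not stated in this thread. Downsampling keeps each read independently with probability target/current depth, so the resulting depth is only approximately the target.

```python
import gzip
import random
from pathlib import Path
from typing import Iterator, TextIO


def open_text(path: Path, mode: str = "rt") -> TextIO:
    """Open a FASTQ file as text, transparently handling .gz compression."""
    if path.suffix == ".gz":
        return gzip.open(path, mode)  # type: ignore[return-value]
    return open(path, mode)


def read_fastq(handle: TextIO) -> Iterator[list[str]]:
    """Yield FASTQ records as 4 lines (header, sequence, '+', quality), newlines stripped."""
    while True:
        record = [handle.readline().rstrip("\r\n") for _ in range(4)]
        if not record[0]:
            return  # end of file
        if not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record near {record[0]!r}")
        yield record


def merge_fastq(inputs: list[Path], output: Path) -> int:
    """Hypothetical helper: concatenate FASTQ files (e.g. R1 then R2) into one
    uncompressed FASTQ. Returns the number of records written."""
    written = 0
    with open(output, "w") as out:
        for path in inputs:
            with open_text(path) as handle:
                for record in read_fastq(handle):
                    out.write("\n".join(record) + "\n")
                    written += 1
    return written


def target_fraction(current_depth: float, target_depth: float) -> float:
    """Fraction of reads to keep to go from current_depth to roughly target_depth."""
    if current_depth <= 0:
        raise ValueError("current_depth must be positive")
    return min(1.0, target_depth / current_depth)


def subsample_fastq(src: Path, dst: Path, fraction: float, seed: int = 11) -> int:
    """Hypothetical helper: keep each read independently with probability `fraction`.
    The seed (an arbitrary choice) makes the subsample reproducible."""
    rng = random.Random(seed)
    kept = 0
    with open_text(src) as handle, open(dst, "w") as out:
        for record in read_fastq(handle):
            if rng.random() < fraction:
                out.write("\n".join(record) + "\n")
                kept += 1
    return kept
```

For the depths in this thread, `target_fraction(174, 20)` is about 0.115, so `subsample_fastq(Path("nanopore.fq.gz"), Path("nanopore_20x.fq"), target_fraction(174, 20))` would keep roughly one read in nine. Because reads are sampled independently, the expected number of kept bases is the same fraction of the original, which is what the depth target is about.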

Thanks, Mengyang

BGI-Qingdao / TGS-GapCloser

Input NGS data. #46