BGI-Qingdao / TGS-GapCloser

A gap-closing software tool that uses long reads to enhance genome assembly.
GNU General Public License v3.0

Can I use both long reads and NGS reads with error correction? #41

CGsTree closed this issue 2 years ago

CGsTree commented 2 years ago

Will this lead to better results? If so, should I pass both -racon and -pilon as arguments?

adonis316 commented 2 years ago
  1. The pilon mode was originally designed for error correction using NGS short reads, while racon mode is for correction using the TGS long reads themselves when no short reads are available.
  2. The performance of error correction depends on the algorithm as well as dataset quality (e.g. read sequencing depth, raw sequencing error rate).
  3. If possible, it is recommended to use accurate short reads to correct the long reads with pilon before gap closure. However, racon mode can also give a good result if the long-read sequencing depth is sufficiently high (>20X).
  4. Currently, the pipeline supports only one correction mode per run. Correction using either long reads or short reads alone is enough to guarantee the accuracy of gap closure.
  5. However, you can use either long reads or NGS reads to correct sequencing errors before gap closure, and then polish the final assembly with the other dataset after gap closure.
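
The >20X threshold in point 3 refers to mean long-read depth, which can be estimated as total sequenced bases divided by genome size. Below is a minimal Python sketch of that arithmetic and of the mode choice described in the list. The function names are hypothetical, and the decision rule is only one reading of the points above, not something TGS-GapCloser itself provides.

```python
from typing import Tuple


def long_read_depth(total_read_bases: int, genome_size: int) -> float:
    """Mean sequencing depth = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_read_bases / genome_size


def suggest_correction_mode(
    has_short_reads: bool, depth: float, min_depth: float = 20.0
) -> Tuple[str, str]:
    """Return (mode, note) following the advice in this thread.

    Assumption: short reads, when available, are accurate enough for pilon.
    The 20X default comes from the reply above; it is a rule of thumb.
    """
    if has_short_reads:
        return ("pilon", "")
    if depth > min_depth:
        return ("racon", "")
    # No short reads and shallow long reads: racon is still the only option,
    # but the corrected reads may be less reliable.
    return ("racon", "long-read depth is at or below the suggested minimum")


if __name__ == "__main__":
    # Example: 30 Gb of long reads for a 1 Gb genome.
    depth = long_read_depth(30_000_000_000, 1_000_000_000)
    print(depth, suggest_correction_mode(False, depth))
```

Whichever mode is chosen, only one is used per run, as point 4 states.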

Thanks, Mengyang Xu

CGsTree commented 2 years ago

Thank you for your detailed explanation.

From what I understand, the best strategy is probably to run the following in sequence:

  1. TGS-GapCloser with NGS short reads in pilon mode;
  2. TGS-GapCloser with TGS long reads in racon mode;
  3. Polish with NGS short reads and long reads.

After that, I might have a publication-quality genome. Do I understand this correctly?

In addition, I already polished with NGS short reads and TGS long reads immediately after genome assembly, so do I still need to polish after gap closure?

I am a novice and would appreciate any guidance. 🙏

CGsTree commented 2 years ago

Since TGS-GapCloser calls pilon and racon, is it possible to use a different polishing tool before or after it?

adonis316 commented 2 years ago

It is unnecessary to do the gap closure twice with the same long-read dataset.

In your case, I would recommend this pipeline:

  1. TGS-GapCloser with NGS short reads in pilon mode;
  2. Polish with long reads if the sequencing depth is >20X.

Note that iteratively closing gaps with the same long-read dataset using TGS-GapCloser can sometimes give a slightly better result, but do not run more than three iterations.

Multiple rounds of error correction or assembly polishing can improve assembly accuracy at the single-base level, but you might lose some heterozygous information. You can decide whether another round of polishing is needed by checking quantitative metrics such as BUSCO score, RNA mapping rate…

TGS-GapCloser only supports pilon or racon. However, you can correct all the long reads with another polisher/corrector before or after it and disable its built-in correction in the pipeline with "--ne". The point is that TGS-GapCloser corrects only the long reads that map to gap regions in the assembly, which speeds up the whole procedure and saves memory.
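
To make the "one correction mode per run" rule concrete, here is a hedged Python sketch that assembles a command line for exactly one of the three modes mentioned in this thread. Only `--ne`, racon and pilon are named above; the other flags (`--scaff`, `--reads`, `--output`, `--ngs`, `--samtools`, `--java`), the executable name `tgsgapcloser`, and all file paths are assumptions that should be checked against the tool's own help output before use. The helper `build_gapcloser_cmd` is hypothetical.

```python
from typing import List, Optional


def build_gapcloser_cmd(
    scaff: str,
    reads: str,
    output: str,
    mode: str,
    racon: Optional[str] = None,
    ngs: Optional[str] = None,
    pilon: Optional[str] = None,
    samtools: Optional[str] = None,
    java: Optional[str] = None,
) -> List[str]:
    """Build an argument list using exactly one correction mode.

    mode is "ne" (no correction), "racon", or "pilon".
    Flag names other than --ne are assumptions; verify them locally.
    """
    cmd = ["tgsgapcloser", "--scaff", scaff, "--reads", reads, "--output", output]
    if mode == "ne":
        # Long reads were already corrected by an external tool.
        cmd.append("--ne")
    elif mode == "racon":
        if racon is None:
            raise ValueError("racon mode needs the path to racon")
        cmd += ["--racon", racon]
    elif mode == "pilon":
        needed = {"ngs": ngs, "pilon": pilon, "samtools": samtools, "java": java}
        missing = [name for name, value in needed.items() if value is None]
        if missing:
            raise ValueError("pilon mode is missing: " + ", ".join(missing))
        cmd += ["--ngs", ngs, "--pilon", pilon, "--samtools", samtools, "--java", java]
    else:
        raise ValueError("mode must be one of: ne, racon, pilon")
    return cmd


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    print(" ".join(build_gapcloser_cmd("scaffolds.fa", "corrected_reads.fa", "out", "ne")))
```

Because the mode is a single argument, a run cannot request racon and pilon together, which mirrors the restriction described earlier in the thread.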

CGsTree commented 2 years ago

Thank you very much for your timely and detailed guidance!