Question about the corrected reads

AntonBankevich / LJA


Question about the corrected reads #18

Closed: HaploKit closed this issue 2 years ago.

HaploKit commented 2 years ago

Hi, thank you for developing this great tool! I want to check the accuracy of the error-corrected reads. I ran lja -o out --reads reads.fa on a small simulated HiFi dataset. I assume the two files k501/corrected.fasta and k5001/corrected_reads.fasta contain the corrected reads? However, when I evaluated them, the error rates of the corrected reads in both files were much higher than those of the raw reads, which looks strange. Did I make a mistake somewhere? Any help would be greatly appreciated!

AntonBankevich commented 2 years ago

Hi! Sorry for the late reply, and thank you for your interest in LJA. Unfortunately, read correction is performed in homopolymer-compressed space, where every homopolymer (a run of identical bases) is collapsed into a single letter. As a result, measuring the error rate of the corrected reads against uncompressed sequences gives misleadingly high values and can even leave many reads unaligned.
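A minimal Python sketch of the effect, assuming homopolymer compression simply collapses each run of identical bases into one base (LJA's own implementation may differ in details such as lowercase or N handling). The helpers hpc, edit_distance and hpc_error_rate are hypothetical and not part of LJA; the example also assumes the true sequence of each simulated read is available. A read that is perfect in compressed space still looks error-prone when compared to the uncompressed truth, while compressing the truth as well removes this artifact:

```python
from itertools import groupby


def hpc(seq: str) -> str:
    """Homopolymer-compress a sequence: collapse each run of identical bases to one base."""
    return "".join(base for base, _ in groupby(seq.upper()))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; substitutions, insertions and deletions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def hpc_error_rate(read: str, truth: str) -> float:
    """Hypothetical helper: error rate measured after compressing both sequences."""
    r, t = hpc(read), hpc(truth)
    return edit_distance(r, t) / len(t)


truth = "ACCCGTTTTA"   # uncompressed true sequence of a simulated read
corrected = "ACGTA"    # a corrected read that is already homopolymer-compressed

print(hpc(truth))                                    # ACGTA
print(edit_distance(corrected, truth) / len(truth))  # 0.5 (looks like a bad read)
print(hpc_error_rate(corrected, truth))              # 0.0 (correct in compressed space)
```

For a real evaluation, the same idea would mean compressing the reference (or the simulated truth sequences) before aligning the corrected reads to it.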

HaploKit commented 2 years ago

Thanks a lot, now I see. Would it be possible to also output the "normal" (uncompressed) corrected reads? Or can they be recovered from the current intermediate files? Given the statement "LJA reduces the error rate in HiFi reads by three orders of magnitude", I want to see whether LJA can achieve much better error correction in my own cases too. It would be great if you could provide this as an additional option. Thanks in advance.

AntonBankevich commented 2 years ago

Unfortunately, it would be technically difficult to output uncompressed polished reads. The phrase you quote refers to error rates in homopolymer-compressed space, and I admit that this is unclear from the context. I will add this feature to our list, but I am not sure we will be able to implement it in the next few months.
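One factor that makes this non-trivial (our illustration, not the maintainer's stated reason) is that homopolymer compression is many-to-one: it discards run lengths, so different sequences share the same compressed form, and expanding a compressed read requires knowing the length of every run. A minimal Python sketch, assuming simple run-collapsing compression; hpc_with_runs and expand are hypothetical helpers, not LJA functions:

```python
from itertools import groupby


def hpc_with_runs(seq: str) -> tuple[str, list[int]]:
    """Split a sequence into its compressed form and the length of each run."""
    runs = [(base, len(list(group))) for base, group in groupby(seq)]
    return "".join(b for b, _ in runs), [n for _, n in runs]


def expand(compressed: str, run_lengths: list[int]) -> str:
    """Rebuild the uncompressed sequence; only possible when every run length is known."""
    if len(compressed) != len(run_lengths):
        raise ValueError("need exactly one run length per compressed base")
    return "".join(b * n for b, n in zip(compressed, run_lengths))


a, runs_a = hpc_with_runs("AAACGGT")
b, runs_b = hpc_with_runs("ACCCGT")
print(a, b)               # ACGT ACGT: two different reads, same compressed form
print(expand(a, runs_a))  # AAACGGT
```

If correction inserts, deletes or substitutes bases in the compressed read, the original read's run lengths no longer line up one-to-one with the corrected sequence, so run lengths for changed positions would have to be estimated from other evidence.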

HaploKit commented 2 years ago

OK, thanks anyway.