WGLab / NanoRepeat

NanoRepeat: fast and accurate analysis of Short Tandem Repeats (STRs) from Oxford Nanopore sequencing data
MIT License

General questions #16

Closed HLHsieh closed 4 months ago

HLHsieh commented 4 months ago

Hi Li,

I have some general questions about the usage and output. I would appreciate it if you could clarify them.

1) Difference between using FASTQ or BAM files: Are there any significant differences when using FASTQ or BAM files for quantification? I have a set of BAM files aligned by minimap2 with the -Y (soft clipping) option, but I noticed your script did not include this parameter for FASTQ input. Do you have any thoughts or suggestions on this matter?

2) About the output:

Thank you for your assistance.

Best regards, Hsin

fangli80 commented 4 months ago

Hello Hsin,

There is not much difference between FASTQ and BAM input. NanoRepeat realigns the reads during analysis, so the original alignment in the BAM file is not used for quantification. It also does not use supplementary alignments, so the -Y option does not affect the result.
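
To illustrate the point about -Y: in the SAM format, supplementary alignments are marked with flag bit 0x800, and minimap2's -Y option only changes whether those records are soft-clipped or hard-clipped. A tool that skips supplementary records therefore sees the same primary alignments either way. The following is a minimal sketch of that idea, not NanoRepeat's actual code; the function names and the toy records (read names, coordinates) are made up for illustration.

```python
SUPPLEMENTARY = 0x800  # SAM flag bit that marks a supplementary alignment


def is_supplementary(sam_line: str) -> bool:
    """Return True if a SAM record line has the supplementary flag set."""
    flag = int(sam_line.split("\t")[1])  # FLAG is the second SAM column
    return bool(flag & SUPPLEMENTARY)


def drop_supplementary(sam_lines):
    """Yield header lines and every record that is not supplementary."""
    for line in sam_lines:
        if line.startswith("@") or not is_supplementary(line):
            yield line


# Toy SAM text (hypothetical read and coordinates, sequence omitted with "*").
records = [
    "@HD\tVN:1.6",
    "read1\t0\tchr4\t3074000\t60\t100M\t*\t0\t0\t*\t*",        # primary
    "read1\t2048\tchr4\t3075000\t60\t50H50M\t*\t0\t0\t*\t*",   # supplementary
]

kept = list(drop_supplementary(records))
print(len(kept))  # 2: the header and the primary alignment
```

With -Y, the supplementary record above would carry `50S50M` instead of `50H50M`, but it would be dropped by the filter in both cases.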

NanoRepeat also works with multiple repeat regions. The output.tsv file includes the repeat quantification for all specified regions, whereas each summary.txt file covers one specific repeat. If you have multiple repeat regions, there will be one summary.txt file for each repeat.

The repeat_size.txt file contains the raw result for each read.

Yes. When generating the summary report, some reads with low phasing confidence are removed.

Li

HLHsieh commented 4 months ago

Hi Li,

Thanks for the clear explanation!

Best, Hsin