WGLab / RepeatHMM

a hidden Markov model to infer simple repeats from genome sequences

Running RepeatHMM Scan #37

Closed by woodoo46 3 years ago

woodoo46 commented 3 years ago

Hi there,

I would like to run RepeatHMM Scan across the whole genome. I suppose I need to create a file like "hg38.predefined.pa"? If so, could you share the one you used in your 2020 paper?

Thanks.

George

liuqianhn commented 3 years ago

@woodoo46 You can download a TRF BED file from the UCSC Genome Browser and then use it as input to RepeatHMM. An example command is below:

nohup python RepeatHMM/bin/repeatHMM.py Scan --SplitAndReAlign 1 --MinSup 3 --UserDefinedUniqID WGSscan --SeqTech "Nanopore" --Patternfile trf.bed --cluster 1 --envset repeathmmenv --Onebamfile hx1_bam/hx1_nanopore_all_data_0926.minimap2.sorted.bam --hgfile GRCh38/GRCh38.fa --thread 50 > log/hx1.scan.test.log &

If you are not running on a cluster, replace --cluster 1 with --cluster 0.
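If you want to script the scan, a minimal sketch is to assemble the argument list in Python and toggle the cluster flag there. It uses only the flags that appear in the example command above, and its defaults simply mirror that command; they are not necessarily RepeatHMM's own defaults. The function name `build_scan_command` and the file names in the usage lines are hypothetical, and it is an assumption that `--envset` is still appropriate when `--cluster 0` is used.

```python
import shlex


def build_scan_command(bam, ref, pattern_bed, cluster=True, threads=50,
                       uniq_id="WGSscan", seq_tech="Nanopore", env="repeathmmenv"):
    """Hypothetical helper: build the RepeatHMM Scan argv from the example command.

    Only flags shown in the example command are used; the default values copy
    that command and are assumptions, not RepeatHMM's documented defaults.
    """
    return [
        "python", "RepeatHMM/bin/repeatHMM.py", "Scan",
        "--SplitAndReAlign", "1",
        "--MinSup", "3",
        "--UserDefinedUniqID", uniq_id,
        "--SeqTech", seq_tech,
        "--Patternfile", pattern_bed,
        # 1 when jobs are submitted through a cluster, 0 for a single machine
        "--cluster", "1" if cluster else "0",
        "--envset", env,
        "--Onebamfile", bam,
        "--hgfile", ref,
        "--thread", str(threads),
    ]


if __name__ == "__main__":
    cmd = build_scan_command("sample.bam", "GRCh38/GRCh38.fa", "trf.bed", cluster=False)
    # Print a shell-quoted version; pass `cmd` to subprocess.run(...) to execute it.
    print(shlex.join(cmd))
```

Passing a list rather than a single string to `subprocess.run` avoids shell-quoting issues with paths; `nohup` and output redirection from the original command would then be handled by how you launch the script.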

It would help to remove the repeat regions that fall within failed regions from your TRF BED file, to avoid some complicated regions (I will upload the file later).
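Until that file is available, the filtering step itself can be sketched as a plain interval-overlap filter: drop every TRF record that overlaps any region listed in an exclusion BED. This is a minimal sketch, assuming both inputs are BED files whose first three columns are chrom, start, end (0-based, half-open); the function names and the idea of a separate "failed regions" BED are hypothetical here, not RepeatHMM's own tooling.

```python
def read_bed(lines):
    """Yield (chrom, start, end, original_line) from BED text, skipping headers/comments."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2]), line


def filter_overlaps(trf_lines, excluded_lines):
    """Hypothetical helper: keep TRF records that overlap no excluded region."""
    excluded = {}
    for chrom, start, end, _ in read_bed(excluded_lines):
        excluded.setdefault(chrom, []).append((start, end))
    kept = []
    for chrom, start, end, line in read_bed(trf_lines):
        # Half-open intervals [s1, e1) and [s2, e2) overlap iff s1 < e2 and s2 < e1.
        if not any(start < e and s < end for s, e in excluded.get(chrom, [])):
            kept.append(line)
    return kept


if __name__ == "__main__":
    trf = ["chr1\t100\t130\ttrf\n", "chr1\t500\t540\ttrf\n", "chr2\t10\t40\ttrf\n"]
    failed = ["chr1\t120\t200\n"]
    # The first record overlaps chr1:120-200 and is dropped.
    print("".join(filter_overlaps(trf, failed)), end="")
```

The per-record linear scan keeps the sketch short but is slow for a genome-wide TRF file; `bedtools intersect -v -a trf.bed -b failed.bed` performs the same "keep non-overlapping records" operation efficiently.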

woodoo46 commented 3 years ago

One more question: does the aligner matter? Can I use an ngmlr alignment as input?

liuqianhn commented 3 years ago

@woodoo46 Your input BAM file can be generated by ngmlr. I have not evaluated how the choice of aligner affects the results, but I would not expect the effect to be significant. You are welcome to share your findings if you compare different aligners.