WGLab / DeepRepeat

An accurate repeat detection from Nanopore data using deep learning and image techniques

Seeking guidance on use of custom STR bed files #7

Closed Johnymcb closed 1 year ago

Johnymcb commented 1 year ago

Great job. I'm interested in using DeepRepeat to call genome-wide STRs. I worked through your example tutorials, and they ran well for me. I still have some basic questions that I would like your guidance on:

  1. Can I run DeepRepeat on most custom-defined STR loci with repeat motifs of <= 6 bp? I have some STR catalogues of 2-6 bp repeat motifs that have been run on other tools, but I'm not sure whether all of these motifs are included in your training dataset.
  2. Alternatively, is it possible to get a BED file of the STRs that are included in your whole-genome training dataset, or some handy scripts/tutorials for creating a new model?

liuqianhn commented 1 year ago

Hi @Johnymcb, thank you for your interest in the tool. Your two questions have the same answer: we provide many well-trained models. For a quick reference, I suggest checking https://github.com/WGLab/DeepRepeat/blob/master/docs/Reproducibility.md and downloading the well-trained models with the commands below.

wget https://www.openbioinformatics.org/hx1/drmodel/trainedmod_32_0.2.tar.gz
tar -xvf trainedmod_32_0.2.tar.gz

From there, you can check whether the motif you are interested in has a trained model. All the trained motifs are included in trainedmod_32_0.2.
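
As a rough sketch of that check: the layout of the extracted archive is not described in this thread, so the helper below assumes that motif strings appear in the file or directory names under trainedmod_32_0.2. The function name find_motif_models is hypothetical and not part of DeepRepeat.

```shell
# Hypothetical helper (not part of DeepRepeat): list files or directories
# under an extracted model directory whose NAME contains the given motif.
# Assumption: trained models are named after their repeat motif.
find_motif_models() {
    model_dir="$1"   # e.g. trainedmod_32_0.2
    motif="$2"       # e.g. CAG
    # -iname matches the base name case-insensitively; sort for stable output.
    find "$model_dir" -iname "*${motif}*" | sort
}

# Example call, run after extracting the archive:
#   find_motif_models trainedmod_32_0.2 CAG
```

If this prints nothing for a motif, it may simply mean the naming assumption does not hold, so it is worth browsing the directory listing directly (for example with `tar -tzf trainedmod_32_0.2.tar.gz`) before concluding that no model exists.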

Johnymcb commented 1 year ago

Hi @liuqianhn, thank you.