MAGICS-LAB / DNABERT_2

[ICLR 2024] DNABERT-2: Efficient Foundation Model and Benchmark for Multi-Species Genome
Apache License 2.0
254 stars, 59 forks

About the motif prediction function #94

Closed: josecar24 closed this issue 3 months ago

josecar24 commented 3 months ago

Hello Dr. Zhihan, I hope you are doing well. I have used your pre-trained model for DNA sequence tokenization and it works great! I also noticed that the [model-and-data] module mentions that the model can predict motifs (some information is embedded in the learned representation). Could you tell me how to set up a motif prediction model with DNABERT-2 when my input consists of DNA sequences from DNA-TF complexes? I have gone through the README but did not find a related protocol. Many thanks!

Best, Zitian

Zhihan1996 commented 3 months ago

Hey Zitian,

Sorry for the late reply. We do not have scripts for motif prediction with the DNABERT-2 model. You can generate the embedding of each input token as illustrated in the README and start from there.
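As a starting point, here is a minimal sketch of the "embedding of each input token" step, not an official script. `embed_tokens` follows the README usage as far as I know it; the checkpoint name `zhihan1996/DNABERT-2-117M` is an assumption, and loading it needs `torch`, `transformers`, and network access. `token_spans` is a hypothetical helper: DNABERT-2 tokens are variable-length BPE pieces, so a motif position has to be mapped from token indices back to nucleotide coordinates. It assumes the non-special tokens concatenate back to the input sequence and that no `[UNK]` token appears.

```python
from typing import List, Tuple

# Assumed special-token strings; check them against tokenizer.all_special_tokens.
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]", "[MASK]"}


def token_spans(tokens: List[str]) -> List[Tuple[int, int]]:
    """Map each token to its half-open (start, end) nucleotide span.

    Special tokens cover no nucleotides and get (-1, -1).
    Assumes every other token is a plain nucleotide string (no [UNK]).
    """
    spans = []
    pos = 0
    for tok in tokens:
        if tok in SPECIAL_TOKENS:
            spans.append((-1, -1))
        else:
            spans.append((pos, pos + len(tok)))
            pos += len(tok)
    return spans


def embed_tokens(sequence: str, model_name: str = "zhihan1996/DNABERT-2-117M"):
    """Return (tokens, per-token embeddings) for one DNA sequence.

    Heavy imports are kept local so token_spans can be used without them.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_name, trust_remote_code=True)
    input_ids = tokenizer(sequence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        hidden_states = model(input_ids)[0]  # shape: [1, num_tokens, hidden_size]
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())
    return tokens, hidden_states[0]
```

One possible next step, again only a suggestion, is to label each token by whether its span overlaps a known binding site and train a small per-token classifier on the embeddings returned by `embed_tokens`.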

josecar24 commented 3 months ago

Thank you for your reply. No worries, I will work on this starting from the embedded DNA sequences.

Zhihan1996 commented 3 months ago

Sure. I am happy to help if you run into any problems with our model.