Shuyib / Phylogenetic-tree-study

Estimating phylogenetic trees from the 16S rRNA gene of six microorganisms using unsupervised learning, web-based tools, and Molecular Evolutionary Genetics Analysis (MEGA7)
https://github.com/Shuyib/Phylogenetic-tree-study/wiki
Creative Commons Zero v1.0 Universal

Review DNA BERT #52

Open Shuyib opened 1 year ago

Shuyib commented 1 year ago

BERT models are encoders, which makes them well suited to natural language understanding, according to the Hugging Face NLP course documentation. We need to assess DNA BERT. That means working out how to go from raw text -> tokenized text -> model -> logits -> prediction in order to find motifs, a capability indicated in its README.
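To make the raw text -> tokenized text -> model -> logits -> prediction path concrete, here is a minimal, standard-library-only sketch. It assumes the overlapping k-mer tokenization described for the original DNA BERT; the special-token order, the `build_vocab`/`encode`/`predict` helper names, and the example logits are illustrative assumptions, not DNA BERT's actual vocabulary or API. The model step itself is left out: in practice the token ids would be passed to the pretrained encoder, which returns the logits.

```python
import math
from itertools import product


def kmer_tokenize(sequence, k=6):
    """Split a DNA sequence into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(k):
    """Map special tokens plus all 4**k k-mers to integer ids.

    The special-token order here is an assumption for illustration.
    """
    specials = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {token: i for i, token in enumerate(specials + kmers)}


def encode(sequence, vocab, k):
    """Raw text -> token ids, wrapped in [CLS] ... [SEP]."""
    tokens = ["[CLS]"] + kmer_tokenize(sequence, k) + ["[SEP]"]
    return [vocab.get(t, vocab["[UNK]"]) for t in tokens]


def softmax(logits):
    """Turn raw logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels):
    """Logits -> (predicted label, probability of that label)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


tokens = kmer_tokenize("ATGCGTAA", k=3)
vocab = build_vocab(3)
ids = encode("ATGCGTAA", vocab, k=3)

# Hypothetical logits standing in for the model's output on this sequence.
label, prob = predict([0.2, 2.0], ["no motif", "motif"])
```

With `k=3`, the eight-base example yields six overlapping tokens, so the encoded input has eight ids once `[CLS]` and `[SEP]` are added.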

MEME is still a black box and is very computationally intensive, but since the BERT source code is available, we can assess it.

Link to DNA BERT

Shuyib commented 1 year ago

🤔 Reviewing the tokenizer. My assumption is that it should mark start codons, stop codons, and promoter regions.
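One way to test that assumption is to compare the tokenizer's output against an independent scan for start and stop codons. The sketch below is only such a scan, not part of DNA BERT: `annotate_codons` is a hypothetical helper, it uses the standard start codon (ATG) and stop codons (TAA, TAG, TGA), and it checks every position rather than a single reading frame. Promoter regions are not covered, since they cannot be found by a fixed 3-base lookup.

```python
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def annotate_codons(sequence):
    """Return (position, codon, kind) for every start/stop codon found.

    Scans all overlapping 3-mers, so hits in any reading frame are
    reported; it does not check that a stop is in frame with a start.
    """
    sequence = sequence.upper()
    hits = []
    for i in range(len(sequence) - 2):
        codon = sequence[i:i + 3]
        if codon in START_CODONS:
            hits.append((i, codon, "start"))
        elif codon in STOP_CODONS:
            hits.append((i, codon, "stop"))
    return hits


hits = annotate_codons("ATGCGTAA")
```

If the tokenizer produced dedicated markers, they would be expected at the positions this scan reports; a plain k-mer tokenizer would instead emit these codons as ordinary tokens.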

Shuyib commented 1 year ago

Looking into this now. 🏃🏿