Open ebrintn opened 9 months ago
SARS-CoV-2 dataset:
Sequences per continent: Asia 60, Europe 1068, North America 387, Oceania 69, South America 19
HIV-1 dataset: Los Alamos compendium HIV-1 sequences from 2021, all subtypes, 198 sequences
These sequences are already aligned by the Los Alamos research centre
Getting Watterson's theta, Lamarck and nucleotide diversity (pi)
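For reference, here is a minimal sketch of how Watterson's theta and pi could be computed from an already aligned set of sequences. It is not necessarily the method used for these datasets. The function names are hypothetical, the estimates are per sequence rather than per site, and gaps or ambiguous bases are simply ignored, which is an assumption.

```python
from itertools import combinations

VALID_BASES = set("ACGT")  # assumption: anything else (gaps, N) is ignored


def segregating_sites(seqs):
    """Count alignment columns that contain more than one distinct base."""
    count = 0
    for column in zip(*seqs):
        bases = {b for b in column.__iter__() if b in VALID_BASES}
        if len(bases) > 1:
            count += 1
    return count


def wattersons_theta(seqs):
    """Watterson's estimator: S / a_n, where a_n = sum_{i=1}^{n-1} 1/i."""
    seqs = [s.upper() for s in seqs]
    n = len(seqs)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(seqs) / a_n


def nucleotide_diversity(seqs):
    """Pi: average number of differences over all pairs of sequences."""
    seqs = [s.upper() for s in seqs]
    n = len(seqs)
    total = 0
    for s1, s2 in combinations(seqs, 2):
        total += sum(
            1
            for a, b in zip(s1, s2)
            if a != b and a in VALID_BASES and b in VALID_BASES
        )
    return total / (n * (n - 1) / 2)


if __name__ == "__main__":
    # Toy alignment for illustration only, not real data.
    toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT", "ACGTACGT"]
    print(wattersons_theta(toy), nucleotide_diversity(toy))
```

Dividing either value by the alignment length gives the per-site estimate.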
Whole human genome sequence databases are hard to get. As a result, I moved to BRCA1 gene sequences from Homo sapiens.
For the BRCA1 database: downloaded the reference sequence (https://www.ncbi.nlm.nih.gov/nuccore/262359905), then grabbed 100 similar sequences using blastn (not megablast, because the results were too similar).
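If the search were run with the BLAST+ command-line tools rather than the web interface, the call might look roughly like the sketch below. The helper name, the file names, the `nt` database and the tabular output format are all assumptions. The only choices taken from the notes are the blastn task (instead of the default megablast) and the limit of 100 hits.

```python
def build_blastn_command(query_fasta, out_path, max_hits=100):
    """Build a BLAST+ blastn argument list (hypothetical helper)."""
    return [
        "blastn",
        "-task", "blastn",          # plain blastn instead of the default megablast
        "-query", query_fasta,      # assumed: reference sequence saved as FASTA
        "-db", "nt",                # assumed database
        "-remote",                  # run the search on NCBI servers
        "-max_target_seqs", str(max_hits),
        "-outfmt", "6",             # assumed: tabular output
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    cmd = build_blastn_command("brca1_reference.fasta", "brca1_hits.tsv")
    print(" ".join(cmd))
    # To actually run it (requires BLAST+ installed and network access):
    # import subprocess; subprocess.run(cmd, check=True)
```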