evanroyrees opened this issue 3 years ago
I've started a new animations repo at https://github.com/jason-c-kwan/Autometa_animations. Can we make the above list into a checklist?
So far I've made a sort of logo animation that can go at the beginning of each video.
For the first video, I think I will show the BH-tSNE graph for the same dataset as we change the length cutoff, while coloring the points based on the ground truth. @WiscEvan, can you write instructions here on how I would use the Autometa entrypoints to do k-mer counting on all contigs, then normalization and BH-tSNE on different subsets? I am thinking I could programmatically chop up an internal Pandas table, but I just need a quick reminder of how to incorporate the BH-tSNE part into the script.
```python
#!/usr/bin/env python
# Save to subset_and_embed_counts.py
import argparse
import os

import pandas as pd

from autometa.common import kmers


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", help="file path to k-mer counts table", required=True)
    parser.add_argument("--output", help="directory path to store sample-size embeddings", required=True)
    args = parser.parse_args()
    # Read in counts table (i.e. counts.tsv)
    df = pd.read_csv(args.input, sep="\t", index_col="contig")
    # Subsample by the number of contigs specified
    sample_sizes = [100, 200, 400, 800, 1000, 5000, 10000]
    for sample_size in sample_sizes:
        counts_subset = df.sample(n=sample_size)
        norm_df = kmers.normalize(counts_subset, method="am_clr")
        # Embed each subsample and write it to its own sample-size path
        sample_embed_filepath = os.path.join(args.output, f"kmers.sample_size_{sample_size}.embedded.tsv")
        embedded_df = kmers.embed(
            kmers=norm_df,
            out=sample_embed_filepath,
            pca_dimensions=50,
            method="bhsne",
            embed_dimensions=2,
        )
        print(f"Wrote sample size embedding to {sample_embed_filepath}")


if __name__ == "__main__":
    main()
```
```shell
# Set filepaths and parameters
fasta="metagenome.fna"
kmers="counts.tsv"
outdir="path to store embeddings"
size=5
cpus=2

## Compute counts
autometa-kmers --fasta "$fasta" --kmers "$kmers" --size "$size" --cpus "$cpus"

## Subset and write embeddings
python subset_and_embed_counts.py --input "$kmers" --output "$outdir"
```
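Since the plan for video 1 is to color the points by ground truth, here is a minimal plotting sketch to go with the embeddings above. The column names `x_1`/`x_2` and the ground-truth labels are assumptions (the real embedding column names and truth table should be checked against the actual Autometa output):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt


def plot_embedding_by_truth(embedded_df: pd.DataFrame, truth: pd.Series, out_png: str) -> pd.DataFrame:
    """Scatter-plot a 2-D embedding, one color per ground-truth genome.

    Assumes embedded_df has columns "x_1" and "x_2" indexed by contig,
    and truth maps each contig to its reference genome.
    """
    # Inner join keeps only contigs present in both the embedding and the truth labels
    df = embedded_df.join(truth.rename("reference_genome"), how="inner")
    fig, ax = plt.subplots()
    for genome, group in df.groupby("reference_genome"):
        ax.scatter(group["x_1"], group["x_2"], s=5, label=genome)
    ax.set_xlabel("BH-tSNE dim 1")
    ax.set_ylabel("BH-tSNE dim 2")
    ax.legend(fontsize="x-small")
    fig.savefig(out_png, dpi=150)
    plt.close(fig)
    return df
```

It could then be called once per sample-size table, e.g. with `pd.read_csv("kmers.sample_size_5000.embedded.tsv", sep="\t", index_col="contig")` and a hypothetical truth series loaded the same way.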
I realized that the tasks should probably be made a bit more granular
@WiscEvan Could you check out the script for video 1 and let me know what you think?
I've updated my comment so the checklist can be reviewed when arriving at the page and so that it is in one place
The video 8 idea seems to be a bit redundant with video 2. In order to explain why we need to calculate the coverage, I will have to bring up how we use two BH-tSNE dimensions and one coverage dimension. I'm not sure what else would need to be said?
Yeah, video 8 could probably be grouped in with video 9
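To make the videos 8/9 point concrete, a minimal sketch of assembling the three clustering features (two BH-tSNE dimensions plus one coverage dimension) might look like this. The column names `x_1`, `x_2`, and `coverage` are assumptions here, not confirmed Autometa output names:

```python
import pandas as pd


def clustering_features(embedded_df: pd.DataFrame, coverage_df: pd.DataFrame) -> pd.DataFrame:
    """Join two BH-tSNE dimensions with per-contig coverage into a 3-D feature table.

    Assumes both tables are indexed by contig; an inner join keeps only
    contigs present in both.
    """
    return embedded_df[["x_1", "x_2"]].join(coverage_df["coverage"], how="inner")
```

The resulting three-column table is what the binning step would cluster over, which is the visual the video needs to set up.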
Autometa Methods Documentation Video Series
This is a series of videos with voiceovers describing Autometa methods, with visual aids generated by manim animations.
📝 Scripts / 🎬 Scenes / 🎨 Animations / 🔊 Voiceovers
Video Overview

- [ ] Video 1 - Length filtering
- [ ] Video 2 - Coverage calculation
- [ ] Video 3 - ORF calling
- [ ] Video 4 - Marker annotation
- [ ] Video 5 - Taxon assignment
- [ ] Video 6 - K-mer counting
- [ ] Video 7 - K-mer embedding
- [ ] Video 8 - 3 dimensions of clustering features
- [ ] Video 9 - Binning with recursive DBSCAN
- [ ] Video 10 - Unclustered recruitment
NOTE: Manim has been forked and is being maintained in two separate repositories.
From the ManimCommunity/manim repository:
Example Scenes from Professor Jason Kwan's ASP presentation