KwanLab / Autometa

Autometa: Automated Extraction of Genomes from Shotgun Metagenomes
https://autometa.readthedocs.io

Animations in methods documentation #172

Open evanroyrees opened 3 years ago

evanroyrees commented 3 years ago

Autometa Methods Documentation Video Series

This is a series of videos with voiceovers describing Autometa methods, with visual aids generated by manim animations.

📝 🎬 🔊 🎨 Scripts / Scenes / Animations / Voiceovers 📝 🎬 🔊 🎨

Video Overview

- [ ] Video 1 - Length filtering
- [ ] Video 2 - Coverage calculation
- [ ] Video 3 - ORF calling
- [ ] Video 4 - Marker annotation
- [ ] Video 5 - Taxon assignment
- [ ] Video 6 - K-mer counting
- [ ] Video 7 - K-mer embedding
- [ ] Video 8 - 3 dimensions of clustering features
- [ ] Video 9 - Binning with recursive DBSCAN
- [ ] Video 10 - Unclustered recruitment


NOTE: Manim has been forked and is being maintained in two separate repositories.

From the ManimCommunity/manim repository:

This fork is updated more frequently than his [Grant Sanderson's original repository], and it's recommended to use this fork if you'd like to use Manim for your own projects.

Example Scenes from Professor Jason Kwan's ASP presentation

jason-c-kwan commented 3 years ago

I've started a new animations repo at https://github.com/jason-c-kwan/Autometa_animations. Can we make the above list into a checklist?

So far I've made a sort of logo animation that can go at the beginning of each video.
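
For anyone following along, below is a minimal sketch of what such an intro scene could look like with the ManimCommunity fork. The title text, font size, and timings are placeholders, not the actual logo animation in jason-c-kwan/Autometa_animations:

```python
# logo_scene.py -- illustrative only; the real logo animation lives in
# jason-c-kwan/Autometa_animations. Render with: manim -pql logo_scene.py LogoScene
from manim import FadeOut, Scene, Text, Write


class LogoScene(Scene):
    def construct(self):
        # Placeholder title standing in for the Autometa logo
        title = Text("Autometa", font_size=96)
        self.play(Write(title))   # draw the title stroke by stroke
        self.wait(1)              # hold on screen for a second
        self.play(FadeOut(title)) # fade out before the video content begins
```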

jason-c-kwan commented 3 years ago

For the first video, I think I will show the BH-tSNE graph for the same dataset as we change the length cutoff, while coloring the points based on the ground truth. @WiscEvan can you write instructions here on how I would use the Autometa entrypoints to basically do K-mer counting on all contigs, then do normalization and BH-tSNE on different subsets? I am thinking I could programmatically chop up an internal Pandas table, but I just need a quick reminder of how to incorporate the BH-tSNE part into the script.

evanroyrees commented 3 years ago

k-mer counting

Subset k-mers, then normalize, embed, and write each embedding to a sample-size filepath:

```python
#!/usr/bin/env python
# Save to subset_and_embed_counts.py
import argparse
import os

import pandas as pd

from autometa.common import kmers


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", help="file path to kmer counts table", required=True)
    parser.add_argument("--output", help="directory path to store sample size embeddings", required=True)
    args = parser.parse_args()
    # Read in the k-mer counts table (i.e. counts.tsv) indexed by contig
    df = pd.read_csv(args.input, sep="\t", index_col="contig")
    # Make sure the output directory exists before writing embeddings
    os.makedirs(args.output, exist_ok=True)

    # Subsample by the number of contigs specified
    sample_sizes = [100, 200, 400, 800, 1000, 5000, 10000]
    for sample_size in sample_sizes:
        counts_subset = df.sample(n=sample_size)
        norm_df = kmers.normalize(counts_subset, method="am_clr")
        # Embed the normalized counts and write to the sample-size filepath
        sample_embed_filepath = os.path.join(args.output, f"kmers.sample_size_{sample_size}.embedded.tsv")
        embedded_df = kmers.embed(
            kmers=norm_df,
            out=sample_embed_filepath,
            pca_dimensions=50,
            method="bhsne",
            embed_dimensions=2,
        )
        print(f"Wrote sample size embedding to {sample_embed_filepath}")


if __name__ == "__main__":
    main()
```

Compute counts, then subset and write embeddings:

```bash
# Set filepaths and parameters
fasta="metagenome.fna"
kmers="counts.tsv"
outdir="path/to/store/embeddings"  # directory to write the sample-size embeddings
size=5
cpus=2

# Compute counts
autometa-kmers --fasta "$fasta" --kmers "$kmers" --size "$size" --cpus "$cpus"

# Subset and write embeddings
python subset_and_embed_counts.py --input "$kmers" --output "$outdir"
```
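
If the goal is to vary the length cutoff rather than the sample size (as proposed for video 1 above), a hypothetical variant of the script could subset the counts table to contigs at or above each cutoff before normalizing and embedding. The `get_lengths` helper, filepaths, and cutoff values below are illustrative, not part of Autometa:

```python
#!/usr/bin/env python
# Illustrative variant of subset_and_embed_counts.py: subset contigs by length
# cutoff (rather than by sample size) before normalizing and embedding.
import os

import pandas as pd
from Bio import SeqIO

from autometa.common import kmers


def get_lengths(fasta: str) -> pd.Series:
    """Return a contig -> length Series parsed from the metagenome assembly."""
    return pd.Series(
        {record.id: len(record.seq) for record in SeqIO.parse(fasta, "fasta")},
        name="length",
    )


# Placeholder filepaths matching the example above
counts = pd.read_csv("counts.tsv", sep="\t", index_col="contig")
lengths = get_lengths("metagenome.fna")
outdir = "length_cutoff_embeddings"
os.makedirs(outdir, exist_ok=True)

for cutoff in [1000, 3000, 5000, 10000]:
    # Keep only contigs at or above the current length cutoff
    contigs = lengths[lengths >= cutoff].index
    subset = counts.loc[counts.index.isin(contigs)]
    norm_df = kmers.normalize(subset, method="am_clr")
    out = os.path.join(outdir, f"kmers.cutoff_{cutoff}.embedded.tsv")
    kmers.embed(kmers=norm_df, out=out, pca_dimensions=50, method="bhsne", embed_dimensions=2)
    print(f"Wrote cutoff {cutoff} embedding to {out}")
```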
jason-c-kwan commented 3 years ago

I realized that the tasks should probably be made a bit more granular.

jason-c-kwan commented 3 years ago

@WiscEvan Could you check out the script of video 1 and let me know what you think?

evanroyrees commented 3 years ago

I've updated my comment so the checklist can be reviewed upon arriving at the page, and so that it is all in one place.

jason-c-kwan commented 3 years ago

The video 8 idea seems to be a bit redundant with video 2. In order to explain why we need to calculate the coverage, I will have to bring up how we use two BH-tSNE dimensions and one coverage dimension. I'm not sure what else would need to be said.
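
As a reference for that point, here is a minimal sketch of how the two BH-tSNE dimensions and one coverage dimension could be joined into a 3-column feature matrix and clustered. The filepaths and column names are assumed, and the single DBSCAN pass only illustrates the idea rather than Autometa's recursive implementation:

```python
#!/usr/bin/env python
# Illustrative only: combine two BH-tSNE dimensions with one coverage dimension
# and cluster the resulting 3-D feature matrix with a single DBSCAN pass.
import pandas as pd
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# Placeholder filepaths; both tables are assumed to be indexed by contig
embedded = pd.read_csv("kmers.embedded.tsv", sep="\t", index_col="contig")  # assumed columns: x, y
coverages = pd.read_csv("coverages.tsv", sep="\t", index_col="contig")      # assumed column: coverage

# Join on the contig index to build the 3-column feature matrix
features = embedded[["x", "y"]].join(coverages["coverage"], how="inner")

# Scale features so the coverage axis is comparable to the embedding axes
X = StandardScaler().fit_transform(features)

# A single DBSCAN pass; -1 labels mark unclustered contigs
features["cluster"] = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
print(features["cluster"].value_counts())
```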

evanroyrees commented 3 years ago

Yeah, video 8 could probably be grouped in with video 9.