evanroyrees opened this issue 4 years ago
In our last meeting we decided to work off of dev since it is so far ahead of the main branch.
I've created a branch nsf-2 for this project: https://github.com/KwanLab/Autometa/tree/nsf-2
We also decided that the end goal would be an implementation of this Aim as its own Nextflow module (with process logic contained in Autometa Python endpoints) that also interfaces with the larger Autometa Nextflow pipeline. Simple external software like all-v-all DIAMOND BLAST will be wrapped in Nextflow only.
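For concreteness, a minimal sketch of that module shape, assuming DSL2. The `autometa-orfs` entrypoint and its flags are hypothetical stand-ins for whichever Autometa Python endpoint ends up holding the process logic, not the actual CLI:

```nextflow
nextflow.enable.dsl = 2

// Minimal module shape: the process logic lives in an Autometa Python
// endpoint; Nextflow only stages the inputs and captures the outputs.
// `autometa-orfs` and both flags are placeholders, not the real CLI.
process AUTOMETA_STEP {
    input:
    path assembly

    output:
    path "${assembly.simpleName}.orfs.faa"

    script:
    """
    autometa-orfs --assembly ${assembly} --out ${assembly.simpleName}.orfs.faa
    """
}
```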
I think maybe we should keep this branch off of KwanLab and have it on our own forked repos so that we do not confuse any end-users. Otherwise we can push this branch as a PR to KwanLab when ready. We've been using Vincent Driessen's branching-model article for reference.
Upon revisiting, I think nsf-2 is appropriate; I'm just worried about confusing the end-users. Although maybe this is best?
As a note, there has been some offline discussion about this: https://github.com/KwanLab/Autometa/issues/13#issuecomment-800423310
I would say work off KwanLab/Autometa@nsf-2. I'm not sure why a non-"main" branch would confuse end-users, especially since it follows the paradigm in the image?
Notes from initial pseudo-code session. First test data -> MIX51-EQUAL
{nextflow pseudo-code}

Channel -> concatenate all ORF fastas, keeping track of which ORF belongs to which contig and metagenome

process create_blast_database {
    input:
        all ORFs from every sample
}

process all_v_all_blast {
    input:
        ORF-to-contig-to-metagenome database/table/dictionary
        all ORFs from every sample
        BLAST database
    output:
        filtered BLAST table
    // filter self-hits (is this a setting in diamond? see the sketches after this block)
    // (should we limit to only results from different metagenome samples?)
}

process identify_gene_homologs {
    input:
        BLAST table
    output:
        clusters of ORFs
        // intrasample hits: ORF to ORF within the same sample
        // intersample hits: sample x ORF to sample y ORF
}

process calculate_orf_coverage {
    input:
        filtered clusters of ORFs
        contigs containing those ORFs
        reads
    output:
        read alignments to ORFs/contigs + counts
}

process cluster_based_on_coverage {
    input:
        read alignments to ORFs/contigs + counts
        val metagenome_depth (for normalizing coverage)
    output:
    script: "step 2 in aim 2"
}
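A hedged, runnable sketch of the first two processes (DSL2; file names and DIAMOND settings are illustrative, and the ORF-to-contig-to-metagenome mapping is left out). On the self-hit question: as far as I can tell DIAMOND has no built-in option to exclude self-hits, so this filters qseqid == sseqid rows out of the tabular output instead:

```nextflow
nextflow.enable.dsl = 2

process CREATE_BLAST_DATABASE {
    input:
    path all_orfs               // concatenated ORFs from every sample

    output:
    path "all_orfs.dmnd"

    script:
    """
    diamond makedb --in ${all_orfs} --db all_orfs
    """
}

process ALL_V_ALL_BLAST {
    input:
    path all_orfs
    path database

    output:
    path "hits.filtered.tsv"

    script:
    """
    diamond blastp \\
        --query ${all_orfs} \\
        --db ${database} \\
        --outfmt 6 \\
        --out hits.tsv
    # Drop self-hits (qseqid == sseqid) from the outfmt 6 table
    awk '\$1 != \$2' hits.tsv > hits.filtered.tsv
    """
}

workflow {
    orfs = Channel.fromPath(params.orfs)
    CREATE_BLAST_DATABASE(orfs)
    ALL_V_ALL_BLAST(orfs, CREATE_BLAST_DATABASE.out)
}
```

Restricting hits to pairs from different metagenome samples would slot into the same filtering step once the ORF-to-sample mapping is joined in.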
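For identify_gene_homologs, one option is to treat the filtered table as a weighted graph and cluster it; MCL is a stand-in choice here for illustration, not something we have settled on:

```nextflow
process IDENTIFY_GENE_HOMOLOGS {
    input:
    path filtered_hits          // outfmt 6: qseqid sseqid pident ... bitscore

    output:
    path "clusters.txt"         // one cluster of ORF ids per line

    script:
    """
    # Build a query/subject/weight graph from the table (bitscore = column 12)
    awk '{print \$1, \$2, \$12}' ${filtered_hits} > graph.abc
    mcl graph.abc --abc -o clusters.txt
    """
}
```

Connected components over the hit pairs would be a simpler (coarser) alternative, and would still split cleanly into the intra- vs. intersample cases above.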
@WiscEvan Is there a good way to get ORF-level coverage? If so, are we going to if/else on whether the input is type x or y, or require the input file(s) to be type x?
Answered my own question #1 (one possible route is sketched below), but we will have to decide on the second question.
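For the record, one route to ORF-level coverage, sketched under the assumption that ORF coordinates on the contigs are available as a BED file from the ORF-calling step (bwa/samtools are illustrative tool choices):

```nextflow
process CALCULATE_ORF_COVERAGE {
    input:
    path contigs
    path orf_bed                        // ORF coordinates on the contigs
    tuple path(fwd), path(rev)          // paired-end reads

    output:
    path "orf_coverage.tsv"

    script:
    """
    bwa index ${contigs}
    bwa mem ${contigs} ${fwd} ${rev} | samtools sort -o aln.bam -
    samtools index aln.bam
    # bedcov reports the summed per-base depth for each BED interval;
    # divide by ORF length downstream for a mean coverage
    samtools bedcov ${orf_bed} aln.bam > orf_coverage.tsv
    """
}
```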
Tasks
Expectations and Approach
Evaluation Datasets: