Open ElDeveloper opened 3 years ago
@ElDeveloper Gotcha. The methods we would need to extract the fields we need for Amplicon are fairly analogous to what we already have for metagenomics, right?
Yep, exactly. Biggest difference is perhaps the fact that run_prefix is a single value for all samples instead of what happens in metagenomics where each sample is assigned a run_prefix.
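To make that difference concrete, here is a minimal sketch. It is not metapool code: the function names, sample names, and the per-sample naming scheme are all hypothetical, and it only shows that an amplicon prep shares one run_prefix while a metagenomic prep carries one per sample.

```python
def amplicon_run_prefixes(sample_names, run_prefix):
    """Amplicon case: every sample in the run shares the same run_prefix."""
    return {name: run_prefix for name in sample_names}


def metagenomic_run_prefixes(sample_names, lane):
    """Metagenomic case: each sample gets its own run_prefix.

    The '<sample>_S<n>_L<lane>' pattern is an assumption used for illustration.
    """
    return {
        name: f"{name}_S{i}_L{lane:03d}"
        for i, name in enumerate(sample_names, start=1)
    }
```

Any code that fills in run_prefix therefore needs to know which of the two shapes it is producing before it writes the column.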
@ElDeveloper Just to confirm, generate_qiita_prep_file() (https://github.com/biocore/metagenomics_pooling_notebook/blob/f9f8438877fea0e8584ab619160c7b3d18e1479a/metapool/prep.py#L445) takes a plate map as a parameter, while seqpro expects a run directory and a sample sheet. It seems like I should be able to write an inverse operation that turns a sample sheet into a prep file, perhaps with one or two additional inputs. Does that sound in line with what you were thinking?
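As a rough illustration of the sample-sheet-to-prep direction, the sketch below reads the [Data] section of an Illumina-style sample sheet and maps each row to a prep-like record. It assumes the sheet has Lane, Sample_ID, and Sample_Project columns. The function names and the output columns are hypothetical and do not reflect metapool's actual API.

```python
import csv
import io


def read_data_section(sample_sheet_text):
    """Return the rows of the [Data] section as a list of dicts."""
    lines = sample_sheet_text.splitlines()
    # Locate the line that opens the [Data] section.
    start = next(
        i for i, line in enumerate(lines) if line.strip().startswith("[Data]")
    )
    data_lines = []
    for line in lines[start + 1:]:
        if line.strip().startswith("["):  # the next section begins here
            break
        if line.replace(",", "").strip():  # skip blank or comma-only lines
            data_lines.append(line)
    return list(csv.DictReader(io.StringIO("\n".join(data_lines))))


def to_prep_rows(data_rows, run_id):
    """Map sample-sheet rows to prep-like rows (column choice is illustrative)."""
    return [
        {
            "sample_name": row["Sample_ID"],
            "project": row["Sample_Project"],
            "lane": row["Lane"],
            "runid": run_id,
        }
        for row in data_rows
    ]
```

The "additional inputs" mentioned above would enter as extra parameters alongside run_id in this sketch.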
Not sure about writing an inverse function. I'll leave that up to your judgement. In the end the goal should be to change preparations_for_run so it can understand when and how to process a 16S run. You can definitely use code from generate_qiita_prep_file. Does that make sense? @callaband wrote the code in generate_qiita_prep_file so definitely touch base with her if you have any questions.
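One way to picture "understanding when and how to process a 16S run" is a dispatch on assay type. The sketch below is only an assumption about the shape of that change: preparations_for, the handler functions, and the assay labels are hypothetical names, and the handlers are placeholders rather than real prep logic.

```python
def prepare_metagenomic(samples):
    # Placeholder for the existing metagenomic/metatranscriptomic path.
    return [{"sample_name": s, "assay": "metagenomic"} for s in samples]


def prepare_amplicon(samples):
    # Placeholder for the new 16S (amplicon) path.
    return [{"sample_name": s, "assay": "amplicon"} for s in samples]


# Hypothetical assay labels mapped to their handlers.
HANDLERS = {
    "metagenomic": prepare_metagenomic,
    "amplicon": prepare_amplicon,
}


def preparations_for(assay, samples):
    """Pick the handler for the given assay type and run it."""
    try:
        handler = HANDLERS[assay]
    except KeyError:
        raise ValueError(f"Unsupported assay type: {assay!r}") from None
    return handler(samples)
```

Keeping the choice in one place means the caller (and the CLI) does not need to change when a new assay type is added.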
The final CLI (from seqpro's point of view) should look basically the same for metagenomic and for amplicon runs.
Currently we generate the mapping files for an amplicon run with the amplicon-pooling.ipynb notebook. However, this should be changed to work the same way as with metagenomics runs, namely by using the sequencing run folder as an input in combination with the sample sheet. For example, for metagenomics, we run:
In the metagenomics case, output-folder will produce a preparation (mapping) file per project and per lane.
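The per-project, per-lane split can be sketched as a simple grouping step. This is an illustration under stated assumptions, not the seqpro implementation: the row keys, the function names, and the file-naming scheme are all hypothetical.

```python
from collections import defaultdict


def group_by_project_and_lane(rows):
    """Group sample rows so each (project, lane) pair gets its own prep file."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["project"], row["lane"])].append(row)
    return dict(groups)


def prep_file_name(run_id, project, lane):
    # Hypothetical naming scheme, chosen only for illustration.
    return f"{run_id}.{project}.{lane}.tsv"
```

For an amplicon run the same grouping would apply, with the lane component dropped if only one file per project is wanted.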
Therefore, we want mg-scripts (CC @charles-cowart) to call seqpro for amplicon runs too, such that we would get a file per project. It is important that we generate the mapping (prep) file after the run is completed so we can populate fields such as run_prefix, runid, run_date, instrument model, etc., based on the run output. We already have code for how to handle this via metapool/prep.py, but that is currently only for metagenomic/metatranscriptomic data.
If we do this, we'll be able to unify the way in which sequence data is processed.
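To show why the run output matters, here is a small sketch that derives runid, run_date, and an instrument model from the run folder name alone. It assumes the usual Illumina folder layout of YYMMDD_instrument_runnumber_flowcell. The function name and the prefix-to-model table are illustrative assumptions, not the lookup used in metapool/prep.py.

```python
from datetime import datetime
from pathlib import Path

# Illustrative mapping from instrument-ID prefix to model; treat as an assumption.
INSTRUMENT_MODELS = {
    "A": "Illumina NovaSeq 6000",
    "M": "Illumina MiSeq",
}


def run_fields(run_dir):
    """Derive prep fields from a run folder named YYMMDD_<instrument>_<run>_<flowcell>."""
    run_id = Path(run_dir).name
    date_part, instrument_id = run_id.split("_")[:2]
    # The folder name carries a two-digit year, month, and day.
    run_date = datetime.strptime(date_part, "%y%m%d").date().isoformat()
    model = INSTRUMENT_MODELS.get(instrument_id[:1], "unknown")
    return {"runid": run_id, "run_date": run_date, "instrument_model": model}
```

None of these values are known when the pooling notebook runs, which is the reason for generating the prep file after sequencing has finished.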