biocore / metagenomics_pooling_notebook

Jupyter notebooks to assist with sample processing
MIT License
8 stars 16 forks source link

Generate Qiita prep seems to be 16S or other #112

Open wasade opened 1 year ago

wasade commented 1 year ago

The column remaining code here (https://github.com/biocore/metagenomics_pooling_notebook/blob/f573aaeee992afb7c570546d6624fd188110d70c/metapool/prep.py#L702) is either 16S or not. Will this silently result in an inaccuracy for ITS and 18S?

By eye, it looks like there is only one thing that is different with the column_renamer dict too which I was puzzled by

wasade commented 1 year ago

...I had initially assumed this method, generate_qiita_prep_file was for both amplicon and metagenomic preps. The code appears specific to amplicon, but the function name doesn't specify amplicon specific, and it's not in an amplicon specific library. That seems unexpected from an naive examination of the code but perhaps it is not surprising

charles-cowart commented 12 months ago

This method does seem to only be used for amplicon. It's not clear to me whether or not the else in the first conditional includes both 18S and ITS or whether this function simply pre-dates one of those. The second conditional explicitly lays them all out.

In the first conditional, the only two differences are '515FB Forward Primer (Parada)' -> primer vs 'Reverse complement of 3prime Illumina Adapter' and 'Forward Primer Linker' -> linker vs 'Reverse Primer Linker'.

@callaband or @mmbryant23, would either of you happen to know whether this 'else' condition covers both 18S and ITS? https://github.com/charles-cowart/metagenomics_pooling_notebook/blob/1fa87c859735289a8922d8906b2289e0993f935e/metapool/prep.py#L730

I'll look to see if there's an analog for metagenomic and see if we can't move them under the same function name. Otherwise I'll proceed to rename it to something amplicon specific. I'll refactor the conditionals for clarity in either case.

callaband commented 12 months ago

I believe that else statement on Line730 is supposed to cover both 18S and ITS based on the way the files are originally named.

charles-cowart commented 12 months ago

I believe that else statement on Line730 is supposed to cover both 18S and ITS based on the way the files are originally named.

ty Celeste! :)