biocore / metagenomics_pooling_notebook

Jupyter notebooks to assist with sample processing
MIT License

Possible removal of BarcodesAreRC column #202

Closed: charles-cowart closed this issue 4 months ago

charles-cowart commented 4 months ago

I believe the BarcodesAreRC column in the Bioinformatics section of sample sheets is no longer used. It predates me, but there's little mention of it in the codebase. It's not mentioned in the bcl-convert handbook, so I don't believe it's required for converting files to FASTQ. I'm raising it as an issue before @RodolfoSalido leaves, in case he is the most knowledgeable person about it.

@mmbryant23 do you know if we still need this column?

mmbryant23 commented 4 months ago

@charles-cowart I use it to confirm whether the i2 barcodes are in the forward or reverse orientation when checking sample sheets. This is important because for the NovaSeq 6000 they should be in the reverse orientation (BarcodesAreRC true), and for the NovaSeq X they should be in the forward orientation (BarcodesAreRC false). If the sample sheet doesn't reflect the correct orientation, I need to change the barcodes to the correct orientation. @RodolfoSalido, is orientation still accurately reflected in that column?
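The manual check described above (NovaSeq 6000 expects BarcodesAreRC true, NovaSeq X expects false) could be expressed as a small validation step. This is a minimal sketch, assuming the instrument name and the BarcodesAreRC string have already been read from the sheet; the `EXPECTED_RC` mapping, the instrument strings, and the `parse_bool` and `orientation_matches` functions are hypothetical and not part of metapool.

```python
# Hypothetical mapping built from the orientation rule stated in this thread:
# NovaSeq 6000 expects reverse-complemented barcodes, NovaSeq X forward ones.
EXPECTED_RC = {
    "NovaSeq 6000": True,
    "NovaSeq X": False,
}


def parse_bool(value: str) -> bool:
    """Interpret a sample-sheet string such as 'True' or 'false' as a bool."""
    normalized = value.strip().lower()
    if normalized in ("true", "1", "yes"):
        return True
    if normalized in ("false", "0", "no"):
        return False
    raise ValueError(f"Unrecognized BarcodesAreRC value: {value!r}")


def orientation_matches(instrument: str, barcodes_are_rc: str) -> bool:
    """Return True if the sheet's BarcodesAreRC agrees with the instrument rule."""
    if instrument not in EXPECTED_RC:
        raise KeyError(f"No orientation rule known for {instrument!r}")
    return parse_bool(barcodes_are_rc) == EXPECTED_RC[instrument]


# A NovaSeq X sheet marked as reverse-complemented would need its barcodes fixed.
print(orientation_matches("NovaSeq 6000", "True"))  # True
print(orientation_matches("NovaSeq X", "True"))     # False
```

A mismatch only tells you the flag and the instrument disagree; it does not say which of the two (or the barcodes themselves) is wrong.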

RodolfoSalido commented 4 months ago

Unfortunately, as far as I know, the value for that field is currently populated from user input and is error-prone. I raised this before, but it didn't get much attention since that field is not used in any automated process in the SPP (correct me if I'm wrong).

I think the best solution going forward would be to populate that field automatically as part of the make_sample_sheet() function, which already checks the sequencer against the REVCOMP_SEQUENCERS list to determine whether the barcodes should be reverse complemented.

It sounds like the BarcodesAreRC boolean is mostly for humans to check whether the sample sheet has a reverse-complemented i5 when troubleshooting.

-Rodolfo


mmbryant23 commented 4 months ago

@RodolfoSalido Agreed, that should be automatic. How is someone currently able to check before processing that i2 is reverse or forward?

RodolfoSalido commented 4 months ago

I don't think wet-lab techs check. The function handles the i5 orientation automatically, but the BarcodesAreRC field doesn't get updated accordingly. The BarcodesAreRC boolean gets populated from a user-facing Jupyter form with a Bioinformatics dictionary.

-Rodolfo
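The i5 orientation handling mentioned above comes down to reverse-complementing the index sequence: swap each base for its complement (A/T, C/G) and reverse the result. This is a minimal sketch, assuming barcodes are plain ACGTN strings; `reverse_complement` is a hypothetical helper, not the function metapool actually uses.

```python
# Complement table for DNA bases; N (ambiguous base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an index sequence."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"Unexpected characters in barcode: {sorted(invalid)}")
    # Complement every base, then reverse the whole string.
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AGGCTATA"))  # TATAGCCT
```

Because the operation is its own inverse, applying it twice returns the original barcode, which is why a wrong BarcodesAreRC value can silently flip every i5 on the sheet.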


charles-cowart commented 4 months ago

For what it's worth, this is the only place the column gets accessed in Metapool: https://github.com/biocore/metagenomics_pooling_notebook/blob/06012646b9f1b24338700deaa20f3619b08bc906/metapool/sample_sheet.py#L438-L439

REVCOMP_SEQUENCERS is defined here: https://github.com/biocore/metagenomics_pooling_notebook/blob/06012646b9f1b24338700deaa20f3619b08bc906/metapool/metapool.py#L17

charles-cowart commented 4 months ago

Based on what you just said and the code I highlighted, it seems like make_sample_sheet() is already setting that value correctly. Lines 438-439 are part of the _add_data_to_sheet() method, which is used by make_sample_sheet(). The value is set to True if the sequencer passed to make_sample_sheet() is in the REVCOMP_SEQUENCERS list, and False otherwise.

So the value should be accurate, at least for a sheet made with make_sample_sheet(). If a user alters it afterward, that could be an issue, but it sounds like it would be accurate most of the time after all?
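The behavior described above amounts to a membership test performed when the sheet is built, rather than a value typed in by a user. This is a minimal sketch, assuming REVCOMP_SEQUENCERS is a list of instrument-name strings; the list contents and the `barcodes_are_rc` and `bioinformatics_row` names (and the `Sample_Project` key) are illustrative placeholders, not copied from metapool.py, so check the linked source for the real definition.

```python
# Placeholder contents; the real list is defined in metapool/metapool.py.
REVCOMP_SEQUENCERS = ["NovaSeq6000", "HiSeq4000", "NextSeq"]


def barcodes_are_rc(sequencer: str) -> bool:
    """True when the sequencer is listed as needing reverse-complemented i5."""
    return sequencer in REVCOMP_SEQUENCERS


def bioinformatics_row(project: str, sequencer: str) -> dict:
    """Build one hypothetical Bioinformatics-section row.

    BarcodesAreRC is derived from the sequencer instead of user input,
    and stored as a string because sample-sheet fields are text.
    """
    return {
        "Sample_Project": project,
        "BarcodesAreRC": str(barcodes_are_rc(sequencer)),
    }


print(bioinformatics_row("MyProject_12345", "NovaSeq6000"))
# {'Sample_Project': 'MyProject_12345', 'BarcodesAreRC': 'True'}
```

Deriving the flag in one place keeps it consistent with the orientation the code actually applies; the remaining risk is a manual edit to the sheet afterward.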

RodolfoSalido commented 4 months ago

Cool, I think you are right Charlie!

I was under the impression the value was user provided because it is a field in the sample_sheet form, but it is nice to see it is actually automatically populated.

-Rodolfo


charles-cowart commented 4 months ago

Thanks Rodolfo and MacKenzie! It sounds like we can close this issue then.