biocore / metagenomics_pooling_notebook

Jupyter notebooks to assist with sample processing
MIT License
8 stars 16 forks source link

make_sample_sheet() missed samplesheet validation errors that load_sample_sheet() catches. #173

Closed RodolfoSalido closed 5 months ago

RodolfoSalido commented 5 months ago

https://github.com/biocore/metagenomics_pooling_notebook/blob/a2f5ba44463f5db87bab5d8a6d0f84c09b44e740/metapool/sample_sheet.py#L1041C16-L1041C16

_validate_sample_sheet_metadata() doesn't catch errors thatvalidate_and_scrub_sample_sheet() does.

charles-cowart commented 5 months ago

I looked this over in detail and I should first clarify for the casual reader that the _validate_sample_sheet_metadata() method is an internal method called by make_sample_sheet() and not by the user themselves. Specifically, this method checks the metadata dictionary input passed by the user in make_sample_sheet(), usually from within a notebook, for completeness. This is so a bad input isn't directly passed to the third-party Sheet() class and raises a cryptic Error to the user. Although the naming is very generalized since we don't want to name functions '..._metadata_dict()' and the like, this is an apples vs oranges comparison.

Technically nothing has changed from previous versions. When we used to use KLSampleSheet(file_path) to open a file and load the contents into a KLSampleSheet() object, we needed to validate that it was correct as a second step by calling validate_and_scrub_sample_sheet(sheet). (Technically this has changed to sheet = load_sample_sheet(file_path) and sheet.validate_and_scrub_sample_sheet() but this is besides the point.)

The same is true of make_sample_sheet(). Technically, the user should have called validate_and_scrub_sample_sheet() afterward to ensure that it's correct and report any Errors if not.

However, the docstring for make_sample_sheet() clearly states that it, 'Write[s] a valid sample sheet'. I think the wetlab is very reasonable in expecting that any returned SampleSheet() object is valid. The simple fix is to simply have make_sample_sheet() call validate_and_scrub_sample_sheet() before returning. I will issue a PR for this.