Closed charles-cowart closed 2 weeks ago
Yes.
On Wed, Oct 30, 2024 at 5:01 PM Antonio Gonzalez @.***> wrote:
@.**** commented on this pull request.
In metapool/sample_sheet.py https://github.com/biocore/metagenomics_pooling_notebook/pull/243#discussion_r1823611539 :
column and raise an Error if not. By convention it should be, once
legacy comments and whitespace rows are removed.
-    if df[0][0] != '[Header]':
-        raise ValueError("Top section is not [Header]")
Got it, just to confirm: this means that old sample-sheets with comments will still be supported; right?
Updated load_sample_sheet() to pick the SampleSheet() child class to load a file into, based on the file's Assay type, SheetType, and SheetVersion. If no child class can be assigned, load_sample_sheet() continues to raise the generic 'invalid-sample-sheet' message. Otherwise it returns a sheet object, and the user is responsible for running the validate_and_scrub() methods and assessing any error messages.
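As a rough illustration of the dispatch described above, here is a minimal sketch. The class names, the `_ASSAY`/`_SHEET_TYPE`/`_SHEET_VERSION` attributes, the example header values, and the `pick_sheet_class()` helper are all hypothetical stand-ins, not the code in this PR; the real load_sample_sheet() works on a file rather than a dict.

```python
class SampleSheet:
    """Hypothetical stand-in for the real base class."""
    _ASSAY = None
    _SHEET_TYPE = None
    _SHEET_VERSION = None

    def __init__(self, header):
        self.header = header


class MetagenomicSheetV100(SampleSheet):
    # Illustrative values only; the real identifiers may differ.
    _ASSAY = "Metagenomic"
    _SHEET_TYPE = "standard_metag"
    _SHEET_VERSION = "100"


class AmpliconSheetV0(SampleSheet):
    _ASSAY = "TruSeq HT"
    _SHEET_TYPE = "dummy_amp"
    _SHEET_VERSION = "0"


# Defined after the classes so every name already exists.
SHEET_CLASSES = [MetagenomicSheetV100, AmpliconSheetV0]


def pick_sheet_class(header):
    """Return a sheet object whose class matches the header's
    (Assay, SheetType, SheetVersion) triple, or raise ValueError."""
    key = (header.get("Assay"),
           header.get("SheetType"),
           header.get("SheetVersion"))
    for cls in SHEET_CLASSES:
        if key == (cls._ASSAY, cls._SHEET_TYPE, cls._SHEET_VERSION):
            return cls(header)
    # No child class matched: fall back to the generic message.
    raise ValueError("invalid-sample-sheet")


sheet = pick_sheet_class({"Assay": "Metagenomic",
                          "SheetType": "standard_metag",
                          "SheetVersion": "100"})
print(type(sheet).__name__)
```

Matching on the full triple keeps the lookup strict: a sheet with a known Assay but an unknown SheetVersion still ends up in the generic error path instead of being loaded into the wrong class.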
The legacy _parse() method for sample sheets is a little cryptic and relies on the csv package. I experimented with a separate parse_header() function built on pandas and its read_csv method, since the lab is more familiar with it. I believe it works pretty well, and it doesn't rely on any legacy functionality in the third-party 'sample_sheet' package we appear to be using for _parse(). This might make it easier to move off that package in the future.
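For reviewers less familiar with the approach, here is a minimal sketch of reading a [Header] section with pandas. It is not the parse_header() in this PR: the function name `read_header_section()`, the two-column assumption, the sample text, and the use of `comment='#'` to drop legacy comment lines are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical, simplified sample sheet text with a legacy comment line.
SHEET_TEXT = """# legacy comment
[Header]
IEMFileVersion,4
SheetType,standard_metag
SheetVersion,100
Assay,Metagenomic

[Reads]
151
151
"""


def read_header_section(text):
    """Return the [Header] key/value pairs as a dict of strings."""
    df = pd.read_csv(
        io.StringIO(text),
        header=None,        # the file has no column header row
        names=[0, 1],       # assume at most two fields per line
        comment="#",        # drop legacy comment lines
        dtype=str,          # keep values such as '100' as text
    )                       # blank lines are skipped by default
    if df[0][0] != "[Header]":
        raise ValueError("Top section is not [Header]")
    header = {}
    # Walk the rows after '[Header]' until the next '[Section]' marker.
    for key, value in zip(df[0].iloc[1:], df[1].iloc[1:]):
        if key.startswith("["):
            break
        header[key] = value
    return header


print(read_header_section(SHEET_TEXT))
```

Real sample sheets often have rows with more than two fields, so a fuller version would need a wider `names` list or a different way of handling ragged rows.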
The types list inside of _parse_header() seems like it would be better defined at the top of the file. However, Python executes module-level statements in order, so a list placed above the class definitions would reference classes that are not yet defined. Looking for input from reviewers.
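To make the ordering issue concrete, here is a small self-contained sketch. The class names and the `sheet_types()` helper are hypothetical. It shows that a module-level list fails only when it runs before the classes exist, while a list built inside a function, or defined at module level after the classes, resolves the names without trouble.

```python
# A module-level reference made before the class exists fails at import time.
try:
    EARLY_TYPES = [SheetA]
except NameError as exc:
    early_error = str(exc)


def sheet_types():
    # Names inside a function body are looked up when the function is
    # called, so this works even though the classes are defined below.
    return [SheetA, SheetB]


class SheetA:
    pass


class SheetB:
    pass


# A module-level list is fine as long as it comes after the classes.
SHEET_TYPES = [SheetA, SheetB]

print(early_error)
print(sheet_types() == SHEET_TYPES)
```

One option, then, is to keep a single module-level list but place it below the last class definition rather than at the very top of the file.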