biocore / metagenomics_pooling_notebook

Jupyter notebooks to assist with sample processing
MIT License

updated load_sample_sheet(). #243

Closed. charles-cowart closed this pull request 2 weeks ago.

charles-cowart commented 1 month ago

Updated load_sample_sheet() to determine which SampleSheet() child class a file should be loaded into, based on its Assay type, SheetType, and SheetVersion. If no child class can be assigned, load_sample_sheet() will continue to raise the generic 'invalid-sample-sheet' message. Otherwise, it returns a sheet object, and the user is responsible for running the validate_and_scrub() methods and assessing any error messages.
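The dispatch described above can be sketched as a lookup on the three header values. This is a minimal sketch, not the metapool implementation: the class names, their `assay_type`/`sheet_type`/`sheet_version` attributes, the example header values, and `load_sample_sheet_sketch()` are all hypothetical, and a plain `ValueError` stands in for however the real code raises 'invalid-sample-sheet'.

```python
class SheetBase:
    """Hypothetical stand-in for the SampleSheet base class."""
    # Each child class declares the header values it accepts.
    assay_type = None
    sheet_type = None
    sheet_version = None

    def __init__(self, header):
        self.header = header


class HypotheticalMetagenomicSheet(SheetBase):
    assay_type = "Metagenomic"
    sheet_type = "standard_metag"
    sheet_version = "100"


class HypotheticalAmpliconSheet(SheetBase):
    assay_type = "Amplicon"
    sheet_type = "amplicon"
    sheet_version = "0"


# Defined after the classes, so every name already exists at this point.
CANDIDATE_CLASSES = [HypotheticalMetagenomicSheet, HypotheticalAmpliconSheet]


def load_sample_sheet_sketch(header):
    """Return an instance of the child class matching header, else raise."""
    key = (header.get("Assay"), header.get("SheetType"),
           header.get("SheetVersion"))
    for cls in CANDIDATE_CLASSES:
        if (cls.assay_type, cls.sheet_type, cls.sheet_version) == key:
            # No validation here: the caller runs it and reviews messages.
            return cls(header)
    raise ValueError("invalid-sample-sheet")
```

Returning the object without validating it mirrors the behavior described in the PR: callers run validate_and_scrub() themselves and decide what to do with any messages.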

The legacy _parse() method for sample sheets is somewhat cryptic and relies on the csv package. I experimented with implementing a separate parse_header() function using pandas' read_csv(), since the lab is more familiar with pandas. I believe it works well, and it doesn't rely on any legacy functionality in the third-party 'sample_sheet' package that _parse() appears to use. This might make it easier to move off that package in the future.
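A pandas-based header parser along these lines might look like the following minimal sketch. It assumes a comma-separated sheet whose legacy comment lines start with '#'; `parse_header_sketch()` is a hypothetical name, not the function in the PR. It reads every row with no header row, drops comment lines and all-empty rows, checks that the first cell is [Header], and collects key/value pairs until the next bracketed section.

```python
import io

import pandas as pd


def parse_header_sketch(text):
    """Return the [Header] section of a CSV sample sheet as a dict."""
    lines = text.splitlines()
    # Rows can have different field counts, so give read_csv enough
    # column names for the widest row (naive: ignores quoted commas).
    width = max([2] + [line.count(",") + 1 for line in lines])
    df = pd.read_csv(io.StringIO(text), header=None,
                     names=list(range(width)), dtype=str,
                     comment="#", skip_blank_lines=True)
    # comment="#" discards everything after a '#', so lines starting with
    # '#' are skipped; rows made only of commas are all-NaN and dropped.
    df = df.dropna(how="all").reset_index(drop=True)

    if df.empty or df.iloc[0, 0] != "[Header]":
        raise ValueError("Top section is not [Header]")

    header = {}
    for _, row in df.iloc[1:].iterrows():
        key, value = row.iloc[0], row.iloc[1]
        if not isinstance(key, str):
            continue  # no key in the first column
        if key.startswith("["):
            break  # reached the next section, e.g. [Reads]
        header[key] = value if isinstance(value, str) else None
    return header
```

Reading with `dtype=str` keeps values such as a sheet version exactly as written rather than converting them to numbers.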

The types list inside _parse_header() seems like it would be better defined at the top of the file. However, Python executes a module's top-level statements in order, so a list at the top of the file that references classes defined further down would raise a NameError at import time, because those classes do not exist yet. Looking for input from reviewers.
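The ordering problem, and one common workaround, can be sketched as follows. This is an illustration under assumptions, not code from the PR: the class names and the `register_sheet` decorator are hypothetical. Building the list inside a function (as _parse_header() does) already works, because a function body only runs when it is called, after the whole module has executed. A registration decorator is another option that lets the list itself be declared at the top of the file.

```python
# The list is created empty at the top of the module...
SHEET_TYPES = []


def register_sheet(cls):
    """Class decorator: record cls in SHEET_TYPES and return it unchanged."""
    SHEET_TYPES.append(cls)
    return cls


# ...and each class adds itself as soon as its definition executes.
@register_sheet
class HypotheticalSheetA:
    pass


@register_sheet
class HypotheticalSheetB:
    pass


def forward_reference_error():
    """Show why `TYPES = [Later]` fails when placed above `class Later`."""
    source = "TYPES = [Later]\n\nclass Later:\n    pass\n"
    try:
        exec(source, {})  # run as a tiny standalone module
    except NameError as err:
        return str(err)
    return None
```

The decorator keeps registration next to each class definition, at the cost of making the list's contents depend on import-time side effects.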

charles-cowart commented 2 weeks ago

Yes.

On Wed, Oct 30, 2024 at 5:01 PM Antonio Gonzalez wrote:

Antonio Gonzalez commented on this pull request.

In metapool/sample_sheet.py https://urldefense.com/v3/__https://github.com/biocore/metagenomics_pooling_notebook/pull/243*discussion_r1823611539__;Iw!!Mih3wA!AqtsuKMgk2sz-1Qa8ic1KRJYGWnXHWscgEmtgFd893cWXYTD29Aknkj9Uj-4Z3A47b2AtCDIRxH1tgeK_RyvmM38$ :

    # column and raise an Error if not. By convention it should be, once
    # legacy comments and whitespace rows are removed.
    if df[0][0] != '[Header]':
        raise ValueError("Top section is not [Header]")

Got it. Just to confirm: this means that old sample sheets with comments will still be supported, right?
