epi2me-labs / wf-transcriptomes

Fix samplesheet requirements #61

Closed phillip-richmond-alamya closed 5 months ago

phillip-richmond-alamya commented 5 months ago

Is your feature related to a problem?

There's no reason the sample sheet check should require this chunk of code:

    # check barcodes are correct format
    for barcode in barcodes:
        if not re.match(r'^barcode\d\d+$', barcode):
            sys.stdout.write("values in 'barcode' column are incorrect format")
            sys.exit()

I just want to pass multiple samples, and sure, I've set up the bins accordingly, but why require that they be called something specific like "barcode##"?
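For reference, here is a minimal sketch of what the pattern in the check quoted above accepts and rejects. The regular expression is copied from that snippet; the helper name `is_valid_barcode` and the example values are hypothetical and only for illustration, not part of the workflow.

```python
import re

# Pattern copied from the check quoted above: the literal "barcode"
# followed by at least two digits, and nothing else.
BARCODE_PATTERN = re.compile(r'^barcode\d\d+$')


def is_valid_barcode(value: str) -> bool:
    """Hypothetical helper: True if the value would pass the quoted check."""
    return BARCODE_PATTERN.match(value) is not None


# Illustrative values only.
for value in ["barcode01", "barcode123", "barcode1", "sample_A", "Barcode01"]:
    print(value, is_valid_barcode(value))
```

Under this pattern only `barcode01` and `barcode123` pass; a single digit, an arbitrary sample name, or a different capitalization is rejected.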

Describe the solution you'd like

Fix the requirements for the sample sheet and bring them in line with literally any other sample sheet reader (use nf-core/rnaseq for motivation if you need it).

Describe alternatives you've considered

I reorganized all my data to fit into the definition of sample sheet provided, but then got an error that the format was incorrect for the "barcode" column.

Additional context

No response

cjw85 commented 5 months ago

The format required for sample sheets in our workflows is the same as that required by the MinKNOW sequencing device software. This is deliberate, so that users do not have to make two different sample sheet files, one for the sequencer and one for the analysis software.

phillip-richmond-alamya commented 5 months ago

Okay, could you add some language to the README that states this explicitly? Right now the language is loose about how directories are matched to the "barcode" column in the sample sheet.