feature request: SampleSheet.csv validation

Whether or not the sample sheet comes from a LIMS, typos can always creep in.

It will pay off to devote some time to either incorporating an existing SampleSheet.csv validator or building one ourselves, in order to detect semantic errors and typos before demultiplexing starts. The payoff will be a more automated and seamless run/demultiplexing.

According to @CathrineAB, the common errors found in SampleSheet.csv are:

[ ] Spaces in sample names or project names. They are especially hard to see at the end of a name. I replace a space with a “-” if it is in the middle of the name and erase it if it is at the end.
[ ] Æ, Ø or Å in sample names or project names.
[ ] Extra lines in SampleSheet with no sample info in them. Each empty line appears as a row of commas. They need to be deleted or demuxing fails.
[ ] Forgetting to add the extra column called “Analysis” and to set an “x” in that column for all samples (I don’t know if we will keep this feature in the future).
[ ] '.' in sample names
[ ] My own note: check that the number of commas on each line equals a specific number (ex: there are too many commas between ‘A1’ and ‘RRBS-NMBU’).
[ ] Check for missing commas: use a state machine and report if the comma after state N is missing when transitioning to state N+1 (ex: a comma was missing between ‘Sample1’ and ‘LPRSSBASNMBU1’).
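
A minimal sketch of how the checks above could look in Python, to make the discussion concrete. It is not the project's implementation. The function name `validate_sample_sheet`, the `[Data]` section marker, and the column names `Sample_ID` and `Sample_Project` are assumptions based on the usual Illumina layout and should be adjusted to our actual sheets. Instead of a state machine, it compares the number of fields on each row with the header, which flags both extra and missing commas but cannot say where on the line the comma went wrong.

```python
import csv


def validate_sample_sheet(text, name_columns=("Sample_ID", "Sample_Project"),
                          require_analysis=True):
    """Return a list of (line_number, message) for the [Data] section.

    Hypothetical helper: column names and the "Analysis" rule are assumptions.
    """
    problems = []
    lines = text.splitlines()
    # Locate the "[Data]" marker; the row after it is taken as the header.
    start = next((i for i, line in enumerate(lines)
                  if line.strip().startswith("[Data]")), None)
    if start is None or start + 1 >= len(lines):
        return [(0, "no [Data] section with a header row found")]
    header = next(csv.reader([lines[start + 1]]), [])
    if require_analysis and "Analysis" not in header:
        problems.append((start + 2, 'missing "Analysis" column'))

    # Data rows begin two lines after the marker; line numbers are 1-based.
    for number, line in enumerate(lines[start + 2:], start=start + 3):
        fields = next(csv.reader([line]), [])
        if not any(field.strip() for field in fields):
            problems.append((number, "row has no sample info"))
            continue
        if len(fields) != len(header):
            # Too many or too few commas on this line.
            problems.append(
                (number, f"expected {len(header)} fields, found {len(fields)}"))
            continue
        row = dict(zip(header, fields))
        for column in name_columns:
            value = row.get(column, "")
            if value != value.strip():
                problems.append((number, f"{column}: leading or trailing space"))
            if " " in value.strip():
                problems.append((number, f"{column}: space inside name"))
            if "." in value:
                problems.append((number, f'{column}: "." in name'))
            if not value.isascii():
                problems.append((number, f"{column}: non-ASCII character"))
        if require_analysis and "Analysis" in row and row["Analysis"].strip() != "x":
            problems.append((number, 'Analysis column is not set to "x"'))
    return problems


if __name__ == "__main__":
    # Made-up example sheet, for illustration only.
    example = "\n".join([
        "[Data]",
        "Sample_ID,Sample_Project,Analysis",
        "Sample1 ,RRBS-NMBU,x",
        ",,",
    ])
    for number, message in validate_sample_sheet(example):
        print(f"line {number}: {message}")
```

The function only reports problems and never rewrites the sheet, so the decision to replace a space with “-” or to delete an empty row stays with whoever runs the demultiplexing.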

NorwegianVeterinaryInstitute / DemultiplexRawSequenceData

feature request: SampleSheet.csv validation #21