bihealth / cubi-isa-templates

CUBI ISA-Tab templates
MIT License

fix: make variable naming consistent with ISA standard #13

Open sellth opened 1 year ago

sellth commented 1 year ago

fixes: bihealth/cubi-tk#106

Nicolai-vKuegelgen commented 1 year ago

Another thing (I almost forgot):

This only seems to address the variable names; however, many templates also use Sample Name as the first column in the s-file instead of Source Name, so maybe this should (or needs to) be changed as well? Otherwise the variable renaming makes little sense. If we do change this, I'm not sure what the effects will be on pipelines that might depend on these templates.

sellth commented 1 year ago

Thanks Nicolai for looking into this.

  1. for one thing, the stem cell core templates actually ask for sample names, since we use the cellline names as source

That was indeed an oversight in the stem_cell_core_sc template which is fixed now. I left _bulk unchanged because of this.

  1. From what I've seen so far many templates use the same name for sample & source. […] For most experimental people (or just people not accustomed to ISA) sample name is much more intuitive description than source name in these cases.

Most templates derive their Source Names from the Sample Names, but I would agree with Mikko that this is a bit confusing, both in the context of ISA-Tab and experimentally. I would expect Sample Names to be derived from the Source Names plus an optional suffix. That is how I defined it for the MC template: there are source_names and sample_suffix variables in the cookiecutter.json.
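To make the intended direction concrete, here is a minimal sketch of deriving Sample Names from Source Names plus an optional suffix. It is not the actual MC template code: the function name `derive_sample_names`, the comma-separated format for `source_names`, and the example values are all assumptions for illustration; only the variable names `source_names` and `sample_suffix` come from the cookiecutter.json mentioned above.

```python
def derive_sample_names(source_names, sample_suffix=""):
    """Return one sample name per source name, with an optional suffix appended.

    Hypothetical helper for illustration; the real template may do this
    differently (e.g. directly in its Jinja2 templating).
    """
    return [f"{name}{sample_suffix}" for name in source_names]


# Assumed input format: a comma-separated string, as a cookiecutter variable might hold.
source_names = "alpha,beta,gamma".split(",")

# With a suffix, each sample name is clearly derived from its source.
with_suffix = derive_sample_names(source_names, sample_suffix="-N1")

# Without a suffix, sample and source names coincide.
without_suffix = derive_sample_names(source_names)
```

With an empty suffix the two names are identical, which would cover the templates that currently use the same name for sample and source.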

This only seems to address the variable names; however, many templates also use Sample Name as the first column in the s-file instead of Source Name, so maybe this should (or needs to) be changed as well? Otherwise the variable renaming makes little sense.

Not sure what you mean by this. s_ files need to start with a Source Name column to be standard compliant and all do so right now.
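For anyone who wants to double-check this on their own files, a rough sketch of such a check follows. It assumes the s_ file is tab-separated with the column headers on the first line; the function name `starts_with_source_name` and the sample content are made up for illustration and are not part of this repository or of cubi-tk.

```python
import csv
import io


def starts_with_source_name(s_file_text):
    """Return True if the first header column of a study (s_) table is 'Source Name'."""
    reader = csv.reader(io.StringIO(s_file_text), delimiter="\t")
    header = next(reader, [])  # empty list if the file has no lines at all
    return bool(header) and header[0].strip() == "Source Name"


# Made-up minimal study tables, only for demonstration.
compliant = "Source Name\tSample Name\nalpha\talpha-N1\n"
non_compliant = "Sample Name\tSource Name\nalpha-N1\talpha\n"

compliant_ok = starts_with_source_name(compliant)
non_compliant_ok = starts_with_source_name(non_compliant)
```

To check a real file, one would read its contents into a string and pass that to the function.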

Nicolai-vKuegelgen commented 1 year ago

Most templates derive their Source Names from the Sample Names, but I would agree with Mikko that this is a bit confusing, both in the context of ISA-Tab and experimentally. I would expect Sample Names to be derived from the Source Names plus an optional suffix. That is how I defined it for the MC template: there are source_names and sample_suffix variables in the cookiecutter.json.

The templates might indeed do this, but I would argue that most users generally do not, since they only come up with source names when they start entering things into SODAR (they will always have some sort of sample name ready). Maybe the more important question to answer is: who will use these templates, or rather, who do we want to use them? For larger projects (with inevitably closer CUBI collaboration), someone will probably figure out a good way to organise and derive sample and source names. But for smaller projects, where (maybe one day?) someone can just create and fill in the sample sheet from within SODAR, this is not the case: these people will likely come with a list of samples and sample names, but not source names.

Not sure what you mean by this. s_ files need to start with a Source Name column to be standard compliant and all do so right now.

Ah, you're right. I must have confused some (older?) things here, or maybe I just remembered the start of some a-files ...

sellth commented 1 year ago

As this is not really urgent, let's not do anything hastily and talk once I'm back in Berlin.