bigbio / sdrf-pipelines

A repository to convert SDRF proteomics files into pipelines config files
Apache License 2.0

Allow missing/NA-filled columns for sdrf2openms conversion #173

Open jpfeuffer opened 2 months ago

jpfeuffer commented 2 months ago

In quantms, for simple experiments, I would like to allow using SDRF only to specify the experimental design (incl. channels, replicates, fractions) and not the mass-spec-related settings, which would be specified through nextflow parameters instead. This is mainly because it is much easier to quickly set enzyme, modifications, tolerance, etc. in the nextflow WebUI than to edit a tsv file with ontology names.

For this, the first step would be to allow those three columns to be missing, or to contain only "Not Available" entries, in the "sdrf to openms" conversion tool, because we use the config file it produces to fill the meta information channel in nextflow.

As soon as that works, I could change the create_input_channel module to check for missingness in the openms.tsv and use the nextflow params as a fallback.

As discussed with @ypriverol

jpfeuffer commented 3 weeks ago

I guess what I am asking for is a "validator" that treats certain groups of columns as optional. I.e., if the instrument type column is present, validate it; if it is absent, that is fine.
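To illustrate the idea, here is a minimal sketch of an "optional column group" check in plain Python. It does not use the sdrf-pipelines validator API; the function name `validate_optional_columns`, the group `optional_ms_group`, and the trivial non-empty check are all hypothetical and only show the intended behaviour.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
Validator = Callable[[str], bool]


def validate_optional_columns(rows: List[Row], validators: Dict[str, Validator]) -> List[str]:
    """Validate a column only when it is present; an absent column is not an error."""
    errors: List[str] = []
    if not rows:
        return errors
    present = set(rows[0])  # the header is taken from the keys of the first row
    for column, is_valid in validators.items():
        if column not in present:
            continue  # optional column is missing: nothing to validate
        for number, row in enumerate(rows, start=1):
            value = row.get(column, "")
            if not is_valid(value):
                errors.append(f"row {number}: invalid value {value!r} in column {column!r}")
    return errors


# Hypothetical optional group with a trivial "must not be empty" check.
optional_ms_group: Dict[str, Validator] = {
    "comment[instrument]": lambda value: bool(value.strip()),
}

# One table without the instrument column, one with an empty instrument entry.
design_only = [{"source name": "sample 1", "comment[fraction identifier]": "1"}]
with_instrument = [{"source name": "sample 1", "comment[instrument]": ""}]

print(validate_optional_columns(design_only, optional_ms_group))
print(validate_optional_columns(with_instrument, optional_ms_group))
```

The first call reports no errors because the column is absent; the second reports one error because the column is present but empty.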

I am happy to use my own validators in our downstream code if you can give me some hints on how to do that.

I would also like to change the composition of the column groups: e.g., I believe that things like technical replicate and fraction identifier belong in the "experimental design" group of validators.

jpfeuffer commented 3 weeks ago

Alternatively, I could put NOT AVAILABLE in all rows of those columns, but I think the openms-convert functionality does not handle such columns correctly, since it will try to fill them with some defaults: https://github.com/bigbio/sdrf-pipelines/blob/main/sdrf_pipelines/openms/openms.py#L307

It will only work with missing columns, as far as I can see.

If we could somehow pass the missingness all the way through to the openms_config.tsv file, we could then check here: https://github.com/bigbio/quantms/blob/dev/subworkflows/local/create_input_channel.nf#L114 whether those columns exist and, if not, warn and fall back to the nextflow params.
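The fallback I have in mind would look roughly like the sketch below. It is written in Python only to show the logic; the real check would live in the nextflow module. The column name `Enzyme`, the parameter name `enzyme`, and the set of "missing" markers are assumptions for illustration, not the actual openms_config.tsv layout.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("fallback_sketch")

# Assumed spellings that count as "no value given".
MISSING_MARKERS = {"", "na", "n/a", "not available"}


def is_missing(value: Optional[str]) -> bool:
    """True if the column is absent (None) or holds a not-available marker."""
    return value is None or value.strip().lower() in MISSING_MARKERS


def resolve_setting(row: Dict[str, str], column: str,
                    params: Dict[str, str], param_name: str) -> str:
    """Use the value from the config row, or warn and fall back to the pipeline parameter."""
    value = row.get(column)  # None when the column does not exist at all
    if value is None or is_missing(value):
        logger.warning("Column %r missing or not available; using --%s", column, param_name)
        return params[param_name]
    return value


params = {"enzyme": "Trypsin"}  # stands in for the nextflow params
print(resolve_setting({"Enzyme": "NOT AVAILABLE"}, "Enzyme", params, "enzyme"))
print(resolve_setting({"Fraction": "1"}, "Enzyme", params, "enzyme"))
print(resolve_setting({"Enzyme": "Lys-C"}, "Enzyme", params, "enzyme"))
```

A missing column and a NOT AVAILABLE entry are treated the same way here, so either approach on the SDRF side would end up using the parameter value.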

jpfeuffer commented 3 weeks ago

@ypriverol @daichengxin Any thoughts on this?