bigbio / sdrf-pipelines

A repository to convert SDRF proteomics files into pipelines config files
Apache License 2.0

[DISCUSSION] Permissive or strict parser? #111

Open fabianegli opened 2 years ago

fabianegli commented 2 years ago

There is a good discussion to be had about how permissive or strict a parser for a file standard should be and, if it is permissive, which format errors should be tolerated and which should not. For SDRF files the answer is not yet clear to me, and I would welcome a discussion about it with contributors and users of sdrf-pipelines. What follows is a list of questions (not at all comprehensive):

  1. What are permissible errors?
    1. Can trailing whitespace always be stripped, or can it carry meaning?
    2. Can an empty line be tolerated? At the beginning? At the end? In the middle?
  2. Can we make valid assumptions about strings, e.g. that the encoding is UTF-8 or that only a limited character set is used? For:
    1. Filenames?
    2. Column names?
    3. Fields in the SDRF table?
  3. How thoroughly is the content checked?
    1. Are empty fields allowed? Or filled with some value?
    2. Do we need the same number of values X and Y in a column?
    3. Is invalid content detected? e.g. labelling information in a fraction column?
  4. How severe is each kind of detected issue?
    1. What do they affect?
    2. Should they be handled silently, trigger a warning, or raise an error?
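To make questions 1 and 2 more concrete, here is a minimal sketch of a permissive reader that repairs what it can but records every repair, so that the caller can still decide afterwards how strict to be. The function `read_sdrf_rows`, the `Issue` type and the issue kinds are hypothetical and not part of sdrf-pipelines. The sketch assumes UTF-8 input, tab-separated fields, and that blank lines and trailing whitespace in cells carry no meaning; those are exactly the assumptions up for discussion.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    """One detected deviation from the expected format (hypothetical type)."""
    line: int     # 1-based line number in the input, 0 for whole-file issues
    kind: str     # short machine-readable category
    message: str


def read_sdrf_rows(data: bytes) -> tuple[list[list[str]], list[Issue]]:
    """Permissively parse tab-separated SDRF content, recording every fix-up.

    Assumptions (not sdrf-pipelines behaviour): the input must be UTF-8,
    blank lines are skipped, and trailing whitespace in cells is stripped.
    """
    issues: list[Issue] = []
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Encoding problems are not repaired silently: report and stop.
        issues.append(Issue(0, "encoding", f"not valid UTF-8: {exc}"))
        return [], issues

    rows: list[list[str]] = []
    n_columns = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            issues.append(Issue(lineno, "empty_line", "blank line skipped"))
            continue
        cells = line.split("\t")
        stripped = [cell.rstrip() for cell in cells]
        if stripped != cells:
            issues.append(Issue(lineno, "trailing_whitespace",
                                "trailing whitespace stripped"))
        if n_columns is None:
            n_columns = len(stripped)  # the header row fixes the column count
        elif len(stripped) != n_columns:
            issues.append(Issue(lineno, "column_count",
                                f"expected {n_columns} fields, found {len(stripped)}"))
        rows.append(stripped)
    return rows, issues
```

Whether a given `Issue` should then be ignored, reported as a warning, or treated as fatal is deliberately left out of the reader, which keeps question 4 separate from questions 1 and 2.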

Some of these questions have clear answers, others not so much. I would very much welcome a discussion about all of them.

These questions might also have different answers for different use cases. SDRF is expected to be applied in a broad range of environments and use cases. Discussing these questions will help us better anticipate the requirements and will inform the design and implementation of the next iteration of the sdrf-pipelines package.
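One way to accommodate different use cases is to make the severity of each kind of issue configurable rather than hard-coded. The sketch below is hypothetical: the profile names, issue kinds and severity assignments are illustrative assumptions, not an existing sdrf-pipelines feature or an agreed policy.

```python
from enum import Enum


class Severity(Enum):
    IGNORE = "ignore"
    WARNING = "warning"
    ERROR = "error"


# Hypothetical profiles: which issue kind is how severe in which use case.
PROFILES: dict[str, dict[str, Severity]] = {
    # e.g. a quick conversion for a local pipeline run
    "permissive": {
        "trailing_whitespace": Severity.IGNORE,
        "empty_line": Severity.IGNORE,
        "column_count": Severity.WARNING,
        "encoding": Severity.ERROR,
    },
    # e.g. validating a file before submission to a public repository
    "strict": {
        "trailing_whitespace": Severity.WARNING,
        "empty_line": Severity.ERROR,
        "column_count": Severity.ERROR,
        "encoding": Severity.ERROR,
    },
}


def triage(issue_kinds: list[str], profile: str) -> dict[Severity, list[str]]:
    """Group detected issue kinds by the severity the chosen profile assigns.

    Kinds missing from the profile default to ERROR so that new checks fail loudly.
    """
    policy = PROFILES[profile]
    result: dict[Severity, list[str]] = {severity: [] for severity in Severity}
    for kind in issue_kinds:
        result[policy.get(kind, Severity.ERROR)].append(kind)
    return result


def is_acceptable(issue_kinds: list[str], profile: str) -> bool:
    """A file passes a profile if none of its issues is classified as an error."""
    return not triage(issue_kinds, profile)[Severity.ERROR]
```

With these illustrative profiles, a file whose only problems are trailing whitespace and a blank line passes the `"permissive"` profile but fails the `"strict"` one. Defaulting unknown kinds to an error is itself a design choice that a more permissive tool might reverse.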

Since I am new to this project, such a discussion will also help me get started with contributions. In other words, it will keep me from straying into territory better left uncharted.