ISA-tools / isa-api

ISA tools API
https://isa-tools.org

Fairly significant changes to check_protocol_fields #531

Closed: ptth222 closed this pull request 3 months ago.

ptth222 commented 5 months ago

I started editing this function because of the "Only one protocol reference should be used in a Protocol REF column." message(s), but I found some other issues to address as well.

This function raises some valuable questions.

  1. Are different protocols allowed in the same Protocol REF column? I think this was pretty much answered as 'Yes' in #501, but it popped up again here.

  2. Do all protocols in the same Protocol REF column have to have the same type? This function and the structure of the config files suggest so, but is that actually correct?

  3. Can a cell in a Protocol REF column be blank? I can think of at least one example for this. Let's say you collect 2 different types of tissue from the same source. The first step of collection is the same for both, but one tissue type has an extra step as well. This could result in a file with 2 Protocol REF columns, but the second column would only have the protocol for the extra step of 1 of the tissue types. The type without the extra step would be blank.

Some of the changes I made to this function:

  1. Removed the assumption that every row of a Protocol REF column must contain the same protocol.
  2. Removed the return value. (Only 2 checks return a value that is actually used, and this isn't one of those.)
  3. Changed some of the messaging to be a little clearer.
  4. Added a warning to the 'validator' object. Previously it was only printed to the log.
  5. Added a way to bypass config checks if the config is malformed or incomplete.
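A minimal sketch of how the changes listed above might fit together. This is not the actual isatools code, whose signature is not shown here: the `Validator` class, its `add_warning`/`add_error` methods, the function name `check_protocol_fields_sketch`, and the config shape (a list of expected protocol types, one per Protocol REF column) are all hypothetical. The sketch records findings on the validator object instead of returning them, accepts different protocols in one Protocol REF column, and skips the config comparison when the config is missing or incomplete.

```python
from dataclasses import dataclass, field


@dataclass
class Validator:
    """Hypothetical stand-in for the object that collects validation findings."""
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def add_error(self, message: str, supplemental: str = "") -> None:
        self.errors.append({"message": message, "supplemental": supplemental})

    def add_warning(self, message: str, supplemental: str = "") -> None:
        self.warnings.append({"message": message, "supplemental": supplemental})


def check_protocol_fields_sketch(header, rows, protocol_types, expected_types, validator):
    """Check every Protocol REF column; record findings on `validator`, return nothing.

    header:         column names of the study/assay table
    rows:           row lists aligned with `header`
    protocol_types: protocol name -> protocol type (declared in the investigation file)
    expected_types: hypothetical config: one expected type per Protocol REF column,
                    in column order; may be None or incomplete
    """
    ref_indices = [i for i, name in enumerate(header) if name == "Protocol REF"]

    # Bypass the config comparison when the config cannot describe every column.
    use_config = isinstance(expected_types, list) and len(expected_types) == len(ref_indices)
    if not use_config:
        validator.add_warning(
            "Protocol REF columns were not checked against the configuration.",
            "The configuration is missing or incomplete.",
        )

    for position, col in enumerate(ref_indices, start=1):
        # Several different protocols may share a column; blank cells are skipped.
        names = {row[col] for row in rows if row[col]}
        for name in sorted(names):
            if name not in protocol_types:
                validator.add_error(
                    f"Protocol '{name}' in Protocol REF column {position} "
                    "is not declared in the investigation file."
                )
            elif use_config and protocol_types[name] != expected_types[position - 1]:
                validator.add_warning(
                    f"Protocol '{name}' has type '{protocol_types[name]}', but the "
                    f"configuration expects '{expected_types[position - 1]}' "
                    f"in Protocol REF column {position}."
                )
```

For example, two different sample-collection protocols in one column produce no findings:

```python
v = Validator()
header = ["Source Name", "Protocol REF", "Sample Name"]
rows = [["s1", "collect A", "x1"], ["s2", "collect B", "x2"]]
types = {"collect A": "sample collection", "collect B": "sample collection"}
check_protocol_fields_sketch(header, rows, types, ["sample collection"], v)
# v.errors == [] and v.warnings == []
```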
proccaserra commented 5 months ago

> I started editing this function because of the "Only one protocol reference should be used in a Protocol REF column." message(s), but I found some other issues to address as well.

Brief answer:
- The ISA specifications allow more than one distinct protocol to be used in the same Protocol REF column, provided they share the same protocol_type (e.g. "sample collection").
- The ISA-Tab parser implementation seems to be 'narrower', which explains the rule mentioned above.
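The specification rule above can be expressed as a per-column check: several distinct protocol names are fine as long as they all map to a single protocol_type. A minimal sketch, assuming the table is a header list plus row lists and that protocol types come from a plain dict; `mixed_type_columns` is a hypothetical helper, not part of isatools.

```python
def mixed_type_columns(header, rows, protocol_types):
    """Return {column_position: set_of_types} for Protocol REF columns whose
    protocols do not all share one protocol type.

    protocol_types maps protocol name -> protocol type (e.g. "sample collection").
    Blank cells and undeclared protocol names are ignored by this check.
    """
    ref_indices = [i for i, name in enumerate(header) if name == "Protocol REF"]
    mixed = {}
    for position, col in enumerate(ref_indices, start=1):
        types = {protocol_types[row[col]] for row in rows if row[col] in protocol_types}
        if len(types) > 1:
            mixed[position] = types
    return mixed


header = ["Source Name", "Protocol REF", "Sample Name"]
rows = [["s1", "blood draw", "x1"], ["s2", "biopsy", "x2"]]
types = {"blood draw": "sample collection", "biopsy": "sample collection"}
print(mixed_type_columns(header, rows, types))  # → {}
```

Two different protocol names in one column pass because they share a type; changing the type of one of them would cause column 1 to be reported.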

> This function raises some valuable questions.

>   1. Are different protocols allowed in the same Protocol REF column? I think this was pretty much answered as 'Yes' in #501 (Different Protocol Names In Study Sequence Cause An Error), but it popped up again here.
>   2. Do all protocols in the same Protocol REF column have to have the same type? This function and the structure of the config files suggest so, but is that actually correct?

Here, the protocol_type is what gets checked. There is a bit of flexibility afforded by the tolerance for synonyms, which can be specified in a config file found under `isatools/resources/config/yaml/protocol-types.xml`, but not dynamically, so it is not ideal.
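Synonym tolerance of this kind amounts to normalizing both protocol types to a canonical name before comparing them. A minimal sketch, assuming a static synonym table held in a dict; the table contents and the helpers `canonical_type` and `types_match` are hypothetical and do not reproduce the structure of the isatools config file.

```python
# Hypothetical synonym table: canonical type -> accepted spellings (lowercase).
# The real synonyms live in the isatools config file and may be structured differently.
PROTOCOL_TYPE_SYNONYMS = {
    "sample collection": {"sample collection", "sampling", "specimen collection"},
    "nucleic acid extraction": {"nucleic acid extraction", "extraction"},
}


def canonical_type(protocol_type, synonyms):
    """Map a protocol type to its canonical name; unknown types are returned
    stripped and lowercased."""
    cleaned = protocol_type.strip().lower()
    for canonical, variants in synonyms.items():
        if cleaned == canonical or cleaned in variants:
            return canonical
    return cleaned


def types_match(actual, expected, synonyms):
    """True if both types normalize to the same canonical name."""
    return canonical_type(actual, synonyms) == canonical_type(expected, synonyms)


print(types_match("Sampling", "sample collection", PROTOCOL_TYPE_SYNONYMS))  # → True
```

Because the table is fixed when it is loaded, accepting a new synonym means editing the file, which is the limitation noted above.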

>   3. Can a cell in a Protocol REF column be blank? I can think of at least one example for this. Let's say you collect 2 different types of tissue from the same source. The first step of collection is the same for both, but one tissue type has an extra step as well. This could result in a file with 2 Protocol REF columns, but the second column would only have the protocol for the extra step of 1 of the tissue types. The type without the extra step would be blank.

Answer:

These use cases need more scenarios, but this would cover the 'optional step' case (e.g. whether a labeling event happens or not).
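One way a validator could accommodate the optional-step case is to read a blank Protocol REF cell as "this step was not performed for this row" and report it as information rather than as an error. A minimal sketch using only the standard library, modeled on the two-tissue example; the table content, the protocol names, and the `optional_steps` helper are hypothetical.

```python
import csv
import io

# Hypothetical study table: both tissues share the first step, but only the
# brain sample has the extra step, so the liver row's second Protocol REF is blank.
STUDY_TSV = (
    "Source Name\tProtocol REF\tProtocol REF\tSample Name\n"
    "mouse1\ttissue collection\t\tliver1\n"
    "mouse1\ttissue collection\tfixation\tbrain1\n"
)


def optional_steps(tsv_text):
    """Return (row_number, column_position) pairs for blank Protocol REF cells.

    Row numbers start at 1 after the header; column positions count only
    Protocol REF columns, also starting at 1.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    ref_indices = [i for i, name in enumerate(header) if name == "Protocol REF"]
    blanks = []
    for row_number, row in enumerate(reader, start=1):
        for position, col in enumerate(ref_indices, start=1):
            # A missing or whitespace-only cell means the step was skipped.
            if col >= len(row) or not row[col].strip():
                blanks.append((row_number, position))
    return blanks


print(optional_steps(STUDY_TSV))  # → [(1, 2)]
```

Reporting these cells separately keeps them visible to curators without failing validation for a legitimately skipped step.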

Thanks @ptth222! @terazus or @proccaserra will run the tests. As discussed last week, we are also working on addressing other issues in the ISA-Tab serializer. ISA-JSON is easier to deal with thanks to its better design.

ptth222 commented 3 months ago

The tests now pass after rebasing onto issue-511.

coveralls commented 3 months ago

coverage: 81.257% (-0.03%) from 81.282% when pulling 733b74ff4384b95fd973365c754e19654fb2e4be on ptth222:check-protocol-fields-update into 16ccc001fbdfaed073a6cb2f63d254c1b0b24a79 on ISA-tools:issue-511.

ptth222 commented 3 months ago

The coverage difference comes from changing a warning that was only printed to the log so that it is now added to the returned error/warning dictionary.

(Attached screenshots: new_coverage_40xx, original_coverage_40xx)