Closed ptth222 closed 3 months ago
I started editing this function because of the "Only one protocol reference should be used in a Protocol REF column." message(s), but I found some other issues to address as well.
Brief answer:
- The ISA specifications allow more than one distinct protocol to be used, assuming they all have the same protocol_type (e.g. "sample collection").
- The ISA-Tab parser implementation seems to be 'narrower', which explains the rule mentioned above.
This function raises some valuable questions.
- Are different protocols allowed in the same Protocol REF column? I think this was pretty much answered as 'Yes' in Different Protocol Names In Study Sequence Cause An Error #501, but it popped up again here.
- Do all protocols in the same Protocol REF column have to have the same type? This function and the structure of the config files suggest so, but is that actually correct?
Here, the protocol_type is checked. There is a bit of flexibility afforded by the tolerance for synonyms, which can be specified in a config file found under `isatools/resources/config/yaml/protocol-types.xml`, but not in a dynamic way, so it is not ideal.
- Can a cell in a Protocol REF column be blank? I can think of at least one example for this. Let's say you collect 2 different types of tissue from the same source. The first step of collection is the same for both, but one tissue type has an extra step as well. This could result in a file with 2 Protocol REF columns, but the second column would only have the protocol for the extra step of 1 of the tissue types. The type without the extra step would be blank.
Answer:
These use cases need more scenarios worked out, but a blank cell would cover the 'optional step' case (e.g. whether or not a labeling event happens).
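To make the blank-cell scenario concrete, here is a minimal sketch of the two-tissue example as a tab-delimited table. The table contents, protocol names, and the helper `protocol_ref_columns` are made up for illustration and are not part of isatools. It only shows that the second Protocol REF column ends up with one blank cell.

```python
import csv
import io

# Hypothetical study table: all names and values are invented for illustration.
STUDY_TABLE = (
    "Source Name\tProtocol REF\tProtocol REF\tSample Name\n"
    "mouse1\ttissue collection\t\tliver1\n"
    "mouse1\ttissue collection\tperfusion\tbrain1\n"
)


def protocol_ref_columns(table_text):
    """Return {column index: list of cell values} for every Protocol REF column."""
    rows = list(csv.reader(io.StringIO(table_text), delimiter="\t"))
    header, body = rows[0], rows[1:]
    # The 'Protocol REF' header repeats, so columns are addressed by position.
    indexes = [i for i, name in enumerate(header) if name == "Protocol REF"]
    return {i: [row[i] for row in body] for i in indexes}


columns = protocol_ref_columns(STUDY_TABLE)
# Count blank cells per Protocol REF column.
blank_counts = {
    i: sum(1 for cell in cells if not cell.strip())
    for i, cells in columns.items()
}
```

Here the first Protocol REF column is fully populated, while the second has a blank for the tissue type that skips the extra step.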
Some of the changes I made to this function:
- Removed the assumption that every cell in a Protocol REF column must reference the same protocol.
- Removed the return value. (Only 2 checks return a value that is actually used, and this isn't one of those.)
- Changed some of the messaging to be a little clearer.
- Added a warning to the 'validator' object. Previously it was only printing to the log.
- Added a way to bypass config checks if config is malformed or incomplete.
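As a rough illustration of the behavior described in this list, the sketch below checks one Protocol REF column. It is not the actual isatools function: the name `check_protocol_ref_column`, its parameters, and the message wording are all assumptions. It skips blank cells, accepts different protocols in the same column, appends warnings to a caller-supplied list instead of returning a value, and bypasses the type check when no usable config is available.

```python
def check_protocol_ref_column(cells, declared_types, warnings,
                              allowed_types=None, synonyms=None):
    """Append warnings for one Protocol REF column (hypothetical helper).

    cells          -- cell values of the column, top to bottom
    declared_types -- protocol name -> protocol type, as declared for the study
    warnings       -- list that collects warning messages (nothing is returned)
    allowed_types  -- types the config expects; None means the config is
                      missing or malformed, so the type check is bypassed
    synonyms       -- optional mapping of synonym -> canonical protocol type
    """
    synonyms = synonyms or {}
    for row_number, cell in enumerate(cells, start=1):
        name = cell.strip()
        if not name:
            continue  # blank cell: treated as an optional step
        if name not in declared_types:
            warnings.append(f"Row {row_number}: protocol '{name}' is not declared.")
            continue
        if allowed_types is None:
            continue  # no usable config, skip the type check
        protocol_type = declared_types[name]
        canonical = synonyms.get(protocol_type, protocol_type)
        if canonical not in allowed_types:
            warnings.append(
                f"Row {row_number}: protocol '{name}' has type '{protocol_type}', "
                f"expected one of {sorted(allowed_types)}."
            )


# Example data, invented for illustration.
declared = {
    "tissue collection": "sample collection",
    "perfusion": "sampling",
}
collected = []
check_protocol_ref_column(
    ["tissue collection", "", "perfusion", "unknown step"],
    declared,
    collected,
    allowed_types={"sample collection"},
    synonyms={"sampling": "sample collection"},
)
```

In this example only the undeclared protocol in row 4 produces a warning. The two different protocols and the blank cell pass, because both protocols resolve to the same type through the synonym mapping.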
Thanks @ptth222! @terazus or @proccaserra will run the tests. As discussed last week, we are also working on other issues in the ISA-Tab serializer. ISA-JSON is easier to deal with thanks to its better design.
The tests pass now, after rebasing onto issue-511.
The coverage difference comes from a warning that was previously only printed to the log and is now added to the returned error/warning dictionary.