Open VisLab opened 3 weeks ago
@effigies @Remi-Gau We would really appreciate clarification on this in the spec:
1) Should n/a be allowed in the "onset" columns of events.tsv files? (I read the current spec as it doesn't, but the description is ambiguous.)
2) If n/a is allowed, how should it be handled by tools?
Should n/a be allowed in the "onset" columns of events.tsv files? (I read the current spec as it doesn't, but the description is ambiguous.)
My read of common principles is that any TSV column can have missing values indicated with n/a.
If n/a is allowed how should it be handled by tools?
For the validator, we verify that the existing values are sorted, ignoring n/a. For design matrix generation, I would just drop the rows on ingestion, since they are not usable without an onset and duration.
Tbh, I don't know what the use case for these is. I'd be okay saying that tools SHOULD drop rows with these onsets, and then somebody who has something to do with them can neglect that.
I don't think we give much interpretation guidance to tools, so I'm not sure exactly how or where to say that.
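For illustration, the handling described in this comment could be sketched as follows. This is a minimal standard-library sketch, not the validator's actual code: the helper names (`read_events`, `onsets_sorted_ignoring_na`, `drop_na_onsets`) and the example data are hypothetical, and "sorted" is assumed to mean non-decreasing.

```python
import csv
import io

NA = "n/a"  # BIDS marker for a missing value in a TSV cell


def read_events(text):
    """Parse the content of an events.tsv file into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def onsets_sorted_ignoring_na(rows):
    """Return True if the numeric onsets are non-decreasing; n/a rows are skipped."""
    onsets = [float(row["onset"]) for row in rows if row["onset"] != NA]
    return all(a <= b for a, b in zip(onsets, onsets[1:]))


def drop_na_onsets(rows):
    """Keep only rows with a usable onset, e.g. before building a design matrix."""
    return [row for row in rows if row["onset"] != NA]


# Hypothetical example data: one row has an unknown onset.
EXAMPLE = (
    "onset\tduration\ttrial_type\n"
    "-1.5\t0.5\tcue\n"
    "n/a\tn/a\tnoise\n"
    "2.0\t1.0\ttarget\n"
)

rows = read_events(EXAMPLE)
print(onsets_sorted_ignoring_na(rows))
print([row["trial_type"] for row in drop_na_onsets(rows)])
```

Keeping the sortedness check and the row filter as separate functions mirrors the distinction made above: a validator only needs the first, while an analysis tool would apply the second on ingestion.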
My concern is not tsv columns in general but the "onset" column in particular:
https://bids-specification.readthedocs.io/en/stable/glossary.html#onset-columns. This description mentions positive and negative values very specifically and what the behavior should be.
Let me give a little more background on this issue. In the past, the HED validators have assumed that the onset columns do not allow "n/a". However, @yarikoptic has a dataset in which he wants "n/a" in the onset column to be interpreted as something that happened at some indeterminate time during the run. This is not unreasonable. I did a preliminary implementation in the Python HED validator that handles this as follows:
onset column.
Our downstream HED tools expect that the "n/a" onset rows are filtered out before they get them. Before we go forward with this and eventually get it into the HED JavaScript validator, we would appreciate a definitive statement in the specification about whether "n/a" is allowed and how it should be interpreted. For example, in the Task Events chapter the discussion of duration does this, but the discussion of onset does not.
I agree with @effigies description of what I would expect.
@VisLab should we just add "Must always be either zero or positive (or n/a if unavailable)." to the event description in the task section (same sentence as for duration) to make it explicit?
It can be negative too.
Just add:
A n/a value indicates the onset time is unknown or unavailable.
After
Onset (in seconds) of the event, measured from the beginning of the acquisition of the first data point stored in the corresponding task data file. Negative onsets are allowed, to account for events that occur prior to the first stored data point.
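As a rough sketch of what the proposed wording would mean for a tool reading a single onset cell: any finite number (negative included) is accepted, and n/a is reported as "unknown". The function name `parse_onset` and the choice to return `None` for n/a are assumptions made for illustration, not part of the spec or of any existing validator.

```python
import math


def parse_onset(value):
    """Return the onset in seconds as a float, or None if the cell is n/a.

    Raises ValueError for anything that is neither n/a nor a finite number.
    """
    text = value.strip()
    if text == "n/a":
        return None  # onset time is unknown or unavailable
    onset = float(text)  # raises ValueError for non-numeric text
    if not math.isfinite(onset):
        raise ValueError(f"onset must be a finite number or n/a, got {value!r}")
    return onset  # negative values are allowed


print(parse_onset("-1.5"))
print(parse_onset("n/a"))
```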
The current description of onset in Task events says that the onset column may have 0, positive and negative values. It does not explicitly say it does not allow "n/a", although I believe that somewhere in the past that was in the spec. The current description of duration explicitly says that it can be "n/a" and what that means.
My question is: is "n/a" really allowed in the onset column? If so, what does it mean and how are tools required to handle it?
This should be made explicit in the specification and in the schema.
This issue is related to issue #1938 and to issue hed-standard/hed-python#1026.