Open arnodelorme opened 4 years ago
Thanks for the report @arnodelorme, it seems like you're going through a lot of datasets these days :-)
I agree that the validator should catch these cases. A given EEG file such as sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.<ext> MUST NOT be present more than once by using different extensions <ext>.
This BIDS dataset contains both .edf and .bdf files (which are very small): https://openneuro.org/datasets/ds002034/versions/1.0.1
sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.edf
sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.bdf
I believe it should not have passed the validator since there are 2 types of binary files and the BDF file is obviously corrupted.
I haven't checked whether the BDF file is corrupted, but if it truly is, that raises another, already known, concern: We are not validating the contents of binary EEG files.
This problem is hard to solve, because we would need to implement data format readers in JavaScript so that the bids-validator can go into the files and check their validity. Currently, this is only being done for NIfTI files.
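To illustrate what even a very shallow content check could look like, here is a minimal sketch in Python (not JavaScript, so it is not something that could be dropped into the bids-validator as-is). It only compares the first 8 header bytes against the EDF and BDF signatures and says nothing about the rest of the file. The function names `sniff_format` and `header_matches_extension` are hypothetical and chosen for this example only.

```python
import os

# The first 8 bytes of the fixed header identify the format:
# EDF starts with ASCII "0" followed by seven spaces,
# BDF starts with byte 255 followed by ASCII "BIOSEMI".
EDF_SIGNATURE = b"0       "
BDF_SIGNATURE = b"\xffBIOSEMI"


def sniff_format(first_bytes):
    """Guess ".edf" or ".bdf" from the leading header bytes, else None."""
    head = first_bytes[:8]
    if head == EDF_SIGNATURE:
        return ".edf"
    if head == BDF_SIGNATURE:
        return ".bdf"
    return None


def header_matches_extension(path):
    """Return True if the file's header signature agrees with its extension.

    Hypothetical helper: only .edf and .bdf are handled in this sketch.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in (".edf", ".bdf"):
        raise ValueError(f"unsupported extension in this sketch: {ext!r}")
    with open(path, "rb") as f:
        head = f.read(8)
    return sniff_format(head) == ext
```

A check like this would flag an EDF file that was merely renamed to .bdf, but it would not detect truncated or otherwise damaged data records; that still requires a real reader for each format.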
Many months ago I tried to implement a reader/validator for the BrainVision format in JavaScript here: https://github.com/sappelhoff/brainvision-validator/ (see also bids-standard/legacy-validator#475).
However, I ran into problems integrating it with the bids-validator, because the validator runs both in the browser and on the CLI, and the "file access" API for the browser is significantly different from, and more complicated than, accessing files from the CLI (or from programs written in Matlab or Python).
But I will open this post as a separate issue, and we certainly should address it as soon as we have some resources available (and by resources, I mean people who have expertise, energy, and time).
In this issue, let's track our progress to prevent users from storing the same data under different extensions.
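This should be some rule that:
• IF a file sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.<ext> is present
• AND <ext> is from the list LIST_OF_ACCEPTED_DATA_FORMAT_EXTENSIONS
• then there MUST NOT be any other file with the same name and an ext from that list
sounds difficult but possible to implement.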
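The rule above can be sketched in a few lines of Python, purely as an illustration and not as the validator's actual implementation. The contents of `ACCEPTED_DATA_FORMAT_EXTENSIONS` are an assumption: one "primary" extension per EEG format, so that sidecar files of a single recording (for example BrainVision's .vmrk and .eeg next to .vhdr) are not mistaken for duplicates. The function name `find_duplicate_recordings` is hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption for this sketch: one primary extension per data format.
ACCEPTED_DATA_FORMAT_EXTENSIONS = {".edf", ".bdf", ".vhdr", ".set"}


def find_duplicate_recordings(paths):
    """Map each path stem to its extensions if it occurs under more than one
    accepted data format extension."""
    extensions_by_stem = defaultdict(set)
    for path in paths:
        p = PurePosixPath(path)
        ext = p.suffix.lower()
        if ext in ACCEPTED_DATA_FORMAT_EXTENSIONS:
            # Same directory and same name, only the extension removed.
            stem = str(p.with_suffix(""))
            extensions_by_stem[stem].add(ext)
    return {
        stem: sorted(exts)
        for stem, exts in extensions_by_stem.items()
        if len(exts) > 1
    }


if __name__ == "__main__":
    files = [
        "sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.edf",
        "sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.bdf",
    ]
    # Reports the shared stem together with [".bdf", ".edf"].
    print(find_duplicate_recordings(files))
```

Grouping by the path without its extension keeps the check to a single pass over the file list, which matters little for small datasets but avoids comparing every file against every other one.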
Yes, this sounds like a good rule.
On Nov 4, 2020, at 10:33 PM, Stefan Appelhoff notifications@github.com wrote:
In this issue, let's track our progress to prevent users from storing the same data under different extensions.
This should be some rule that:
• IF a file sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.<ext> is present
• AND <ext> is from the list LIST_OF_ACCEPTED_DATA_FORMAT_EXTENSIONS
• then there MUST NOT be any other file with the same name and an ext from that list
sounds difficult but possible to implement.