bids-standard / bids-validator

Validator for the Brain Imaging Data Structure
https://bids-standard.github.io/bids-validator/
MIT License

Validator MUST NOT accept identical files under different extensions #58

Open arnodelorme opened 4 years ago

arnodelorme commented 4 years ago

This BIDS dataset contains both .edf and .bdf files (which are very small):

https://openneuro.org/datasets/ds002034/versions/1.0.1

sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.edf
sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.bdf

I believe it should not have passed the validator since there are 2 types of binary files and the BDF file is obviously corrupted.

sappelhoff commented 4 years ago

Thanks for the report @arnodelorme, it seems like you're going through a lot of datasets these days :-)

I agree that the validator should catch these cases. A given EEG file such as sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.<ext> MUST NOT be present more than once under different extensions <ext>.

sappelhoff commented 4 years ago

> This BIDS dataset contains both .edf and .bdf files (which are very small): https://openneuro.org/datasets/ds002034/versions/1.0.1
>
> sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.edf
> sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.bdf
>
> I believe it should not have passed the validator since there are 2 types of binary files and the BDF file is obviously corrupted.

I haven't checked whether the BDF file is corrupted, but if it truly is, that raises another, already known, concern: We are not validating the contents of binary EEG files.

This problem is hard to solve because we would need to implement data format readers in JavaScript so that the bids-validator can open the files and check their validity. Currently, this is done for NIfTI files (and only for NIfTI files).
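As an illustration of what the simplest possible content check could look like, here is a minimal sketch in Python (not JavaScript, and not how the bids-validator works). It only inspects the 8-byte identification field at the start of an EDF or BDF header and compares it with the file extension; the function names are hypothetical, and a real validator would need to parse the rest of the header and the data records as well.

```python
from typing import Optional


def sniff_edf_or_bdf(header: bytes) -> Optional[str]:
    """Guess the format from the first 8 header bytes, or return None."""
    ident = header[:8]
    if ident == b"0       ":      # EDF: ASCII "0" followed by 7 spaces
        return "edf"
    if ident == b"\xffBIOSEMI":   # BDF: byte 255 followed by "BIOSEMI"
        return "bdf"
    return None


def extension_matches_header(filename: str, header: bytes) -> bool:
    """Return True if the extension agrees with the sniffed format."""
    ext = filename.rsplit(".", 1)[-1].lower()
    return sniff_edf_or_bdf(header) == ext
```

With this sketch, a file named `*_eeg.bdf` whose first bytes are an EDF identification field would be flagged as a mismatch, which is the kind of inconsistency suspected in the dataset above.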

Many months ago, I tried to implement a reader/validator for the BrainVision format in JavaScript: https://github.com/sappelhoff/brainvision-validator/ (see also bids-standard/legacy-validator#475).

However, I ran into problems integrating it with the bids-validator, because the validator runs both in the browser and on the CLI, and the "file access" API for the browser is significantly different from, and more complicated than, accessing files from the CLI (or from programs written in Matlab or Python).

But I will open this post as a separate issue, and we certainly should address it as soon as we have some resources available. (And by resources, I mean people who have expertise, energy, and time.)

sappelhoff commented 4 years ago

In this issue, let's track our progress to prevent users from storing the same data under different extensions.

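This should be some rule that:

• IF a file sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.<ext> is present
• AND <ext> is from the list LIST_OF_ACCEPTED_DATA_FORMAT_EXTENSIONS
• then there MUST NOT be any other file with the same name and an <ext> from that list

Sounds difficult but possible to implement.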
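The rule discussed in this issue (at most one data file per base name among the accepted data format extensions) could be sketched as follows. This is a minimal Python illustration of the logic only, not the validator's actual implementation: `find_duplicate_recordings` is a hypothetical helper, and the extension set is an assumed subset standing in for LIST_OF_ACCEPTED_DATA_FORMAT_EXTENSIONS.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumed, incomplete stand-in for LIST_OF_ACCEPTED_DATA_FORMAT_EXTENSIONS.
ACCEPTED_DATA_EXTENSIONS = {".edf", ".bdf", ".set", ".vhdr"}


def find_duplicate_recordings(paths):
    """Map each base path stored under more than one accepted extension
    to the sorted list of extensions found for it."""
    extensions_by_stem = defaultdict(set)
    for path in paths:
        p = PurePosixPath(path)
        ext = p.suffix.lower()
        if ext in ACCEPTED_DATA_EXTENSIONS:
            # Group files by their path without the extension.
            extensions_by_stem[str(p.with_suffix(""))].add(ext)
    return {
        stem: sorted(exts)
        for stem, exts in extensions_by_stem.items()
        if len(exts) > 1
    }


files = [
    "sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.edf",
    "sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.bdf",
    "sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.json",
]
duplicates = find_duplicate_recordings(files)
```

Here `duplicates` contains one entry for the `run-01_eeg` base name with the extensions `.bdf` and `.edf`; the `.json` sidecar is ignored because it is not a data format extension. A validator would raise an error for every entry in the result.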

arnodelorme commented 4 years ago

Yes, this sounds like a good rule.

On Nov 4, 2020, at 10:33 PM, Stefan Appelhoff wrote:

> In this issue, let's track our progress to prevent users from storing the same data under different extensions.
>
> This should be some rule that:
>
> • IF a file sub-01/ses-01/eeg/sub-01_ses-01_task-offline_run-01_eeg.<ext> is present
> • AND <ext> is from the list LIST_OF_ACCEPTED_DATA_FORMAT_EXTENSIONS
> • then there MUST NOT be any other file with the same name and an <ext> from that list
>
> sounds difficult but possible to implement.
