Open yarikoptic opened 3 years ago
bids-standard/bids-specification#858 has been merged, so the corresponding check in the validator could/should be worked on. Suggestions on where to start would be very welcome @rwblair :-)
I think we need to think through the logic of the check before we worry about where to implement.
Here's a proposal that only cares about collisions within directories:
def check_dir(dirname):
files = listdir(dirname)
if len(set(fname.lower() for fname in files)) != len(files):
# report collision
for fname in files:
if isdir(fname):
check_dir(fname)
It should catch colliding subjects, but not something like sub-s01/anat/sub-S01_...
, and it won't care about if one subject uses mixed cases for its acquisition parameters and another uses lowercase. Do we care about these? They might be caught by other checks, in any case.
Here's another edge case: sub-01/anat/sub-01_acq-HiRes_T1w.nii.gz
and sub-01/func/sub-01_task-rest_acq-hires_bold.nii.gz
. I think this this should not be flagged, as it's not a collision, but if we implemented something like a global collision test for each entity, it would be. And others might think it should be.
Since the issue is relatively rare, I think it might suffice to just operate on a full list of files/directories in the dataset
def check_paths(paths):
lowered = defaultdict(list)
for path in paths:
path = path.rstrip('/') # just to harnomize among possible directories with/without trailing /?
lowered[path.lower()].append(path)
for lower, origs in lowered.items():
if len(origs) > 1:
# report/return collision for `origs` -> `lower`
note: paths better not to be "dereferenced" (so git-annex'ed symlinks remain symlinks)
Regardless of if the validator is run via browser or node we create a dictionary with all the files as keys in the fileArrayToObject
function in utils/files/readDir.js
. Filenames could be normalized here and we could check for collisions, but we currently do not create/return issues at this stage of validation.
There is precedent for some validation happening in the ./utils/files/
directory with /utils/files/validateMisc.js
. So one option would be adding a new file to do this validation into ./utils/file/
and then calling the new validator function in the ./validators/bids/fullTest.js
similar to how validateMisc
does.
Would either of these checks catch @mattcieslak's motivating use case?
Regardless of if the validator is run via browser or node we create a dictionary with all the files as keys in the
fileArrayToObject
function inutils/files/readDir.js
. Filenames could be normalized here and we could check for collisions, but we currently do not create/return issues at this stage of validation.There is precedent for some validation happening in the .
/utils/files/
directory with/utils/files/validateMisc.js
. So one option would be adding a new file to do this validation into./utils/file/
and then calling the new validator function in the./validators/bids/fullTest.js
similar to howvalidateMisc
does.
This check should consider ignored files, so it will need to be near the start of fullTest before they are removed.
In our case there were two subjects that had the same ID but with different capitalization (sub-ndar, sub-NDAR) in the root bids directory. All the files within their directories matched the case of their subject directory name
This check should consider ignored files, so it will need to be near the start of fullTest before they are removed.
"not sure" it should anyhow consult ignored files -- the problem from the collisions happens at the filesystem level, so even if e.g. sub-NDAR
is ignored while sub-ndar
not -- collision while downloading/datalad clone
etc that dataset would result in trouble.
original proposal as a PR in BIDS: https://github.com/bids-standard/bids-specification/pull/858 seems to be getting approved and if merged soon will enter a release after v1.6.0-203-g811e17e2 ,