bids-standard / bids-validator

Validator for the Brain Imaging Data Structure
https://bids-standard.github.io/bids-validator/
MIT License

Error for TSV files that are not valid UTF-8 #43

Open effigies opened 2 years ago

effigies commented 2 years ago

According to the "Tabular files" section of the BIDS specification:

TSV files MUST be in UTF-8 encoding.

We currently don't validate this, which leads to situations where data is encoded in, e.g., ISO-8859 (https://github.com/OpenNeuroOrg/openneuro/issues/2515).

This one's a bit of a double-edged sword, as it requires reading the entirety of every TSV file, which we've largely avoided up to now.
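For illustration, here is a minimal sketch of what such a check could look like. It is written in Python with the standard library only and is not the validator's actual implementation; the function name `is_valid_utf8`, the chunk size, and the sample data are all hypothetical. It still reads every byte, but in fixed-size chunks, so the whole file never has to sit in memory at once.

```python
import codecs
import io
from typing import BinaryIO


def is_valid_utf8(stream: BinaryIO, chunk_size: int = 64 * 1024) -> bool:
    """Return True if every byte read from `stream` decodes as strict UTF-8.

    Hypothetical helper: the name and default chunk size are assumptions.
    """
    # An incremental decoder keeps state between calls, so a multi-byte
    # character split across two chunks is still decoded correctly.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        while chunk := stream.read(chunk_size):
            decoder.decode(chunk)
        # final=True raises if the data ends in the middle of a character.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


# Made-up sample content: the same text in two encodings.
text = "participant_id\tsite\nsub-01\tMünchen\n"
utf8_tsv = text.encode("utf-8")
latin1_tsv = text.encode("iso-8859-1")

print(is_valid_utf8(io.BytesIO(utf8_tsv)))
print(is_valid_utf8(io.BytesIO(latin1_tsv)))
```

Note that a check like this only catches byte sequences that are invalid as UTF-8. A file in another encoding that happens to contain only ASCII characters would pass, which is harmless since ASCII is a subset of UTF-8.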

effigies commented 1 week ago

I think we do load every TSV now, so this should be doable, if it isn't already done.