Closed by anjackson 10 years ago
Actually, this seems to work out okay. With a tiny bit of transformation, the different registries can all be mapped to a limited glob syntax (only ? and * are special, and not all characters are allowed). A few minor tweaks to make the validation regex more permissive and all seems well.
The warnings reported for some of the sources are misleading, because the extensions are normalised before they are validated, and not every source uses a compatible syntax.
e.g. FFW uses "command-line shell" syntax, with ? meaning any single character, * an arbitrary sequence, and ! and $ just literals.
Whereas Tika uses ^ and $ as start and end markers.
So it is better to perform the strict validation in the per-source code, and apply only less stringent validation to the normalised form.
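A rough sketch of that split, in Python. Every name here is hypothetical and not taken from the actual code. The lenient regex is only a guess at what "less stringent" might mean, and the rule that a missing ^ or $ anchor becomes a * wildcard is an assumption about the Tika-style patterns, not something confirmed above.

```python
import re

# Lenient check for the normalised form (an assumed rule, not the real one):
# any run of characters other than whitespace and path separators.
LENIENT_GLOB = re.compile(r"[^\s/\\]+")


def from_shell_style(raw: str) -> str:
    """Per-source step for shell-style patterns (as described for FFW).

    ? and * already mean what the limited glob syntax expects, and
    ! and $ are plain literals, so nothing needs rewriting.
    """
    pattern = raw.strip()
    if not pattern:
        raise ValueError("empty pattern")
    return pattern


def from_anchored_style(raw: str) -> str:
    """Per-source step for patterns using ^ and $ as start/end markers.

    Assumption: a missing anchor means "anything may appear here",
    so it is replaced by a * wildcard on that side.
    """
    pattern = raw.strip()
    anchored_start = pattern.startswith("^")
    anchored_end = pattern.endswith("$")
    core = pattern[1:] if anchored_start else pattern
    core = core[:-1] if anchored_end else core
    # Strict, source-specific validation: anchors may only sit at the ends.
    if not core or "^" in core or "$" in core:
        raise ValueError(f"misplaced anchor or empty pattern: {raw!r}")
    prefix = "" if anchored_start else "*"
    suffix = "" if anchored_end else "*"
    return prefix + core + suffix


def is_valid_normalised(glob: str) -> bool:
    """Less stringent check applied after per-source normalisation."""
    return LENIENT_GLOB.fullmatch(glob) is not None
```

With this arrangement a $ in a shell-style pattern passes through as a literal, while a stray $ in the middle of an anchored pattern is rejected by the source-specific code, so the warning points at the right source.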