IQSS / dataverse

Open source research data repository software
http://dataverse.org

Feature Request/Idea: Review of used regex #10607

Open jp-tosca opened 5 months ago

jp-tosca commented 5 months ago

Overview of the Feature Request

While working on the OpenAPI document fix in #10328, we validated the document with https://quobix.com/vacuum/ and noticed that some of our regular expressions are not compatible with ECMA-262 and raise an error, for example "^[^:<>;#/\"\\*\\|\\?\\\\]*$".

Our endpoints work with JSON and JavaScript frontends like React, so it would make sense to review our regular expressions and follow the ECMA-262 dialect that JSON Schema refers to: https://json-schema.org/understanding-json-schema/reference/regular_expressions

Also, as mentioned by @pdurbin, DataLad has documented problems with names in some languages: https://docs.datalad.org/projects/dataverse/en/latest/settingup.html#dataverse-limitations

"Dataverse will not accept names like Änderungen or Déchiffrer, due to the Ä and é in them."
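For anyone who wants to check what the pattern quoted above accepts, here is a minimal sketch using `java.util.regex`. The class name `FileLabelPatternDemo` and the helper `isAccepted` are made up for illustration and are not part of the Dataverse code base; only the pattern string is taken from this issue. It says nothing about how an ECMA-262 engine or vacuum treats the same pattern.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for illustration; not part of Dataverse.
final class FileLabelPatternDemo {

    // The pattern exactly as quoted in this issue, in Java string-literal escaping.
    // It is a negated character class excluding : < > ; # / " * | ? and backslash.
    static final Pattern LABEL = Pattern.compile("^[^:<>;#/\"\\*\\|\\?\\\\]*$");

    // True when the name contains none of the excluded characters.
    static boolean isAccepted(String name) {
        return LABEL.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Non-ASCII letters are written as Unicode escapes to avoid
        // source-encoding problems: \u00C4 is "Ä", \u00E9 is "é".
        String[] samples = {
            "data.csv", "a:b.csv", "what?.txt", "\u00C4nderungen", "D\u00E9chiffrer"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + isAccepted(s));
        }
    }
}
```

Under Java's engine this pattern rejects `a:b.csv` and `what?.txt` but accepts both `Änderungen` and `Déchiffrer`, because Ä and é are not in the excluded set. That suggests the DataLad limitation quoted above comes from a different check than this pattern, though that is only an inference from the pattern text.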

pdurbin commented 5 months ago

Related:

Also, the regex that is giving us the most trouble is the one for the "label" (filename) for the FileMetadata entity. That's the gnarly one JP mentioned above. Here it is in context:

https://github.com/IQSS/dataverse/blob/v6.2/src/main/java/edu/harvard/iq/dataverse/FileMetadata.java#L72

I'll leave a comment on this issue from DataLad to at least let them know we hear them: