IQSS / dataverse

Open source research data repository software
http://dataverse.org

list of disallowed special characters for filenames is not accurate #8926

Open matthew-a-dunlap opened 2 years ago

matthew-a-dunlap commented 2 years ago

While working on CORE2, which uploads to Dataverse, I'm writing code to prevent certain characters in filenames.

When I test Dataverse's restrictions, it reports: File Name cannot contain any of the following characters: / : * ? " < > | ; # .

This info is incorrect for at least two reasons:

  1. This filename still errors even though none of those characters are present: !$%&’()+,-=@[\]^_{}~. It seems this is because the \ character is prevented as well.
  2. The . character is not actually prevented by Dataverse as far as I can tell, even if it's used multiple times.

It would be useful for the user to have a more accurate error message. It would also be useful if this were in the documentation (I may have missed it). This happens on 5.3 and 5.10.1.

shlake commented 1 year ago

Working on this, but I have found another problem. See #9080.

As noted above, \ isn't "allowed" in a filename, but the Dataverse software silently changes a filename containing \ instead of generating an error. For example, the filename citations\files.txt is changed to files.txt without an error.

poikilotherm commented 1 year ago

Related:

shlake commented 1 year ago

@matthew-a-dunlap I'm checking on non-valid characters, but I have figured out that the . in the error list is just the end of the sentence. It does not mean that "period" is an invalid character, but it does need to be removed to avoid confusion.

The error message is coming from this file: WEB-INF/classes/ValidationMessages.properties

ErykKul commented 1 year ago

Pattern for the "label" in the code: https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/FileMetadata.java#L72 regexp="^[^:<>;#/\"\\*\\|\\?\\\\]*$"
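
To see what that pattern accepts and rejects, here is a minimal standalone sketch. It only copies the regexp string quoted above; the class name `FileLabelCheck` and the helper `isValidLabel` are made up for illustration and are not part of the Dataverse code base. It assumes the whole file name has to match the pattern.

```java
import java.util.regex.Pattern;

class FileLabelCheck {
    // Regexp string copied from the comment above (label pattern in FileMetadata.java).
    private static final Pattern LABEL =
            Pattern.compile("^[^:<>;#/\"\\*\\|\\?\\\\]*$");

    // Hypothetical helper: true if the entire name matches the pattern.
    static boolean isValidLabel(String name) {
        return LABEL.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Periods are not in the excluded set, so this is accepted.
        System.out.println(isValidLabel("a.b.c.txt"));
        // The backslash is in the excluded set, so this is rejected.
        System.out.println(isValidLabel("citations\\files.txt"));
        // Same special characters as in the report, minus the backslash: accepted.
        System.out.println(isValidLabel("!$%&'()+,-=@[]^_{}~"));
    }
}
```

Read this way, the excluded characters are : < > ; # / " * | ? and \, which is consistent with the observation that \ is rejected while . is not.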

ErykKul commented 1 year ago

Directory name validator: https://github.com/IQSS/dataverse/blob/develop/src/main/java/edu/harvard/iq/dataverse/FileDirectoryNameValidator.java#L32

String validCharacters = "[\\w\\\\/. -]+";
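
For comparison with the file name pattern, a small sketch of what this character class allows. The class name `DirectoryNameCheck` and the helper `isValidDirectoryName` are hypothetical, and the sketch assumes the validator requires the whole value to match; how the real validator treats null or empty input is not covered here.

```java
import java.util.regex.Pattern;

class DirectoryNameCheck {
    // Character class copied from the comment above (FileDirectoryNameValidator.java).
    private static final Pattern VALID = Pattern.compile("[\\w\\\\/. -]+");

    // Hypothetical helper, assuming the entire value must match.
    static boolean isValidDirectoryName(String name) {
        return VALID.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Word characters, slash, space, period and hyphen are all allowed.
        System.out.println(isValidDirectoryName("data/raw files-2.1"));
        // Unlike the file name pattern, the backslash is allowed here.
        System.out.println(isValidDirectoryName("data\\raw"));
        // A colon is outside the allowed set.
        System.out.println(isValidDirectoryName("data:raw"));
    }
}
```

Note that this is an allow-list (only the listed characters pass), whereas the file name pattern is a deny-list, so the two checks disagree on \ and on most punctuation.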