PRIDE-Archive / px-submission-tool

ProteomeXchange data submission tool

Warn or prevent files using special characters in their names #17

Closed Tobias-Ternent closed 4 years ago

Tobias-Ternent commented 9 years ago

Special characters in file names can often cause problems further down the line, e.g. when processing the files, listing them on the website and web service, or listing them for FTP/Aspera download. This would not be a problem if special characters were rejected at the point of submission, with users warned or forced to rename any files whose names contain characters that should not be permitted.

The only real problem this would pose is for mzIdentML files, which refer to related spectra files. If spectra files are renamed, their references inside the <SpectraData> elements would also need to be updated to match. Explaining this and getting a user to do it may be too complex and asking too much.
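To make the renaming concern concrete, here is a minimal sketch of how the spectra file references could be listed from an mzIdentML file, so that a user can see which references a rename would break. It assumes the reference is held in the `location` attribute of each `SpectraData` element, as in the mzIdentML schema. The class name `SpectraDataLocations` and its method are hypothetical and are not part of the submission tool.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; not taken from px-submission-tool.
final class SpectraDataLocations {

    private SpectraDataLocations() {}

    /** Streams the document and collects the location attribute of each SpectraData element. */
    static List<String> find(Reader mzIdentMl) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or external entities in user-supplied files.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        XMLStreamReader reader = factory.createXMLStreamReader(mzIdentMl);
        List<String> locations = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                // Match on the local name so a default namespace does not matter.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "SpectraData".equals(reader.getLocalName())) {
                    String location = reader.getAttributeValue(null, "location");
                    if (location != null) {
                        locations.add(location);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return locations;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up, heavily trimmed document used only to exercise the method.
        String xml = "<MzIdentML><DataCollection><Inputs>"
                + "<SpectraData id=\"SD_1\" location=\"file:///data/patient's run.mgf\"/>"
                + "</Inputs></DataCollection></MzIdentML>";
        System.out.println(find(new StringReader(xml)));
    }
}
```

Streaming with StAX rather than loading a DOM is a deliberate choice here, since mzIdentML files can be large. The sketch only reads the references; rewriting them inside the file is the harder part described above.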

Tobias-Ternent commented 8 years ago

I've found that special characters also include apostrophes / single quotes ('). These cause no end of problems with processing scripts.

mbdebian commented 8 years ago

@Tobias-Ternent If these kinds of file names are gonna cause trouble down the road, maybe we should warn the user and offer some kind of automatic substitution (name fixing algorithm), or the possibility to cancel the process so the user can fix this on his/her own? What do you think, @ypriverol?

ypriverol commented 8 years ago

To be honest, we should make this as stringent as possible. A couple of months ago I created a list of characters that do not work on the web, and I would also add others, as @Tobias-Ternent said. We can have a short chat about how to implement that.

mbdebian commented 8 years ago

@Tobias-Ternent @ypriverol what about a quick chat about this tomorrow morning?

Tobias-Ternent commented 8 years ago

"maybe we should warn the user and offer some kind of automatic substitution (name fixing algorithm), or the possibility to cancel the process so the user can fix this on his/her own?"

I would prefer not to auto-fix the files. Apart from the risk of mistakes on our part, which would leave us open to extra work in the future, if the tool were to rename a 'peak' file for an mzid-based 'complete' submission, it would also need to fix the peak file name reference inside the mzid file. This might be problematic, especially with large files.

I would much prefer users to fix it themselves, perhaps guided by a warning linking to documentation that explains how files should be named for a submission, what special characters are, a potential regex to fix them, etc.

Sure, let's review this tomorrow.

mbdebian commented 8 years ago

I think this is a good starting point: the POSIX portable file name character set. We just need to add white spaces to it. What do you think, @Tobias-Ternent and @ypriverol?

Tobias-Ternent commented 8 years ago

I think that looks good, plus normal space characters ' '.
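
As a rough illustration of the rule agreed here, the check could look something like the sketch below. It assumes the allowed set is exactly the POSIX portable filename character set (A-Z, a-z, 0-9, period, underscore, hyphen) plus the plain space character. The class and method names are hypothetical and do not come from the submission tool. In line with the preference earlier in the thread, the sketch only reports the offending characters and does not rename anything.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not taken from px-submission-tool.
final class SubmissionFileNames {

    // POSIX portable filename character set (A-Z, a-z, 0-9, '.', '_', '-')
    // plus the plain space character, as proposed in this thread.
    private static final Pattern ALLOWED = Pattern.compile("[A-Za-z0-9._\\- ]+");

    private SubmissionFileNames() {}

    /** Returns true if the name is non-empty and every character is in the allowed set. */
    static boolean isAllowed(String fileName) {
        return fileName != null && ALLOWED.matcher(fileName).matches();
    }

    /** Returns the distinct disallowed characters, in order of first appearance. */
    static Set<Character> offendingCharacters(String fileName) {
        Set<Character> offending = new LinkedHashSet<>();
        for (char c : fileName.toCharArray()) {
            if (!ALLOWED.matcher(String.valueOf(c)).matches()) {
                offending.add(c);
            }
        }
        return offending;
    }

    public static void main(String[] args) {
        // Made-up file names used only to exercise the check.
        for (String name : new String[] {"sample 01.mzML", "patient's_run#2.raw"}) {
            if (isAllowed(name)) {
                System.out.println("OK: " + name);
            } else {
                System.out.println("Rename needed: " + name
                        + " contains " + offendingCharacters(name));
            }
        }
    }
}
```

Listing the specific characters, rather than only rejecting the name, gives the user something concrete to fix and could be paired with a link to the naming documentation.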