OpenRefine / CommonsExtension

An OpenRefine extension that helps with Wikimedia Commons editing: start projects from Wikimedia Commons categories; Commons-specific GREL functions.
BSD 3-Clause "New" or "Revised" License

Error when uploading not sent back to the user in the front-end #103

Open antoine2711 opened 6 months ago

antoine2711 commented 6 months ago

I have this error when uploading:

15:25:50.202 [..ting.EditBatchProcessor] Requesting documents (1ms)
15:25:50.296 [..ting.EditBatchProcessor] IO error while editing: /…/Transferts (AQM)/2024-05 (Is a directory) (94ms)
15:25:50.422 [..ting.EditBatchProcessor] IO error while editing: /…/Transferts (AQM)/2024-05 (Is a directory) (126ms)

But I get nothing in the front-end.
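
The "Is a directory" message in the log suggests the path points at a folder rather than a file. A minimal sketch of a check that could run before the upload starts, assuming the paths are available as plain strings; `check_upload_path` is a hypothetical helper, not part of OpenRefine or the Commons extension:

```python
from pathlib import Path


def check_upload_path(path_text):
    """Return an error message for an unusable path, or None when it looks fine."""
    path = Path(path_text)
    if not path.exists():
        return f"{path_text}: no such file"
    if path.is_dir():
        # Corresponds to the "(Is a directory)" IO error seen in the log.
        return f"{path_text}: is a directory, expected a file"
    return None
```

A message returned this way could be shown next to the offending row instead of only appearing in the server log.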

Regards, Antoine

antoine2711 commented 6 months ago

Also, I get this error message, which is not relayed to the front-end. Even if it were, I don't understand what the problem with the file name is…

16:32:31.558 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Carton_d'invitation_pour_«_Bastien_et_Bastienne_».jpg"} (69690ms)
16:32:42.236 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Marionnettes_de_«_La_Boîte_à_joujoux_»_dans_les_décors.jpg"} (10678ms)
16:32:53.609 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Programme_de_«_Bastien_et_Bastienne_»_de_Jacques_Chesnais.jpg"} (11373ms)
16:33:00.627 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Programme_de_«_Les_Comédiens_de_bois_»_de_Jacques_Chesnais,_argument_de_la_pièce.jpg"} (7018ms)
16:33:03.326 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Programme_de_«_Les_Comédiens_de_bois_»_de_Jacques_Chesnais,_distribution_des_rôles.jpg"} (2699ms)
16:33:04.473 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Manipulateurs_de_«_La_Boîte_à_joujoux_».jpg"} (1147ms)
16:33:19.697 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Affiche_de_«_Tintin_et_le_Temple_du_Soleil_»_en_anglais.jpg"} (15224ms)
16:33:41.356 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_«_Tintin_et_le_Temple_du_Soleil_»_de_Micheline_Legendre,_pour_Marionnettes_en_vitrine.jpg"} (21659ms)
16:33:57.972 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_«_Tintin_et_le_Temple_du_Soleil_»_de_Micheline_Legendre_pour_Marionnettes_en_vitrine_(2).jpg"} (16616ms)
16:34:09.691 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Micheline_Legendre_avec_la_rose_du_«_Petit_Prince_».png"} (11719ms)
16:34:13.316 [..ting.EditBatchProcessor] Requesting documents (3625ms)
16:34:14.478 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Le_Théâtre_«_Tintin_»_au_Parc_Lafontaine.jpg"} (1162ms)
16:34:15.432 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_Vitrine_de_Noël_de_«_Tintin_au_Tibet_»_1964.jpg"} (954ms)
16:34:19.638 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_«_Tintin_au_Tibet_»_au_Jardin_des_merveilles.jpg"} (4206ms)
16:34:30.142 [..ting.EditBatchProcessor] MediaWiki error while editing [Warning]: The file upload action returned the 'Warning' error code. Warnings are: {badfilename="ML_«_Hansel_et_Gretel_»_à_Stratford.jpg"} (10504ms)

That said, intuitively, Wikimedia Commons doesn't seem to like chevrons (« and ») in the name… ;-)

Regards, Antoine

antoine2711 commented 6 months ago

So, for the second problem, I figured it out. It was the non-breaking spaces that are often used in French. I think OR should warn the user about that…

Regards, Antoine

lokal-profil commented 5 months ago

Commons disallows non-printing characters in the filename. IIRC the validation of this is all handled in FileNameScrutinizer which reflects the default values of wgLegalTitleChars. [but it's not a very transparent regexp]

That doesn't undermine the problem raised here of surfacing the errors in the frontend =)
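
To make the non-printing character problem concrete, here is a rough sketch that flags characters likely to be rejected. It is an approximation based on Unicode general categories, not the regexp used by FileNameScrutinizer and not the actual `wgLegalTitleChars` rule; `find_suspect_characters` is a hypothetical name:

```python
import unicodedata

# Assumption: these Unicode general categories approximate "non-printing"
# characters; this is NOT the rule set Commons or OpenRefine actually applies.
SUSPECT_CATEGORIES = {"Cc", "Cf", "Zl", "Zp"}


def find_suspect_characters(file_name):
    """Return (index, code point) pairs for characters likely to be rejected."""
    suspects = []
    for index, char in enumerate(file_name):
        category = unicodedata.category(char)
        # "Zs" covers all space separators; only the plain ASCII space is kept.
        is_odd_space = category == "Zs" and char != " "
        if category in SUSPECT_CATEGORIES or is_odd_space:
            suspects.append((index, f"U+{ord(char):04X}"))
    return suspects
```

With this sketch, the guillemets themselves are not flagged; only the non-breaking spaces next to them and control characters such as tabs are.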

wetneb commented 5 months ago

The problem of surfacing the errors in the frontend should be addressed by https://github.com/OpenRefine/OpenRefine/pull/6555, although I did not test it specifically for media files upload. I wonder if @antoine2711 or @Vesihiisi would be interested in trying it out?

wetneb commented 5 months ago

@lokal-profil @antoine2711 I have tested my PR https://github.com/OpenRefine/OpenRefine/pull/6555 with Commons upload and made some tweaks to improve the UX there.

See the screenshots there. Any feedback welcome.

antoine2711 commented 5 months ago

> @lokal-profil @antoine2711 I have tested my PR OpenRefine/OpenRefine#6555 with Commons upload and made some tweaks to improve the UX there.

I’m waiting for the next version that can load the Commons extension.

Regards, Antoine

wetneb commented 4 months ago

@antoine2711 there is a new release for the Commons extension which should work with OpenRefine 3.8 and the development version of OpenRefine (master branch)

Vesihiisi commented 2 months ago

I took the latest OR snapshot (#2442) and tried uploading a file with a tab (0x09) in the name.

Some thoughts on the experience:

[Warning] The file upload action returned the 'Warning' error code. Warnings are: {badfilename="Skövde_stadsbibliotek_interior-01.jpg"}

Again, if I didn't know about non-printable characters, I wouldn't be able to guess the reason for the error. I guess that's the raw error returned by the API.

I think the pain point is the fact that I was allowed to start uploading the file in the first place.

wetneb commented 2 months ago

We used to give this warning the highest severity level ("Critical"), which prevents the user from doing the upload. But our regular expression catching invalid characters had false positives: it flagged characters which were actually allowed (https://github.com/OpenRefine/OpenRefine/issues/5656). So we changed the severity to "Warning", so that the user is still able to attempt the upload (https://github.com/OpenRefine/OpenRefine/pull/6227).

We can of course revert this move, or somehow find a more reliable source of information for which characters are allowed in Commons filenames.

Highlighting the special characters (such as your tab character) makes sense in any case.
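
As an illustration of what such highlighting could look like, here is a minimal sketch that swaps hard-to-see characters for a visible tag. The choice of categories and the `<U+XXXX NAME>` format are assumptions for the example, and `reveal_invisible` is a hypothetical helper rather than anything in OpenRefine:

```python
import unicodedata


def reveal_invisible(text):
    """Replace hard-to-see characters with a visible <U+XXXX NAME> tag."""
    parts = []
    for char in text:
        category = unicodedata.category(char)
        hidden = category in {"Cc", "Cf", "Zl", "Zp"} or (
            category == "Zs" and char != " "
        )
        if hidden:
            # Control characters such as tab have no Unicode name, hence the default.
            name = unicodedata.name(char, "CONTROL")
            parts.append(f"<U+{ord(char):04X} {name}>")
        else:
            parts.append(char)
    return "".join(parts)
```

For example, `reveal_invisible("a\tb")` gives `a<U+0009 CONTROL>b`, which makes a stray tab in a file name obvious.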

wetneb commented 2 months ago

> I appreciate the new "Wikibase editing results" column, but the content is not helpful for inexperienced users.

The idea is that it's at least something they can include in their report when asking for help (without having to check the server logs). If you have ideas of how to improve it, I am all ears.

We could add some logic to translate specific MediaWiki error messages to a different format so they can be more easily understood by the user, but aiming to cover all possible MediaWiki errors is beyond reach I would say.
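
A rough sketch of what that translation logic might look like, assuming the raw message keeps the `{code="value"}` shape seen in the logs quoted in this thread. The mapping, the wording of the friendly message, and the `explain` function are all hypothetical; unknown codes fall back to the raw message:

```python
import re

# Hypothetical mapping; only "badfilename" appears in this thread.
FRIENDLY_MESSAGES = {
    "badfilename": (
        "The file name was rejected. Check it for invisible characters such as "
        "tabs or non-breaking spaces."
    ),
}

# Matches the {code="value"} fragment as printed in the logs quoted in this thread.
WARNING_PATTERN = re.compile(r'\{(\w+)="(.*)"\}')


def explain(raw_message):
    """Return a friendlier message, or the raw one when the code is unknown."""
    match = WARNING_PATTERN.search(raw_message)
    if match is None:
        return raw_message
    code, value = match.groups()
    friendly = FRIENDLY_MESSAGES.get(code)
    if friendly is None:
        return raw_message
    return f"{friendly} (file name: {value})"
```

Falling back to the raw message keeps the current behaviour for every error that has no hand-written translation, so the mapping can grow one code at a time.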

thadguidry commented 2 months ago

@wetneb Found at the end of https://commons.wikimedia.org/wiki/Commons:File_naming#Language-specific_guidelines

Avoid abusing Unicode. Control characters can be omitted, strange punctuation can be replaced with standard quotes and commas, and symbols such as "♥" are often more natural when spelled out ("heart"), also increasing visibility in search. Furthermore some characters do not render correctly at all in certain operating systems and browsers. It is a good idea to stick to letters, numbers, underscore (space), ASCII hyphen/minus/dash, plus, and period (dot), as these do not have any MediaWiki restrictions. Letters with diacritics and accents are acceptable, but so is omitting diacritics and accents (e.g. "Calderón"/"Calderon", "Erdoğan"/"Erdogan").

Looks like MediaWiki itself has restrictions on filenames, as seen in the paragraph above. But it is hard to find out WHICH and WHERE... Found these as well:

Commons uses the same underlying technology as MediaWiki itself. From what I read on the MediaWiki file uploads page, the restrictions sometimes depend on which extension is used to enable a mass upload API. More important still seems to be the backend database chosen, which is where the last line of technical filename restrictions lies.

But I think this is close to the right place in their source (someone might have to ask on Telegram): https://github.com/wikimedia/mediawiki/blob/d38689ae1d7a74cda9df88d9e747b455b66653d6/includes/api/ApiUpload.php#L826

But @Vesihiisi is actually getting the badfilename error which is checked here: https://github.com/wikimedia/mediawiki/blob/d38689ae1d7a74cda9df88d9e747b455b66653d6/includes/upload/UploadBase.php#L806

Using that and poking around some more brought me to https://www.mediawiki.org/wiki/API:Upload where I found this:

badfilename: The file name supplied is not acceptable on this wiki, for instance because it contains forbidden characters

thadguidry commented 2 months ago

The Gerrit issue (https://gerrit.wikimedia.org/r/c/mediawiki/core/+/942710) where the configuration options for the forbidden characters were deprecated in MediaWiki 1.41+ has some interesting reading about Illegal File Chars. It also links to a wikitech-l mailing list thread, which is very interesting reading and points to a core problem: https://lists.wikimedia.org/hyperkitty/list/wikitech-l@lists.wikimedia.org/thread/ASODV6622T4YUAY3JO5ZVBL3B5ZQDX2U/