commons-app / apps-android-commons

The Wikimedia Commons Android app allows users to upload pictures from their Android phone/tablet to Wikimedia Commons
https://commons-app.github.io/
Apache License 2.0

Improve file description #4137

Open SJu-w opened 3 years ago

SJu-w commented 3 years ago

Summary:

Files uploaded to Commons by the app carry the name of the photographed subject only in the filename, not in the file description (the "description" field of the "Information" template). The description field contains just the label of the value of the P31 (instance of) property, which is usually a vague general term. The description also lacks the Wikidata identifier of the item the photo belongs to.

Example:

See https://commons.wikimedia.org/w/index.php?title=File:Souso%C5%A1%C3%AD_svat%C3%BDch_j%C3%A1hn%C5%AF_Ji%C5%99%C3%ADho_a_Agapita_u_cesty_ke_kostelu_sv._Petra_a_Pavla_v_Mimoni.jpg&oldid=521598789 for the file title and original file description. The description should be {{cs|1=Sousoší svatých jáhnů Jiřího a Agapita u cesty ke kostelu sv. Petra a Pavla v Mimoni}} but it is {{cs|1=socha}} (= statue) instead. The Q code of the Wikidata item is missing entirely.

Steps to reproduce:

This pattern can be reproduced by uploading any new image for a Wikidata item selected on the map. The problem probably does not depend on the device or system used.

Commons app version: 2.13.2~757c7b008 in the example output.

Solutions:

Edit the template which generates the description of photos saved to Commons by the application:

I suppose the correction should be simple and trivial and should not require much programming knowledge, only access to the template that generates the file description output. I wonder why such an unnecessary mistake occurred at all.

misaochan commented 3 years ago

Pinging @nicolas-raoul @VojtechDostal for input.

nicolas-raoul commented 3 years ago

Thanks for the feedback!

I am open to any description template that makes the picture more useful :-) I think that duplicating information should generally be avoided (in order to not waste contributors' time), but if there is a Commons policy that recommends duplication then I am happy to conform to it.

I agree with you that the QID being only in the File usage on other wikis is not enough. And here is some good news: recent versions of the app (for instance the alpha) add the QID to "depicts" in the Structured Data tab (example). Now the question is: should we duplicate the information from Structured Data into the description? At first sight it does not sound to me like a great idea: someone who edits Structured Data would have to then also edit the description to avoid discrepancies. I would not be against adding to the description something like {{include depicted QIDs labels}} if there is such a template.

I agree that P31 labels are not always super useful, but they can be helpful for people using text-based search (as opposed to structured search) on Commons. Ideally we should use not P31 (example: "statue") but rather the Wikidata item's description (example: "baroque statue in Mimoň, Czech Republic") when there is one.

Many items do not yet have a description, but the smart people at Wikidata are doing their best to fill great descriptions in as many languages as possible. For the items that still do not have a description, we can fall back to using the item's P31(s) and possibly P131/P17.
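The fallback order described above (item description first, then P31 and possibly P131/P17) could be sketched roughly as follows. This is only an illustration, not the app's actual code: the `WikidataItem` class, its field names, and the English "X in Y" phrasing of the fallback are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WikidataItem:
    """Hypothetical, simplified view of a Wikidata item (not the app's model)."""
    qid: str
    descriptions: Dict[str, str] = field(default_factory=dict)   # language code -> description
    instance_of_labels: List[str] = field(default_factory=list)  # labels of the P31 values
    located_in_label: Optional[str] = None                       # label of the P131 value, if any
    country_label: Optional[str] = None                          # label of the P17 value, if any


def describe(item: WikidataItem, lang: str) -> Optional[str]:
    """Prefer the item's own description; otherwise fall back to P31 plus P131/P17."""
    description = item.descriptions.get(lang)
    if description:
        return description
    if not item.instance_of_labels:
        return None  # nothing usable to fall back on
    kind = ", ".join(item.instance_of_labels)
    # Assumption: the fallback labels are already in the requested language,
    # and joining them with "in" is only meaningful for English.
    place = ", ".join(p for p in (item.located_in_label, item.country_label) if p)
    return f"{kind} in {place}" if place else kind
```

A real implementation would need language-aware phrasing for the fallback; the sketch only shows the order in which the sources are consulted.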

Thanks for pointing out the parameters order problem! Would you mind creating a different issue for it? Please include an example of what the app currently generates, and what it should generate instead (same content with the right order). Thanks!

SJu-w commented 3 years ago

Hi, thanks for your reaction.

I thought it would be a perfectly simple fix for the obvious error. I would not have imagined how much text we will have to write because of this. :-)

VojtechDostal commented 3 years ago

I am not sure that this long explanation was needed. It sort of diverges into discussions which should be held on Wikimedia Commons and not here.

I am in favor of using the item description as the default description for the image. If this is not available, we can always fall back to P31. (Alternatively, we can fall back to a more clever data-based description formed using a more complex algorithm using several properties. Eg. "city in Portugal". This would require more programming but tools such as Mix'n'Match and Duplicity are able to do that quite well already.)

I wouldn't include information about depicted items into the description - it's already in the structured data, which is the preferred way of doing this on Wikimedia Commons. If the community decides they want to include this information in {{information}} too, they can do it for all the hosted images in bulk, with a bot.

As for categories, I think #3595 would be very helpful.

SJu-w commented 3 years ago

The commonly established standard of Commons is that the description provides "brief (if possible) but complete information about the image", including identification of notable subjects or objects, location, relevant circumstances and all information needed to support every category the file is placed in. Normally, the description contains more complex and complete information than the file name itself. Now, the application violates this principle flagrantly, and needlessly.

No need to open or question these proven and consensual principles of Commons here. Here we solve a problem with an application that violates these principles and floods Commons with a lot of photos with insufficient, nonsensical and worthless descriptions. If someone at Commons attempts to partially duplicate information through "structured data", that is no reason to break and ignore the current basic system, which is far from being replaced, and cannot be fully replaced, by that structured data. So far, structured data (like the so-called "annotations") can only encode and highlight part of the information from the description in a very partial and imperfect way. Currently they are also contaminated with a lot of ballast, are neither effectively maintainable nor usable, are not directly visible from the main page of files or integrated into it, and are not really structured. For now, the P180 (depicts) values of Structured Data are some hashtags, some quasi-categories which don't really work like categories (we put the content somewhere, but we don't easily see what's put there) and are not really structured and maintainable like categories. For our issue, however, it is essential that so-called "structured data" does not replace, and for a long time will not be able to replace, a proper description of the file. Robotic bulk re-import of P180 values from structured data into a description carries the risk that the description will be devalued by a lot of ballast, because structured data are massively misused for purposes for which they were probably not intended.

The property P31 cannot have greater informational value than the actual and specific description of what is in the picture, and cannot substitute for it. In addition, we already have experience that the P31 value often links to locally and culturally conditioned general terms, which are very prone to inaccuracy, vagueness and interwiki conflicts, and fundamentally complicate their resolution. Generally, the case that an item has absolutely no label should be inadmissible in Wikidata. If such a completely extraordinary and undesirable case occurs, then perhaps it is better to use only the Q code as a title and description than to try to compile an alternative description from P31 and other properties. In general, let us suppose that a Wikidata label, along with the appropriate description, should unequivocally define the topic, including its most essential attributes. When that is not so, it is a problem of Wikidata content and filters, not a defect of the app.

It is true that a link to a specific Wikidata item or items (in whatever form) should also be part of the description, and in the future it can be used by an intelligent template to mediate and internationalize the description. The {{Q}} template exists already, and in the future it can be improved, e.g. made to handle grammatical inflection. Template {{Wikidata Infobox}} generates a more complex output. For the file description, something between the two would fit, covering the dominant item of the image. Improving and creating such templates is a task for Commons.

The application that creates the file page should ensure that the link to the Wikidata item is in a usable form, taking into account possible future uses, the possible context (more items affected etc.) and possibility to extend and modify the description.

nicolas-raoul commented 3 years ago

Here is the algorithm I would recommend for pictures that represent a Wikidata item:

If anyone does not agree, please post your exact recommended algorithm, including at least these two cases. Thanks!

SJu-w commented 3 years ago

If someone still does not understand that "label + description" identify the subject of an item, while P31 cannot in principle identify the subject of an item and of its photo, then I am afraid that no other "exact recommended algorithm" or examples can help with it. I pointed out an obvious mistake that lasted here for several months, and instead of someone fixing it immediately in two minutes, we will get bogged down in endless useless discussions here. There is no need to invent anything. If the application can now usually create a meaningful file name, it can create the basis of the description in exactly the same way.

SJu-w commented 3 years ago

Btw, I think it's obvious that "baroque statue in Mimoň, Czech Republic" doesn't identify the subject of the photo, while "Sousoší svatých jáhnů Jiřího a Agapita u cesty ke kostelu sv. Petra a Pavla v Mimoni" is quite specific, and there is not even any need to add anything to it.

VojtechDostal commented 3 years ago

Too long - didn't read, but I think @SJu-w might be proposing to use Wikidata item's label in the description field of the {{information}} template. I didn't think about that before and it might be nice to include it in the description too... However, sometimes the Wikidata label isn't very informative for discerning several unnamed objects - eg. https://www.wikidata.org/wiki/Q103822064 - label is literally just "bridge". That's why I rather support @nicolas-raoul's suggestion to use the Wikidata item's description for description of the image. Anyways, the Wikidata item's label is already in the filename.

SJu-w commented 3 years ago

You didn't read this text, and you also didn't read the file pages from the last months? Try to read at least the first two paragraphs so that we don't go around in circles. You are needlessly trying to invent what is already described above, and was obvious even before this discussion.

Is it really so hard to understand that the "description" on a file page has to identify what's in the image? Is it really so hard to understand that on Wikidata, "label + description" together identify the subject of an item, while P31 alone or the second part of the pair cannot identify the item? Did none of you notice over the past few months what meaningless descriptions the application was generating?

Have you been confused by the fact that on Wikidata the field named "description" is in fact only a supplement to the real core of the description, which is the field named "label"? And are you unable to recognize and correct this mistake with common sense?

Of course - if in some cases there is a wrong or meaningless label + description in Wikidata, this application cannot solve it (poor labels on Wikidata are mainly a problem with incorrectly set up bulk imports, or their sources). The point is that this application should not produce meaningless descriptions needlessly.

nicolas-raoul commented 3 years ago

I am sure Vojtěch was joking, I for one read all of your comments with attention, even if some parts (like the part about "hashtags" which is not on topic here) might sound more like a rant. Please be assured that your point of view is considered and respected. You have already convinced us that using only P31 (which is what the app currently does) is not good enough.

Yesterday I asked you to post your exact recommended algorithm, but it seems that you did not post it. So here I will try to sum up your algorithm:

Does it sum up your algorithm correctly? If not, please correct it, thanks!

Here are my thoughts about that algorithm:

  1. I am not a fan of duplicating information from filename to description, but if that's what the community strongly wants, then I am OK with it.
  2. The user most likely does not understand this other language, so they cannot judge whether it really applies, whether it is actually erroneous, or whether it should be edited. Subsequent readers of the description will think that the person who took the photograph actually wrote this, giving the text undue credibility. I am sure that if we implement that, many Commons users will come here to complain that it is the worst bug ever.

SJu-w commented 3 years ago

Hi nicolas-raoul. This time, perhaps we have understood each other on the basics. This step can be solved almost immediately so that the file description is at least as informative as the file name. So I will add just a few clarifications or improvements:

There is also the possibility that the verbal description (label + description + P131 administrative division chain) would only be displayed through a template containing the Q code, as in {{Wikidata Infobox}}. However, such an internationalization template can IMHO cause problems for full-text searching, and the text cannot simply be adapted and enhanced by the uploader. I would rather leave this solution for the further development of the structured data interface, and I would not bring it into the classic old form of description.

And what about file names? The relationship between the file name and the description presupposes a certain degree of duplication, because both have to express what is in the picture. But usually the file name is required to be as short as possible, while the description should be as accurate, unambiguous and complex as possible. It is true that automatic pre-filling cannot take advantage of this difference as sensitively as an experienced contributor. In general, we can solve this by inserting only the label into the file name by default, and the label + description into the description. But from automatic imports, we have many cases where the label is not specific enough, e.g. "wayside cross" for any specific wayside cross. This problem cannot be solved by the application; it is primarily for Wikidata maintainers to solve. But the application can alleviate the problem in some cases.

As an example, for Q80456207, a file name consisting of the bare label Pamětní kříž is obviously too unspecific. There are several options that can be considered:

  1. to use the same pattern as for the description, ie label + description (example for Q80456207: Pamětní kříž (pamětní kříž v obci Budeč, okres Jindřichův Hradec).jpg)
  2. to add a Q code to the end of the file name, e.g. label + Q (example for Q80456207: Pamětní kříž (Q80456207).jpg)
  3. to add P131 at the top of the file name, e.g. P131 + label (example for Q80456207: Budeč, Pamětní kříž.jpg)
  4. some combination of the previous forms. (1+2+3-combination example for Q80456207: Budeč, Pamětní kříž (pamětní kříž v obci Budeč, okres Jindřichův Hradec, Q80456207).jpg)

However, in a combination of text fields there is always a risk of duplication or multiplication of the same name and disproportionately long file name (1+2+3-combination example for Q72850340: Morávka, Kříž v údolí potoka Mražok v Morávce (pamětní kříž v obci Morávka, okres Frýdek-Místek, Q72850340).jpg; deterrent example for Q2242554: Mělník, Mělník (zámek v Mělníku, Q2242554).jpg).

Unfortunately, the way Wikidata items are described is very varied, so a different combination may be fitting for each item. Option 2 would probably seem least problematic to me in this situation. But the Q code is intended to be hidden - it is not intended to be visible and to be read and remembered by people. So it may be best to leave the status quo for now. Wikidata maintainers face the great task of unifying and establishing standards for the use of labels and descriptions.
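The four naming options listed above can be compared side by side with a small sketch. The function names and parameters here are hypothetical, chosen only for this illustration; the patterns themselves are taken from the examples given for Q80456207.

```python
from typing import Dict


def filename_options(label: str, description: str, place: str, qid: str,
                     ext: str = "jpg") -> Dict[int, str]:
    """Return the four candidate file names, keyed by option number."""
    return {
        1: f"{label} ({description}).{ext}",                  # label + description
        2: f"{label} ({qid}).{ext}",                          # label + Q code
        3: f"{place}, {label}.{ext}",                         # P131 + label
        4: f"{place}, {label} ({description}, {qid}).{ext}",  # combination of 1+2+3
    }


def repeats_place(label: str, place: str) -> bool:
    """Flag the Mělník-style case where the P131 prefix merely repeats the label."""
    return label.casefold() == place.casefold()
```

For Q2242554, `repeats_place("Mělník", "Mělník")` is true, which is one simple way to detect the deterrent example before building a P131-prefixed name. The sketch does not handle characters that are not allowed in Commons file names, nor the partial repetition seen in the Q72850340 example.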

misaochan commented 3 years ago

If the Wikidata item has a description in my language → Pre-fill the picture's description using the Wikidata item's label+description (example: sculpture of the Holy Deacons of George and Agapit on the way to the Church of St. Peter and Paul in Mimoň (baroque statue in Mimoň, Czech Republic))

This description makes sense to me personally. I am also OK with switching to this if the community supports it.

If the Wikidata item does not have a description in my language → Pre-fill the picture's description using the label+description in another language (example: ミモンのペターとポール教会の道の途中のジョージとアガピットの像 (チェコ・ミモンにあるバロック像)) The user most likely does not understand this other language, so they cannot judge whether it really applies, whether it is actually erroneous, or whether it should be edited. Subsequent readers of the description will think that the person who took the photograph actually wrote this, giving the text undue credibility. I am sure that if we implement that, many Commons users will come here to complain that it is the worst bug ever.

I agree with this, the description should not be pre-filled with something that the user cannot understand. If pre-filling with instanceOf is not an acceptable way of handling it, then perhaps we can just not pre-fill at all, and the user can insert their own description, same as they would with most other methods of uploading to Commons?
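The behaviour proposed here (pre-fill with label + description when available in the user's language, otherwise leave the field empty) might look something like the sketch below. The function name and the dictionary inputs are hypothetical, and requiring both a label and a description in the user's language is an assumption; the discussion only states the condition on the description.

```python
from typing import Dict, Optional


def prefill_description(labels: Dict[str, str],
                        descriptions: Dict[str, str],
                        user_lang: str) -> Optional[str]:
    """Return 'label (description)' in the user's language, or None to leave the field empty."""
    label = labels.get(user_lang)
    description = descriptions.get(user_lang)
    if not label or not description:
        # Do not pre-fill with text in a language the user may be unable to read.
        return None
    return f"{label} ({description})"
```

Returning `None` rather than text in another language leaves the uploader to write their own description, as with most other upload methods.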

SJu-w commented 3 years ago

@misaochan: The main and basic purpose of image description is to identify what is in the image. This is what some people did not understand, and that is why this issue arose at all. If there is a tool designed to upload a photo of a particular subject, then by default such a tool pre-fills a unique identification of that subject. This practice has been widely applied and has proved its worth, especially for uploading hundreds of thousands of images within the Wiki Loves Monuments project, and for other campaigns based on those technologies. It is highly desirable in all similar cases when the upload is based on a placeholder or upload link of a specific article, list item or map object of a specific subject.

However, what is still causing us problems is how to clearly and instructively indicate to uploaders that it is very desirable to expand and supplement this default automatic description as needed. So that the uploader is not afraid to edit or modify the pre-filled text. That's why it is advisable that the pre-filled text is editable, not only automatically displayed from Wikidata. Another problem is that the disorganized development of the user interface (UploadWizard, Structured Data etc.) causes the description to be scattered chaotically to several different places. Some also do not understand that the "description" field at the file page should not be just a secondary addition to the file name, but on the contrary, the file name and the structured data label should be a short summary of the most essential from the description, and the description should be as complex and exhaustive as possible. However, if I contributed to the project using an application, it would be practical for me to quickly upload the file with only a simple automatic description, and later create a more precise description comfortably from the desktop, if needed.

It should also be noted that at least bilingual descriptions are a desirable standard at Commons, although contributors who speak only one language are also welcome. Here we need to find a compromise so that the application supports the possibility of multilingual descriptions, but so that the user is not forced to edit or approve a description that he does not understand. However, we should assume that if the contributor correctly selected which object he photographed, then the labels+descriptions automatically taken from Wikidata should be correct even in languages that the uploader does not understand. We only have to reckon with the fact that in these languages he will not be able to specify or extend the description beyond this basic description. The description in the main (local) language should never be omitted altogether, even if the photo is uploaded by a foreigner who does not speak the local language. Further work on sorting the file and refinement of the description will usually be done mainly by local editors. And English is desirable as the main international language of the whole community. The uploader must provide in at least one language mainly such information that someone else would not be able to easily find out afterwards.
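As a rough illustration of the multilingual idea above, the following sketch emits one language template per requested language, in the same `{{cs|1=...}}` form the app already writes. The function name and inputs are hypothetical, and skipping languages without a label is an assumption made for the example.

```python
from typing import Dict, Iterable


def multilingual_description(labels: Dict[str, str],
                             descriptions: Dict[str, str],
                             languages: Iterable[str]) -> str:
    """Build one {{lang|1=...}} template per language that has a label."""
    parts = []
    for lang in languages:
        label = labels.get(lang)
        if not label:
            continue  # nothing to identify the subject in this language
        description = descriptions.get(lang)
        text = f"{label} ({description})" if description else label
        # Note: text containing "|" or "}}" would need escaping; not handled here.
        parts.append("{{" + lang + "|1=" + text + "}}")
    return "\n".join(parts)
```

Passing the local language first and English second would match the preference expressed above, while whether the uploader should be asked to confirm text in a language they cannot read remains an open design question.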

misaochan commented 3 years ago

As we all appear to agree on how to handle the "If the Wikidata item has a description in my language" scenario, I will create a separate issue for it that can be worked on ASAP.

Re: the "If the Wikidata item does not have a description in my language" scenario, I don't think WLM has a model that we can follow in this case. AFAIK the WLM monument lists for every country are carefully curated by national organizers from that country, so "does not have a description in my language" is very unlikely to happen in WLM or other national-based campaigns.

Multilingual descriptions are already possible in the app. However, pre-filling a description with a language that the user does not understand would feel very strange and wrong to the user IMO. Perhaps there is another way that we can approach this.

SJu-w commented 3 years ago

It's been a few months, and the app still produces those nonsensical, meaningless descriptions!