WikiEducationFoundation / WikiEduDashboard

Wiki Education Foundation's Wikipedia course dashboard system
https://dashboard.wikiedu.org
MIT License
392 stars, 631 forks

Add stats for edits in the Lexeme namespace on Wikidata #4283

Open ragesoss opened 4 years ago

ragesoss commented 4 years ago

Current Behavior:

Lexeme edits are imported, but they neither contribute to the top-level stats for Wikidata courses nor show up in the Wikidata-specific CSV stat downloads.

Desired Behavior:

  1. A summary of Lexeme contributions should be included in the Wikidata CSV data.
  2. Lexemes should be shown alongside Items as a top-level stat (including lexemes created, lexemes edited) in Wikidata courses.

cyrillefr commented 1 year ago

4999 brought me here :D

Hello @ragesoss, can I work on this issue?

Are we talking about these stats? On the course page http://localhost:3000/courses/Stanford_Law_School/Advanced_Legal_Research_Winter_2020_(Winter), click on Edits data, which generates a CSV?

How can I import or reimport new www.wikidata.org stats for a course?

Do you have courses from the dashboard that I can import that have articles in the Lexeme namespace?

cyrillefr commented 1 year ago

Hello @ragesoss, may I work on this issue?

ragesoss commented 1 year ago

@cyrillefr sure, go for it! :-)

ragesoss commented 1 year ago

The easiest way to get a useful dev environment for this is probably just to look at recent changes in the Lexeme namespace, choose a few usernames that are making edits, and add those to a course on your machine: https://www.wikidata.org/wiki/Special:RecentChanges?hidebots=1&hidecategorization=1&namespace=146&limit=50&days=7&urlversion=2
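As a rough sketch of that approach, the same recent-changes listing can be requested from the MediaWiki Action API and reduced to a list of usernames. The method names here (lexeme_recent_changes_url, usernames_from) are hypothetical and are not part of the dashboard codebase; only the API parameters (list=recentchanges, rcnamespace, rcshow, rcprop, rclimit) are MediaWiki's own.

```ruby
require 'json'
require 'uri'

# Build a recent-changes query for the Lexeme namespace (146) on Wikidata.
# Hypothetical helper, for illustration only.
def lexeme_recent_changes_url(limit: 50, namespace: 146)
  params = {
    action: 'query',
    list: 'recentchanges',
    rcnamespace: namespace,
    rcshow: '!bot',        # skip bot edits, like hidebots=1 in the web UI
    rcprop: 'user|title',
    rclimit: limit,
    format: 'json'
  }
  "https://www.wikidata.org/w/api.php?#{URI.encode_www_form(params)}"
end

# Pull the distinct usernames out of an already-parsed API response.
# Entries without a 'user' key are ignored.
def usernames_from(response)
  changes = response.dig('query', 'recentchanges') || []
  changes.map { |change| change['user'] }.compact.uniq
end
```

The response body can be fetched with any HTTP client and passed through JSON.parse before calling usernames_from; the resulting names can then be added to a local course by hand.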

cyrillefr commented 1 year ago

Hello @ragesoss,

Thanks for the tip. I now have some data from the Lexeme namespace.

But I need clarification about where exactly the code changes should go.

I have spotted 3 places related to Wikidata.

So, where should points 1 and 2 of the desired behavior be implemented?

ragesoss commented 1 year ago

The place to begin, I think, is UpdateWikidataStats. That's what generates the stats that are stored in a CourseStat record for each course that has Wikidata stats (and in turn, what goes into the 3 places you found, if I remember correctly). Currently, all the stats come from WikidataSummaryParser and are solely based on information extracted from the edit summaries, but that will need to change.

cyrillefr commented 1 year ago

Hello @ragesoss, thanks for the clarification.

I have implemented the "create" side for Lexemes. The relevant string in the revision summary is easy to identify (wbeditentity-create-lexeme, handled in wikidata_summary_parser.rb).

But there is a problem with the "update" side. By looking at the revisions of a page in the Lexeme namespace, I have identified what a change/update to a Lexeme looks like (e.g. https://www.wikidata.org/w/index.php?title=Lexeme:L1092494&action=history).

I then looked at the summary string stored in the DB for the corresponding revision after import: wbeditentity-update. That string is already present in wikidata_summary_parser.rb, where it is handled as an unknown update. It appears in the CSV under "other updates", as well as in the summary at the top of a course. (Maybe if we could retrieve the italic string "Changed a Lexeme" that we see in the history ...)

So my plan is to add stats only for Lexemes created. Is that OK with you?
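
To make the distinction concrete, here is a minimal Ruby sketch of the tallying described above. It is not the actual WikidataSummaryParser code: the method name tally_summaries, the bucket labels, and the sample summary strings are assumptions for illustration. Only the two action strings, wbeditentity-create-lexeme and wbeditentity-update, come from the revisions discussed here.

```ruby
# Count edit summaries by the Wikibase action string they contain.
# Hypothetical helper; the real parser in the dashboard may differ.
def tally_summaries(summaries)
  # Action string => stat bucket. A Lexeme update only carries the
  # generic 'wbeditentity-update' string, so it cannot be told apart
  # from other entity updates and lands in 'other updates'.
  buckets = {
    'wbeditentity-create-lexeme' => 'lexemes created',
    'wbeditentity-update' => 'other updates'
  }
  counts = Hash.new(0)
  summaries.each do |summary|
    action, bucket = buckets.find { |key, _label| summary.include?(key) }
    counts[bucket] += 1 if action
  end
  counts
end

# Illustrative summaries only; real stored summaries may be formatted differently.
sample = [
  '/* wbeditentity-create-lexeme:0| */',
  '/* wbeditentity-update:0| */',
  '/* wbeditentity-update:0| */'
]
puts tally_summaries(sample).inspect
```

Because creation has its own action string, "lexemes created" can be counted reliably from summaries alone, while "lexemes edited" would need information beyond the summary.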

ragesoss commented 1 year ago

Yes, that would be a fine incremental improvement. (I'm hoping to work with an intern over the summer to develop a library for analyzing and tabulating stats on wikidata revisions in a much more robust way, because the edit summary approach has major limitations as you've found.)