Open ragesoss opened 4 years ago
Hello @ragesoss, can I work on this issue?
Are we talking about these stats? http://localhost:3000/courses/Stanford_Law_School/Advanced_Legal_Research_Winter_2020_(Winter) (the course page), then clicking on Edits data, which generates a CSV?
How can I import/reimport new www.wikidata.org stats for a course?
Do you have courses from the dashboard that I can import that have articles within the Lexeme namespace?
Hello @ragesoss, may I work on this issue?
@cyrillefr sure, go for it! :-)
The easiest way to get a useful dev environment for this is probably just to look at recent changes in the Lexeme namespace, choose a few usernames that are making edits, and add those to a course on your machine: https://www.wikidata.org/wiki/Special:RecentChanges?hidebots=1&hidecategorization=1&namespace=146&limit=50&days=7&urlversion=2
Hello @ragesoss,
Thanks for the tip. I now have some data from the Lexeme namespace.
But I need clarification about exactly where to make the changes.
I have spotted 3 places about Wikidata
So, where are the 1. and the 2. ?
The place to begin, I think, is UpdateWikidataStats. That's what generates the stats that are stored in a CourseStat record for each course that has Wikidata stats (and in turn, what goes into the 3 places you found, if I remember correctly). Currently, all the stats come from WikidataSummaryParser and are solely based on information extracted from the edit summaries, but that will need to change.
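To illustrate what "information extracted from the edit summaries" can look like, here is a minimal sketch. It is not the dashboard's actual WikidataSummaryParser code: the action_key helper, the ACTION_KEY regex, and the sample summaries are hypothetical, and it assumes Wikidata summaries begin with a comment such as "/* action-key:0| */".

```ruby
# Hypothetical helper, not taken from wikidata_summary_parser.rb.
# Assumption: an automatic Wikidata summary starts with
# "/* <action-key>:<count>|<lang> */", optionally followed by free text.
ACTION_KEY = %r{\A/\*\s*([a-z0-9-]+)}

# Returns the action key (e.g. "wbeditentity-update"),
# or nil when the summary does not start with one.
def action_key(summary)
  match = ACTION_KEY.match(summary.to_s)
  match && match[1]
end

# Illustrative summaries only; real revisions may differ.
summaries = [
  '/* wbeditentity-create-lexeme:0| */',
  '/* wbeditentity-update:0| */',
  'plain text summary with no action key'
]

keys = summaries.map { |summary| action_key(summary) }
```

Anything that cannot be expressed in the action key is invisible to this kind of parsing, which is the limitation that comes up later in the thread.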
Hello @ragesoss, Thanks for the clarification.
I have implemented the "create" side of Lexeme. The string in the revision summary is easy to identify (wbeditentity-create-lexeme in wikidata_summary_parser.rb).
But there is a problem with the "update" side. By looking at the revisions of an article in the Lexeme namespace, I have identified what counts as a change/update for a Lexeme (e.g. https://www.wikidata.org/w/index.php?title=Lexeme:L1092494&action=history).
I then looked at the summary string stored in the DB for the corresponding revision after import: wbeditentity-update.
That string is already present in wikidata_summary_parser.rb, where it is handled as an unknown update.
It appears in the CSV under "other updates", as well as in the summary at the top of a course.
(Maybe if we could retrieve the italic string "Changed a Lexeme" that we see in the history ...).
So, my thinking is to only add stats for Lexemes created. Is that OK with you?
Yes, that would be a fine incremental improvement. (I'm hoping to work with an intern over the summer to develop a library for analyzing and tabulating stats on wikidata revisions in a much more robust way, because the edit summary approach has major limitations as you've found.)
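The incremental change agreed on here can be sketched roughly as follows. This is an assumption-laden illustration, not the dashboard's implementation: tally_lexeme_stats and the 'lexemes created' label are hypothetical, and the sample summaries are made up. Only the two summary strings and the "other updates" bucket come from the discussion above.

```ruby
# Hypothetical tally, not the real WikidataSummaryParser.
# The lexeme check comes first so that any branch later added for the
# shorter 'wbeditentity-create' key, which is a prefix of
# 'wbeditentity-create-lexeme', cannot swallow lexeme creations.
def tally_lexeme_stats(summaries)
  stats = Hash.new(0)
  summaries.each do |summary|
    if summary.include?('wbeditentity-create-lexeme')
      stats['lexemes created'] += 1
    elsif summary.include?('wbeditentity-update')
      # The summary alone does not say whether a Lexeme or some other
      # entity was updated, so this stays in the generic bucket.
      stats['other updates'] += 1
    end
  end
  stats
end

# Illustrative input only.
sample = [
  '/* wbeditentity-create-lexeme:0| */',
  '/* wbeditentity-create-lexeme:0| */',
  '/* wbeditentity-update:0| */'
]

result = tally_lexeme_stats(sample)
```

Lexeme updates still land in "other updates" under this approach, which matches the limitation of summary-based stats described above.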
Current Behavior:
Lexeme edits are imported, but they neither contribute to the top-level stats for Wikidata courses nor show up in the CSV stat downloads specific to Wikidata.
Desired Behavior: