Open ragesoss opened 2 weeks ago
@saha23s it took a while, but it looks like we finally uncovered a bug in your gem.
I would like to work on this! @ragesoss
@ragesoss I've completed the task.
course_id: 10023
and pass it to the update_wikidata_stats_worker.rb.
bundle exec sidekiq
and check for the worker. The worker gives an error as expected, with the stack trace (same as the one above):
WARN: NoMethodError: undefined method `each' for nil:NilClass
revisions.each do |revision|
^^^^^
2024-11-07T11:45:03.953Z pid=10928 tid=1m3g WARN: /usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/api.rb:37:in `block in get_revision_contents'
/usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/api.rb:33:in `each'
/usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/api.rb:33:in `get_revision_contents'
/usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/large_batches_analyzer.rb:12:in `block in handle_large_batches'
/usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/large_batches_analyzer.rb:10:in `each'
/usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/large_batches_analyzer.rb:10:in `each_slice'
/usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/large_batches_analyzer.rb:10:in `handle_large_batches'
/usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff-analyzer.rb:75:in `analyze'
/home/emptycodes/WikiEduDashboard/app/services/update_wikidata_stats.rb:103:in `update_summary_with_stats'
/home/emptycodes/WikiEduDashboard/app/services/update_wikidata_stats.rb:75:in `initialize'
/home/emptycodes/WikiEduDashboard/app/workers/update_wikidata_stats_worker.rb:12:in `new'
/home/emptycodes/WikiEduDashboard/app/workers/update_wikidata_stats_worker.rb:12:in `perform'
revision_ids for the course from the update_summary_with_stats method.
WikidataDiffAnalyzer gem codebase and added logging statements to get the relevant API queries and responses.
Api.get_revision_contents(revision_ids) method directly.
There are two pages that do not have revisions:
{
"url": "https://www.wikidata.org/w/api.php",
"action": "query",
"params": {
"prop": "revisions",
"revids": "2266122608|2266122618|2266122626|2266122646|2266122666|2266122683|2266122709|2266122730|2266122739|2266122747|2266122763|2266122777|2266122783|2266122790|2266122808|2266122817|2266122829|2266122850|2266122880|2266122931|2266122949|2266122973|2266122994|2266123011|2266123017|2266123021|2266341034|2266123060|2266123123|2266123148|2266123175|2266123210|2266123270|2266123325|2266123373|2266123418|2266341148|2266123442|2266123459|2266123479|2266123502|2266123529|2266123536|2266123548|2266123562|2266123568|2266341782|2266123581|2266123596|2266123602",
"rvslots": "main",
"rvprop": "content|ids|comment",
"format": "json"
}
}
{
"pageid": 188280,
"ns": 0,
"title": "Q189784"
}
{
"url": "https://www.wikidata.org/w/api.php",
"action": "query",
"params": {
"prop": "revisions",
"revids": "2266122608|2266122618|2266122626|2266122646|2266122666|2266122683|2266122709|2266122730|2266122739|2266122747|2266122763|2266122777|2266122783|2266122790|2266122808|2266122817|2266122829|2266122850|2266122880|2266122931|2266122949|2266122973|2266122994|2266123011|2266123017|2266123021|2266341034|2266123060|2266123123|2266123148|2266123175|2266123210|2266123270|2266123325|2266123373|2266123418|2266341148|2266123442|2266123459|2266123479|2266123502|2266123529|2266123536|2266123548|2266123562|2266123568|2266341782|2266123581|2266123596|2266123602",
"rvslots": "main",
"rvprop": "content|ids|comment",
"format": "json"
}
}
{
"pageid": 265881,
"ns": 0,
"title": "Q274897"
}
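To replay one of the logged queries outside the gem, the request URL can be rebuilt from the parameters shown above. This is a minimal sketch, not code from the gem: the helper name revision_query_url is hypothetical, and it only builds the URL (no request is sent).

```ruby
require 'uri'

# Hypothetical helper (not part of wikidata-diff-analyzer): rebuilds the
# request URL from the parameter values shown in the logged query.
def revision_query_url(revids)
  params = {
    action: 'query',
    prop: 'revisions',
    revids: revids.join('|'),      # the API takes pipe-separated revision IDs
    rvslots: 'main',
    rvprop: 'content|ids|comment',
    format: 'json'
  }
  uri = URI('https://www.wikidata.org/w/api.php')
  uri.query = URI.encode_www_form(params) # percent-encodes the pipes
  uri
end

# First two revision IDs from the logged batch, for illustration.
puts revision_query_url([2266122608, 2266122618])
```

Opening the printed URL in a browser or with curl should return the raw JSON for those revisions, which makes it easier to check which page entries come back without a revisions property.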
As suspected, the bug is caused by pages in the response that do not have a revisions
property. Other pages do have it. For example:
{
"url": "https://www.wikidata.org/w/api.php",
"action": "query",
"params": {
"prop": "revisions",
"revids": "2266056992|2266058518|2266056998|2266058525",
"rvslots": "main",
"rvprop": "content|ids|comment",
"format": "json"
}
}
{
"pageid": 54252,
"ns": 0,
"title": "Q52053",
"revisions": [
{
"revid": 2266056998,
"parentid": 2156694334,
"slots": {
"main": {
"contentmodel": "wikibase-item",
"contentformat": "application/json",
"*": "// removed cause too long"
}
},
"comment": "/* wbcreateclaim-create:1| */ [[Property:P1933]]: star-heritage-1-the-black-cobra, #quickstatements; #temporary_batch_1730058381115"
},
{
"revid": 2266058525,
"parentid": 2266056998,
"slots": {
"main": {
"contentmodel": "wikibase-item",
"contentformat": "application/json",
"*": "// removed cause too long"
}
},
"comment": "/* wbcreateclaim-create:1| */ [[Property:P11690]]: 162541, #quickstatements; #temporary_batch_1730070687913"
}
]
}
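The difference between the two response shapes can be reproduced without calling the API. The sketch below is an assumption-laden stand-in, not the gem's code: it assumes the pages are held in a hash keyed by page ID, trims the entries to the fields that matter, and uses hypothetical helper names.

```ruby
# Two page entries shaped like the logged responses: one with a
# 'revisions' array and one without (contents trimmed for brevity).
# Assumption: pages are stored in a hash keyed by page ID.
pages = {
  '54252' => {
    'pageid' => 54252, 'title' => 'Q52053',
    'revisions' => [{ 'revid' => 2266056998 }, { 'revid' => 2266058525 }]
  },
  '188280' => { 'pageid' => 188280, 'title' => 'Q189784' }
}

# Hypothetical helper: titles of page entries that lack 'revisions'.
def pages_without_revisions(pages)
  pages.values.reject { |page| page.key?('revisions') }
       .map { |page| page['title'] }
end

# Unguarded iteration, mirroring the failing pattern in the stack trace:
# page['revisions'] is nil for the second entry, so .each raises.
def unguarded_revids(pages)
  ids = []
  pages.each_value do |page|
    page['revisions'].each { |revision| ids << revision['revid'] }
  end
  ids
end

puts pages_without_revisions(pages).inspect

begin
  unguarded_revids(pages)
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```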
Great! So I think just adding a guard statement is an appropriate fix. @empty-codes would you like to implement the fix in the gem codebase?
@ragesoss I can do that! I'll create an issue for it now and start working on it.
@ragesoss Done! Here's the PR Link: https://github.com/WikiEducationFoundation/wikidata-diff-analyzer/pull/32
This course has been erroring during every data update: https://outreachdashboard.wmflabs.org/courses/Wikimedia_Indonesia/Datathon_12TahunWikidata_(28_-_30_Oktober_2024)
Here's the relevant portion of the stack trace:
This error happens in the WikidataDiffAnalyzer gem: https://github.com/WikiEducationFoundation/wikidata-diff-analyzer/blob/main/lib/wikidata/diff/api.rb#L42
I guess it is related to the shape of the API response, which may include a 'page' entry without a 'revisions' property.
Working around it by adding a guard statement in that block (e.g., next unless revisions) might fix the issue, but I'd like to understand exactly why it's happening and what the API response looks like first. Replicate the bug, identify the API query that triggers it, and then we can decide on the best way to fix it.
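For reference, here is roughly what such a guard could look like. This is only a sketch under assumptions: collect_revisions and the sample data are hypothetical stand-ins, not the actual loop in the gem's api.rb, and the pages are assumed to be a hash keyed by page ID.

```ruby
# Hypothetical stand-in for the loop in the gem's API code; the method
# name and the returned structure are illustrative only.
def collect_revisions(pages)
  collected = []
  pages.each_value do |page|
    revisions = page['revisions']
    next unless revisions # skip page entries with no 'revisions' property

    revisions.each do |revision|
      collected << { title: page['title'], revid: revision['revid'] }
    end
  end
  collected
end

# Sample input: one entry with revisions, one without.
sample_pages = {
  '54252' => { 'title' => 'Q52053', 'revisions' => [{ 'revid' => 2266056998 }] },
  '188280' => { 'title' => 'Q189784' }
}

puts collect_revisions(sample_pages).inspect
```

The guard silently skips entries without revisions, so whether those skipped revisions should also be logged or counted is a separate decision.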