WikiEducationFoundation / WikiEduDashboard

Wiki Education Foundation's Wikipedia course dashboard system
https://dashboard.wikiedu.org
MIT License

Update for a Wikidata course errors when WikidataDiffAnalyzer is called #6032

Open ragesoss opened 2 weeks ago

ragesoss commented 2 weeks ago

This course has been erroring during every data update: https://outreachdashboard.wmflabs.org/courses/Wikimedia_Indonesia/Datathon_12TahunWikidata_(28_-_30_Oktober_2024)

Here's the relevant portion of the stack trace:

NoMethodError: undefined method `each' for nil:NilClass

            revisions.each do |revision|
                     ^^^^^
  from wikidata-diff-analyzer (2.0.2) lib/wikidata/diff/api.rb:37:in `block in get_revision_contents'
  from wikidata-diff-analyzer (2.0.2) lib/wikidata/diff/api.rb:33:in `each'
  from wikidata-diff-analyzer (2.0.2) lib/wikidata/diff/api.rb:33:in `get_revision_contents'
  from wikidata-diff-analyzer (2.0.2) lib/wikidata/diff/large_batches_analyzer.rb:12:in `block in handle_large_batches'
  from wikidata-diff-analyzer (2.0.2) lib/wikidata/diff/large_batches_analyzer.rb:10:in `each'
  from wikidata-diff-analyzer (2.0.2) lib/wikidata/diff/large_batches_analyzer.rb:10:in `each_slice'
  from wikidata-diff-analyzer (2.0.2) lib/wikidata/diff/large_batches_analyzer.rb:10:in `handle_large_batches'
  from wikidata-diff-analyzer (2.0.2) lib/wikidata-diff-analyzer.rb:75:in `analyze'
  from app/services/update_wikidata_stats.rb:91:in `update_summary_with_stats'
  from app/services/update_wikidata_stats.rb:74:in `initialize'
  from app/workers/update_wikidata_stats_worker.rb:11:in `new'
  from app/workers/update_wikidata_stats_worker.rb:11:in `perform'

This error happens in the WikidataDiffAnalyzer gem: https://github.com/WikiEducationFoundation/wikidata-diff-analyzer/blob/main/lib/wikidata/diff/api.rb#L42

I suspect it is related to the shape of the API response, which may include a 'page' entry without a 'revisions' property.

Working around it by adding a guard statement in that block (e.g., next unless revisions) might fix the issue, but I'd like to understand exactly why it's happening and what the API response looks like first. Replicate the bug, identify the API query that triggers it, and then we can decide on the best way to fix it.
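
For illustration, here is a minimal sketch of what such a guard could look like. It is not the gem's actual code: `collect_revision_contents` is a hypothetical stand-in for the loop in `get_revision_contents`, and the page IDs, revision IDs, and titles below are made-up sample data.

```ruby
# Hypothetical stand-in for the loop in get_revision_contents; the real gem
# code may be structured differently. `pages` mimics the per-page hash of a
# parsed API response, keyed by page ID.
def collect_revision_contents(pages)
  contents = {}
  pages.each_value do |page|
    revisions = page['revisions']
    # Guard: skip page entries that arrive without a 'revisions' key,
    # instead of calling `each` on nil.
    next unless revisions

    revisions.each do |revision|
      contents[revision['revid']] = {
        content: revision.dig('slots', 'main', '*'),
        comment: revision['comment'],
        parentid: revision['parentid']
      }
    end
  end
  contents
end

# Made-up sample data: one page with revisions, one without.
pages = {
  '1' => {
    'pageid' => 1, 'ns' => 0, 'title' => 'Q1',
    'revisions' => [
      { 'revid' => 101, 'parentid' => 100,
        'slots' => { 'main' => { '*' => '{}' } }, 'comment' => 'sample edit' }
    ]
  },
  '2' => { 'pageid' => 2, 'ns' => 0, 'title' => 'Q2' }
}

result = collect_revision_contents(pages)
puts result.keys.inspect
```

Without the `next unless revisions` line, the second entry would raise the same NoMethodError as in the stack trace above. With it, that entry is silently skipped, which is why it is worth understanding first whether skipped revisions matter for the stats.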

ragesoss commented 2 weeks ago

@saha23s it took a while, but it looks like we finally uncovered a bug in your gem.

empty-codes commented 2 weeks ago

I would like to work on this! @ragesoss

empty-codes commented 2 weeks ago

@ragesoss I've completed the task.

Steps to Replicate the Bug

  1. Create a course object for course_id: 10023 and pass it to the update_wikidata_stats_worker.rb.
  2. Call the worker from the Rails console.
  3. Run bundle exec sidekiq and check for the worker.
  4. The worker fails as expected, with the same stack trace as the one above:

    WARN: NoMethodError: undefined method `each' for nil:NilClass
    
    revisions.each do |revision|
                ^^^^^
    2024-11-07T11:45:03.953Z pid=10928 tid=1m3g WARN: /usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/api.rb:37:in `block in get_revision_contents'
    /usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/api.rb:33:in `each'
    /usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/api.rb:33:in `get_revision_contents'
    /usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/large_batches_analyzer.rb:12:in `block in handle_large_batches'
    /usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/large_batches_analyzer.rb:10:in `each'
    /usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/large_batches_analyzer.rb:10:in `each_slice'
    /usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff/large_batches_analyzer.rb:10:in `handle_large_batches'
    /usr/share/rvm/gems/ruby-3.1.2/gems/wikidata-diff-analyzer-2.0.2/lib/wikidata/diff-analyzer.rb:75:in `analyze'
    /home/emptycodes/WikiEduDashboard/app/services/update_wikidata_stats.rb:103:in `update_summary_with_stats'
    /home/emptycodes/WikiEduDashboard/app/services/update_wikidata_stats.rb:75:in `initialize'
    /home/emptycodes/WikiEduDashboard/app/workers/update_wikidata_stats_worker.rb:12:in `new'
    /home/emptycodes/WikiEduDashboard/app/workers/update_wikidata_stats_worker.rb:12:in `perform'

To Identify the API Query and Response that Triggers the Bug

  1. I retrieved the revision_ids for the course from the update_summary_with_stats method.
  2. I then cloned the WikidataDiffAnalyzer gem codebase and added logging statements to get the relevant API queries and responses.
  3. Then I called the Api.get_revision_contents(revision_ids) method directly.
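
The logging statements themselves are not shown here. As a rough sketch, the query can also be rebuilt outside the gem from the parameters reported under Findings. `revision_query_uri` and `WIKIDATA_API` are hypothetical names used only for this example, and placing `action` alongside the other parameters in the query string is an assumption about how the request is sent.

```ruby
require 'uri'

# Endpoint as it appears in the logged queries.
WIKIDATA_API = 'https://www.wikidata.org/w/api.php'

# Hypothetical helper: builds the revisions query for a batch of revision IDs,
# using the same parameters as the logged queries.
def revision_query_uri(revision_ids)
  params = {
    action: 'query',
    prop: 'revisions',
    revids: revision_ids.join('|'), # the API takes pipe-separated IDs
    rvslots: 'main',
    rvprop: 'content|ids|comment',
    format: 'json'
  }
  uri = URI(WIKIDATA_API)
  uri.query = URI.encode_www_form(params)
  uri
end

uri = revision_query_uri([2266056992, 2266058518, 2266056998, 2266058525])
puts uri
```

The resulting URI can be passed to `Net::HTTP.get` and the body parsed with `JSON.parse` to inspect the per-page entries by hand.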

Findings

There are two pages that do not have revisions:

Page ID: 188280

API Query:

{
  "url": "https://www.wikidata.org/w/api.php",
  "action": "query",
  "params": {
    "prop": "revisions",
    "revids": "2266122608|2266122618|2266122626|2266122646|2266122666|2266122683|2266122709|2266122730|2266122739|2266122747|2266122763|2266122777|2266122783|2266122790|2266122808|2266122817|2266122829|2266122850|2266122880|2266122931|2266122949|2266122973|2266122994|2266123011|2266123017|2266123021|2266341034|2266123060|2266123123|2266123148|2266123175|2266123210|2266123270|2266123325|2266123373|2266123418|2266341148|2266123442|2266123459|2266123479|2266123502|2266123529|2266123536|2266123548|2266123562|2266123568|2266341782|2266123581|2266123596|2266123602",
    "rvslots": "main",
    "rvprop": "content|ids|comment",
    "format": "json"
  }
}

Response:

{
  "pageid": 188280,
  "ns": 0,
  "title": "Q189784"
}

Page ID: 265881

API Query:

{
  "url": "https://www.wikidata.org/w/api.php",
  "action": "query",
  "params": {
    "prop": "revisions",
    "revids": "2266122608|2266122618|2266122626|2266122646|2266122666|2266122683|2266122709|2266122730|2266122739|2266122747|2266122763|2266122777|2266122783|2266122790|2266122808|2266122817|2266122829|2266122850|2266122880|2266122931|2266122949|2266122973|2266122994|2266123011|2266123017|2266123021|2266341034|2266123060|2266123123|2266123148|2266123175|2266123210|2266123270|2266123325|2266123373|2266123418|2266341148|2266123442|2266123459|2266123479|2266123502|2266123529|2266123536|2266123548|2266123562|2266123568|2266341782|2266123581|2266123596|2266123602",
    "rvslots": "main",
    "rvprop": "content|ids|comment",
    "format": "json"
  }
}

Response:

{
  "pageid": 265881,
  "ns": 0,
  "title": "Q274897"
}
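
Entries like the two above can be picked out of a parsed response with a small filter. This is only a sketch: `pages_without_revisions` is a hypothetical helper, the input is assumed to be the per-page hash of the parsed response (keyed by page ID), and the third entry is made-up sample data added for contrast.

```ruby
# Hypothetical helper: returns the titles of page entries that have no
# 'revisions' key at all.
def pages_without_revisions(pages)
  pages.values
       .reject { |page| page.key?('revisions') }
       .map { |page| page['title'] }
end

pages = {
  # The two entries reported above.
  '188280' => { 'pageid' => 188280, 'ns' => 0, 'title' => 'Q189784' },
  '265881' => { 'pageid' => 265881, 'ns' => 0, 'title' => 'Q274897' },
  # Made-up entry with a 'revisions' array, for contrast.
  '1' => { 'pageid' => 1, 'ns' => 0, 'title' => 'Q1',
           'revisions' => [{ 'revid' => 101, 'parentid' => 100 }] }
}

puts pages_without_revisions(pages).inspect
```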

As suspected, the bug is caused by page entries that have no revisions property. Other page entries do have it. For example:

Page ID: 54252

API Query:

{
  "url": "https://www.wikidata.org/w/api.php",
  "action": "query",
  "params": {
    "prop": "revisions",
    "revids": "2266056992|2266058518|2266056998|2266058525",
    "rvslots": "main",
    "rvprop": "content|ids|comment",
    "format": "json"
  }
}

Response:

{
  "pageid": 54252,
  "ns": 0,
  "title": "Q52053",
  "revisions": [
    {
      "revid": 2266056998,
      "parentid": 2156694334,
      "slots": {
        "main": {
          "contentmodel": "wikibase-item",
          "contentformat": "application/json",
          "*": "// removed cause too long"
        }
      },
      "comment": "/* wbcreateclaim-create:1| */ [[Property:P1933]]: star-heritage-1-the-black-cobra, #quickstatements; #temporary_batch_1730058381115"
    },
    {
      "revid": 2266058525,
      "parentid": 2266056998,
      "slots": {
        "main": {
          "contentmodel": "wikibase-item",
          "contentformat": "application/json",
          "*": "// removed cause too long"
        }
      },
      "comment": "/* wbcreateclaim-create:1| */ [[Property:P11690]]: 162541, #quickstatements; #temporary_batch_1730070687913"
    }
  ]
}

ragesoss commented 2 weeks ago

Great! So I think just adding a guard statement is an appropriate fix. @empty-codes would you like to implement the fix in the gem codebase?

empty-codes commented 2 weeks ago

@ragesoss I can do that! I'll create an issue for it now and start working on it.

empty-codes commented 2 weeks ago

@ragesoss Done! Here's the PR link: https://github.com/WikiEducationFoundation/wikidata-diff-analyzer/pull/32