WikiEducationFoundation / WikiEduDashboard

Wiki Education Foundation's Wikipedia course dashboard system
https://dashboard.wikiedu.org
MIT License

Research and improve `spec/lib/importers/revision_score_importer_spec.rb` #5854

Open gabina opened 1 week ago

gabina commented 1 week ago

What is happening?

Specs in spec/lib/importers/revision_score_importer_spec.rb are not working as expected. While they pass on master, the generated fixtures look suspicious. The problematic spec appears to be "marks RevisionNotFound revisions as deleted", whose behavior is not consistent. We should research this further to understand what's going on.

To Reproduce

Steps to reproduce the behavior (locally):

  1. Delete the existing fixture files
  2. Run rspec ./spec/lib/importers/revision_score_importer_spec.rb
  3. Check the LiftWing API request/response for the revision 753277075 in fixtures/vcr_cassettes/revision_scores/deleted_revision.yml. Does it exist? I couldn't find it. It looks like there is no LiftWing API request for that revision id, which doesn't make sense. However, the spec is passing.
  4. Delete the existing fixture files again
  5. Comment out the "marks TextDeleted revisions as deleted" spec.
  6. Run rspec ./spec/lib/importers/revision_score_importer_spec.rb again. Now the "RevisionScoreImporter marks RevisionNotFound revisions as deleted" spec doesn't pass, due to deleted being false.
  7. Check the LiftWing API request/response for the revision 753277075 in fixtures/vcr_cassettes/revision_scores/deleted_revision.yml.
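Checking the cassette by hand in steps 3 and 7 is error-prone, so a small script can list which revision ids actually have a recorded LiftWing request. The following is only a sketch: `liftwing_rev_ids` is a hypothetical helper, not part of the dashboard codebase, and it assumes the standard VCR cassette layout (a top-level `http_interactions` list whose entries have `request.uri` and `request.body.string`) and that LiftWing URIs contain `/service/lw/inference/`, as in the request recorded in this issue.

```ruby
require 'yaml'
require 'json'

# Hypothetical helper (not part of the dashboard codebase).
# Given the YAML text of a VCR cassette, returns the rev_id of every
# recorded request whose URI looks like a LiftWing inference call.
def liftwing_rev_ids(cassette_yaml)
  cassette = YAML.safe_load(cassette_yaml) || {}
  interactions = cassette.fetch('http_interactions', [])

  interactions.filter_map do |interaction|
    request = interaction['request'] || {}
    # Assumption: LiftWing requests are identified by this path fragment.
    next unless request['uri'].to_s.include?('/service/lw/inference/')

    body = request.dig('body', 'string').to_s
    next if body.empty?

    JSON.parse(body)['rev_id']
  end
end

# Example usage against the fixture named in this issue:
#   yaml = File.read('fixtures/vcr_cassettes/revision_scores/deleted_revision.yml')
#   puts liftwing_rev_ids(yaml).inspect
```

If 753277075 or 708326238 is missing from the printed list, the cassette contains no LiftWing request for that revision, which is the situation described in steps 3 and 7.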

After step 7, I found the following request in my fixture file for revision 753277075. It seems that the error message has changed and no longer contains the string "RevisionNotFound". That would explain why the spec fails and the revision is not marked as deleted.

- request:
    method: post
    uri: https://api.wikimedia.org/service/lw/inference/v1/models/enwiki-articlequality:predict
    body:
      encoding: UTF-8
      string: '{"rev_id":753277075,"extended_output":true}'
    headers:
      Content-Type:
      - application/json
      User-Agent:
      - Faraday v1.10.2
      Accept-Encoding:
      - gzip;q=1.0,deflate;q=0.6,identity;q=0.3
      Accept:
      - "*/*"
  response:
    status:
      code: 400
      message: Bad Request
    headers:
      Content-Length:
      - '282'
      Content-Type:
      - application/json
      Date:
      - Mon, 24 Jun 2024 19:38:43 GMT
      Server:
      - envoy
      Cache-Control:
      - no-cache
      X-Ratelimit-Limit:
      - 50000, 50000;w=3600
      X-Ratelimit-Remaining:
      - '49918'
      X-Ratelimit-Reset:
      - '1277'
      Age:
      - '2'
      X-Cache:
      - cp1106 miss, cp1106 pass
      X-Cache-Status:
      - pass
      Server-Timing:
      - cache;desc="pass", host;desc="cp1106"
      Strict-Transport-Security:
      - max-age=106384710; includeSubDomains; preload
      Report-To:
      - '{ "group": "wm_nel", "max_age": 604800, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0"
        }] }'
      Nel:
      - '{ "report_to": "wm_nel", "max_age": 604800, "failure_fraction": 0.05, "success_fraction":
        0.0}'
      X-Client-Ip:
      - 2800:40:35:5812:e471:82bd:9a71:2557
    body:
      encoding: UTF-8
      string: '{"error":"The MW API does not have any info related to the rev-id provided
        as input (753277075), therefore it is not possible to extract features properly.
        One possible cause is the deletion of the page related to the revision id.
        Please contact the ML-Team if you need more info."}'
  recorded_at: Mon, 24 Jun 2024 19:38:44 GMT

  8. Uncomment the "marks TextDeleted revisions as deleted" spec.
  9. Run rspec ./spec/lib/importers/revision_score_importer_spec.rb again. "marks TextDeleted revisions as deleted" spec is not passing.
    1. Check the LiftWing API request/response for the revision 708326238 in fixtures/vcr_cassettes/revision_scores/deleted_revision.yml. I didn't find any LiftWing API request.

It looks like for some reason the LiftWing API requests for revisions 708326238 and 753277075 are mixed up.
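To illustrate why the new error body recorded for revision 753277075 would leave `deleted` as false, here is a minimal sketch. It assumes deletion is detected by searching the error text for the markers named in the specs ("RevisionNotFound" and "TextDeleted"). That matching rule, the `deleted_error?` helper, and the constant names are assumptions for illustration, not the dashboard's actual implementation.

```ruby
require 'json'

# Assumed markers, taken from the spec names in this issue.
DELETED_MARKERS = %w[RevisionNotFound TextDeleted].freeze

# Error body as recorded in the cassette for revision 753277075.
RECORDED_ERROR_BODY = {
  error: 'The MW API does not have any info related to the rev-id provided ' \
         'as input (753277075), therefore it is not possible to extract features properly. ' \
         'One possible cause is the deletion of the page related to the revision id. ' \
         'Please contact the ML-Team if you need more info.'
}.to_json

# Hypothetical helper: true when the error text contains one of the assumed markers.
def deleted_error?(response_body)
  error = JSON.parse(response_body)['error'].to_s
  DELETED_MARKERS.any? { |marker| error.include?(marker) }
end

puts deleted_error?(RECORDED_ERROR_BODY) # prints "false"
```

Under that assumption, the recorded message never matches, so the revision would not be marked as deleted. Any fix would need to recognize the new wording or rely on something more stable than the message text.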

Expected behavior