dennistobar / serobot

A bot that reverts vandalism edits on Spanish Wikipedia, using ORES
MIT License
5 stars 5 forks

Help migrating away from ORES #5

Open elukey opened 1 year ago

elukey commented 1 year ago

Hi! I am part of the Wikimedia ML team. We are starting to migrate ORES clients to another infrastructure, since we are planning to deprecate ORES. More info at https://wikitech.wikimedia.org/wiki/ORES

TL;DR:

  • The ORES infrastructure is going to be replaced by Lift Wing, a more modern, Kubernetes-based service.

  • All the ORES models (damaging, goodfaith, etc.) are running on Lift Wing; more on how to use them at https://wikitech.wikimedia.org/wiki/Machine_Learning/LiftWing/Usage

  • We have new models called Revert Risk that can replace goodfaith and damaging, for example. They are available on Lift Wing, and we'd like to offer them as a valid, more precise and more performant alternative to the ORES models.

If you'd like to try them, we'd be glad to help with the migration process! Thanks in advance,

ML team

dennistobar commented 1 year ago

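@elukey, to understand:

  • ORES will be deprecated

  • I must use authenticated requests to run the bot

  • Instead of 1 API call, I need to make 2 API calls, affected by a rate limit

Is that the correct interpretation of the page?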

elukey commented 1 year ago

@elukey, to understand,

  • ORES will be deprecated

Correct

  • I must use authenticated requests to run the bot

There are two options: 1) unauthenticated requests, where the rate limit is 10k requests per hour (at the moment; we can probably bump it a little if needed); 2) authenticated requests, which have a higher limit (you can reach 100k requests / hour, for example). So you can choose; we'd prefer authenticated, but anything works.

  • Instead of 1 API call, I need to make 2 API calls, affected by a rate limit

If you want to keep using goodfaith and damaging, yes, exactly. But you could replace them with one of the two Revert Risk models, which in theory are much better suited to the serobot use case, IIUC. In that case you'd make just one API call and use the score provided by the Revert Risk model.

Thanks!
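For illustration, the single-call approach could look roughly like the sketch below. The endpoint pattern, the model name `revertrisk-language-agnostic`, the `rev_id`/`lang` payload, the response shape, and the user agent string are assumptions that are not confirmed in this thread; please check them against the Lift Wing usage page before relying on them.

```python
import json
import urllib.request

# Assumed endpoint pattern; verify it on the Lift Wing usage page.
LIFTWING_URL = "https://api.wikimedia.org/service/lw/inference/v1/models/{model}:predict"


def build_request(rev_id, lang, model="revertrisk-language-agnostic",
                  user_agent="SeroBOT-example/0.1 (hypothetical contact)"):
    """Build (but do not send) a POST request that scores one revision."""
    payload = json.dumps({"rev_id": rev_id, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        LIFTWING_URL.format(model=model),
        data=payload,
        headers={"Content-Type": "application/json", "User-Agent": user_agent},
        method="POST",
    )


def revert_probability(response_body):
    """Extract the probability that the revision should be reverted.

    Assumes a response shaped like
    {"output": {"probabilities": {"true": 0.97, "false": 0.03}}}.
    """
    data = json.loads(response_body)
    return float(data["output"]["probabilities"]["true"])


def score_revision(rev_id, lang, timeout=10):
    """Send the request and return the score (performs a real network call)."""
    request = build_request(rev_id, lang)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return revert_probability(resp.read())
```

Keeping request building and response parsing separate from the network call makes it possible to test the bot's logic without hitting the service.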

dennistobar commented 1 year ago

@elukey I'll try using the Revert Risk model without authentication before migrating from ORES. SeroBOT is running from toolserver, so it should not have problems running or with the API limit...

elukey commented 1 year ago

Thanks a lot! It is really important for us that folks start experimenting with Lift Wing and Revert Risk, so please ping me or cut a ticket in Phabricator with "Machine-Learning-team" if you need any help!

I quickly checked the traffic per hour that Serobot generates, and we should be ok with 10k requests / hour (anonymous traffic), but we can review the limit if you hit any issues while testing.
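As a rough illustration of staying under the anonymous limit on the client side, here is a minimal sliding-window counter. The class name and its parameters are hypothetical; only the 10k requests / hour figure comes from this thread, and the server's own accounting may differ.

```python
import time
from collections import deque


class HourlyBudget:
    """Sliding-window counter: allow at most `limit` calls per `window` seconds."""

    def __init__(self, limit=10_000, window=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of calls still inside the window

    def try_acquire(self):
        """Return True and record the call if the budget allows it."""
        now = self.clock()
        # Drop timestamps that have fallen out of the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True
```

A bot could call `try_acquire()` before each scoring request and skip or delay the request when it returns `False`.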

As a reminder, we have two Revert Risk models:

  • Revert Risk language-agnostic

  • Revert Risk multilingual

They have the same scope but use different features. I am not entirely sure what Serobot needs, but both of the above should be a better and more up-to-date alternative to the revscoring damaging/goodfaith models.

elukey commented 1 year ago

@dennistobar Hi! Any news? Do you need any help in migrating? Looking forward to getting your feedback :)

elukey commented 1 year ago

Some updates:

dennistobar commented 1 year ago

@dennistobar Hi! Any news? Do you need any help in migrating? Looking forward to getting your feedback :)

mmm... I've just been sending some requests, and the API seems fine. I'll try migrating eswiki and eswikibooks to the model. Regarding the recommended models, neither has full support for eswikibooks.

elukey commented 1 year ago

@dennistobar I followed up with the Research team, and in theory eswikibooks should work just fine with the language-agnostic model, but they have only tested it with Wikipedia domains. It should work much better than the ORES models anyway, but in case you want to keep using the ORES models, Lift Wing offers all the ORES revscoring models as well.

dennistobar commented 1 year ago

@elukey I just pushed code that uses revertrisk-multilingual as a backup when ORES (goodfaith or damaging) doesn't flag an edit as vandalism. I set a threshold of 0.950 on the probability before doing an automated revert. I'll figure out whether this is the best fit for Spanish Wikipedia.
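A minimal sketch of that backup logic, for readers following along. The function name, its arguments, and the choice of `>=` for the comparison are assumptions for illustration, not the code actually pushed to the repository; only the 0.950 threshold comes from this comment.

```python
REVERT_RISK_THRESHOLD = 0.950  # threshold mentioned in this comment


def should_revert(ores_flags_vandalism, revert_risk_probability,
                  threshold=REVERT_RISK_THRESHOLD):
    """Decide whether to revert an edit automatically.

    ores_flags_vandalism: outcome of the existing ORES check (hypothetical bool).
    revert_risk_probability: score from revertrisk-multilingual, or None when
    the backup call failed or was skipped.
    """
    if ores_flags_vandalism:
        return True
    if revert_risk_probability is None:
        # No backup score available: do not revert on missing data.
        return False
    # Assumption: scores equal to the threshold also trigger a revert.
    return revert_risk_probability >= threshold
```

Treating a missing score as "do not revert" keeps the bot conservative when the backup service is unavailable.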

dennistobar commented 1 year ago

@elukey I just reverted the change: a lot of 504 errors in about 12 hours... the service is not reliable at this point
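One common way to tolerate intermittent 504 responses is to retry with exponential backoff. The sketch below is a generic pattern, not something proposed in this thread; `call_with_retry` and its parameters are hypothetical names, and `fetch` stands for any zero-argument function that performs the HTTP call via `urllib`.

```python
import time
import urllib.error


def call_with_retry(fetch, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on HTTP 504, wait and retry with exponential backoff.

    Returns fetch()'s result, or None once every attempt ended in a 504.
    Any other error propagates to the caller.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            if err.code != 504:
                raise
            # Back off before the next attempt, but not after the last one.
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return None
```

Returning `None` after the final failure lets the caller fall back to skipping the revision instead of crashing the bot.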

elukey commented 1 year ago

@dennistobar thanks for the feedback, we are investigating something similar. I noticed that in check_risk you use the revertrisk-multilingual model; have you tried the language-agnostic one?

The Research team suggests the following:

Therefore, for anonymous edits - on the 47 languages covered - we recommend using the RRML. For the remaining edits (non-anonymous or not covered by RRML), we recommend using RRLA.

It seems that the multilingual model is less performant and more unstable than the language-agnostic one. If you have the time and patience to test it, I'd be really grateful.
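The Research team's suggestion could be expressed as a small selection function like the one below. The language set is a placeholder, since the actual list of 47 covered languages is not given in this thread, and the model name `revertrisk-language-agnostic` is assumed to be the identifier for RRLA.

```python
# Placeholder only: the real list of 47 languages covered by RRML is not
# given in this thread and must come from the model documentation.
RRML_LANGUAGES = frozenset({"es", "en"})  # hypothetical subset


def pick_model(lang, is_anonymous, covered=RRML_LANGUAGES):
    """Return the model name suggested for an edit.

    Anonymous edits in a covered language go to the multilingual model (RRML);
    everything else goes to the language-agnostic model (RRLA).
    """
    if is_anonymous and lang in covered:
        return "revertrisk-multilingual"
    return "revertrisk-language-agnostic"
```

For example, `pick_model("es", is_anonymous=False)` selects the language-agnostic model, because the edit is not anonymous.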

elukey commented 10 months ago

@dennistobar thanks a lot for the merge! I don't see serobot's user agent in the logs though. Is the new code already running? Thanks in advance!

elukey commented 10 months ago

@dennistobar Hi! Is there anything I can help with to get the new code in use? I don't see the UA of Serobot in Lift Wing :(

dennistobar commented 10 months ago

Hi, sorry for being unresponsive. Regarding the change, I'll notify the community, because the bot will revert some "good edits" and will not revert some "obvious" edits. I plan to switch the endpoint over the weekend.

dennistobar commented 10 months ago

@elukey Hi, I started using the new algorithm. In ORES I have a way to see the evaluation features (weights). Is there any way to see them for a single revision with this algorithm?

elukey commented 10 months ago

@dennistobar nice!

We don't have it for Revert Risk yet. We are trying to figure out the best way to add it, but we can create a task if you need the feature for SeroBot.