S1M0N38 / soccerapi

soccerapi, an unambitious soccer odds scraper ⚽️
MIT License

Decoding odds data #11

Closed HMaker closed 3 years ago

HMaker commented 4 years ago

I see this library does not use Selenium or other browser automation tools to scrape data from bet365, which is interesting. Is it being maintained?

S1M0N38 commented 4 years ago

At the moment it is not maintained (due to time constraints), but maybe in the future I will fix the bugs that appear.

At the time when I was developing ... The code just performs an HTTP request to bet365, which responds with a file full of data that has to be decoded in order to be understood. Then a silly function tries to guess the key for the XOR process and, once the key is found, it is used to make the odds data readable.
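The idea can be sketched roughly as follows. This is a minimal illustration, not the library's actual function: the names `xor_decode` and `guess_key`, the `marker` substring, and the sample data are all made up for the example, and the real heuristic used to recognise a correct key may differ.

```python
def xor_decode(data: str, key: int) -> str:
    """XOR every character of `data` with a single-character key."""
    return "".join(chr(ord(ch) ^ key) for ch in data)


def guess_key(data: str, marker: str, max_key: int = 130):
    """Try each candidate key and return the first one whose output contains `marker`.

    `marker` is a hypothetical substring expected in the decoded text.
    Returns None when no candidate key produces it.
    """
    for key in range(max_key + 1):
        if marker in xor_decode(data, key):
            return key
    return None


# Round trip on made-up data: XOR is its own inverse, so encoding and
# decoding use the same function.
encoded = xor_decode("ODDS=2.10", 42)
key = guess_key(encoded, "ODDS")
print(key, xor_decode(encoded, key))
```

Because XOR with the same key twice restores the original text, a single helper is enough for both directions.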

Nowadays bet365 may have changed something, and the whole code is now broken.

HMaker commented 4 years ago

@S1M0N38 Could you document how to deobfuscate bet365 data? I see they keep two websocket connections open to update the site's data, and it seems that the websocket messages are obfuscated. I found that bet365 employs mechanisms to block automation bots; in those cases people usually use Selenium or other browser automation tools. Is it hard to deobfuscate their data? Knowing that, I could try to contribute.

S1M0N38 commented 4 years ago

I investigated the bug, and it turned out to be just a minor change in a "parsing keyword". I also updated some of the league ids (france-league_1 and italy-serie_b); I can update all the league ids only when the competitions start again. Remember that bet365 (maybe) works if requests are performed from Italy (other countries are not tested, and VPNs are detected by bet365).

HMaker commented 4 years ago

Following the request's stack trace in Chrome's debugger, I found the JS code that decodes the odds values:

function e(t, e) {
    var n, r, s, a;
    if (!t || !e)
        return t;
    for (e != i && (C = {}, i = e), r = e.charCodeAt(0), s = "", a = 0; a < t.length; a++)
        s += String.fromCharCode(t.charCodeAt(a) ^ r);
    return s
}

Do you know where the key e comes from?
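For reference, a rough Python equivalent of the JS function above. It is only a sketch: the name `decode` is made up, and the reset of the outer variables `C` and `i` (apparently a cache tied to the current key) is left out because their purpose is not visible in the snippet. For characters in the Basic Multilingual Plane, Python's `ord`/`chr` match JS `charCodeAt`/`fromCharCode`.

```python
def decode(text: str, key: str) -> str:
    """XOR each character of `text` with the first character of `key`."""
    # Mirrors the JS guard: an empty text or key returns the input unchanged.
    if not text or not key:
        return text
    k = ord(key[0])  # only the first character of the key is used
    return "".join(chr(ord(ch) ^ k) for ch in text)


# Applying the function twice with the same key restores the input.
scrambled = decode("1.85", "K")
print(decode(scrambled, "K"))
```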

HMaker commented 4 years ago

@S1M0N38 Also, your brute-force algorithm scans from 0 to 130, but String.charCodeAt() and String.fromCharCode() work with UTF-16 code units, which range from 0 to 65535 (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charCodeAt). Am I missing something?

S1M0N38 commented 4 years ago

Yes, my approach is quite heuristic. I don't know where the key comes from (but I suspect it can be found in the client-side JS with some further effort). When I was developing, I sent a bunch of requests and saw that the key char was always in the range from 0 to 130. At the moment (12 Aug 2020) api/bet365.py should work just for France Ligue 1 (the only main competition available on bet365; I plan to update the .json file when other competitions appear on the site). Remember that I ran my tests from Italy, and performing requests from other countries was not tested.
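One possible explanation for the small key range, assuming (this is not confirmed for bet365) that both the encoded response and the decoded text contain only ASCII characters: the key equals a decoded character XOR the matching encoded character, and the XOR of two values below 128 is itself below 128. A quick check, with `max_possible_key` as a made-up helper name:

```python
def max_possible_key(limit: int = 128) -> int:
    """Largest value of p ^ c when both p and c are below `limit`."""
    return max(p ^ c for p in range(limit) for c in range(limit))


# With ASCII on both sides, no key above 127 can occur.
print(max_possible_key())
```

Under that assumption, scanning 0 to 130 already covers every key that can occur.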

If you want to try to make it work, you can set up the environment as follows (I use pipenv as the dependency manager):

  1. clone the repo: git clone git@github.com:S1M0N38/soccerapi.git
  2. go into the repo: cd soccerapi
  3. install the dependencies (for development): pip install -e . && pipenv install --dev
  4. activate the pipenv environment with pipenv shell and start to play around

Tests are run with pytest. To run all tests, use pytest. For test options, read the pytest docs.

HMaker commented 3 years ago

I think this issue was solved by 800e97b