Closed ColorfulQuark closed 9 months ago
Debugging shows a 403 Forbidden on the /food/diary request; exercise, water, and notes still work.
It does not look like an authentication failure; it seems to be a server-side restriction.
myfitnesspal --loglevel DEBUG day 2023-09-15
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.myfitnesspal.com:443
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /user/auth_token?refresh=true HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.myfitnesspal.com:443
DEBUG:urllib3.connectionpool:https://api.myfitnesspal.com:443 "GET /v2/users/xxxxxxxxxx?fields%5B%5D=diary_preferences&fields%5B%5D=goal_preferences&fields%5B%5D=unit_preferences&fields%5B%5D=paid_subscriptions&fields%5B%5D=account&fields%5B%5D=goal_displays&fields%5B%5D=location_preferences&fields%5B%5D=system_data&fields%5B%5D=profiles&fields%5B%5D=step_sources HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /food/diary/xxxxxxxxxx?date=2023-09-15 HTTP/1.1" 403 None
2023-09-15
Meals
Totals
DEBUG:urllib3.connectionpool:Resetting dropped connection: www.myfitnesspal.com
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /food/water?date=2023-09-15 HTTP/1.1" 200 None
Water: 3000.0
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /food/note?date=2023-09-15 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /food/note?date=2023-09-15 HTTP/1.1" 200 None
Test Note
Exercises
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /exercise/diary/xxxxxxxxxx?date=2023-09-15 HTTP/1.1" 200 None
Cardiovascular
Strength Training
When I download https://myfitnesspal.com/food/diary/xxxxxx using client._get_content_for_url(url), the response includes <span id="challenge-error-text">Enable JavaScript and cookies to continue
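A small heuristic for spotting that challenge page programmatically; the marker strings come from the snippet above and from the window._cf_chl_opt script mentioned later in the thread, and the helper name is mine, not part of the library:

```python
def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if the HTML looks like a Cloudflare JS challenge page
    rather than the actual diary page."""
    markers = (
        'id="challenge-error-text"',      # seen in the response quoted above
        "Enable JavaScript and cookies",  # challenge body text
        "_cf_chl_opt",                    # Cloudflare challenge script global
    )
    return any(marker in html for marker in markers)
```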
As you note, other metrics work.
It seems the page is rendered from JS.
Yes, the question is what to do about the JS. Cookies should be covered by browser_cookie3.
This is probably good news, actually -- if the page is being rendered in JS, it means that there may be APIs that we can use for getting direct access to the data now instead of needing to scrape the page.
It'll be a while before I can find the time to look into this, unforch, though!
I don't think they have an API for the community, only for business-to-business.
I had a quick look at the JS; it executes function(){window._cf_chl_opt={ ... }}, which seems to be a component of Cloudflare Bot Management.
I'm aware of the official API situation -- I'm the person who wrote this library. But if you're rendering data in a UI in Javascript, that data has to be coming from somewhere, and those interfaces are still an API, even if they're an unofficial one.
I've finally had a little bit of time to have a look at this, and at the moment I'm not sure I can figure this out in the time I have available, but I can at least post a few things I learned here so folks can use them as a jumping-off point toward finding a solution.
I have a branch with some changes that add additional logging around the requests that are made and their responses, in 168__diary_entries_cloudflare. You can use it to run a simple request from the command line using the --log-requests-to command-line argument, sending a request like:
myfitnesspal --log-requests-to='./temp' day 2023-12-03
Inside the ./temp/ folder you'll find sequentially-named .json documents with information about the outgoing requests and incoming responses.
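Once you have those JSON documents, something like the following can pull out the failing entries. The "status" field name is an assumption about the log schema; adjust it to whatever the files actually contain:

```python
import json
from pathlib import Path


def load_request_logs(folder: str) -> list:
    """Load the sequentially-named .json request/response records in order."""
    return [
        json.loads(path.read_text())
        for path in sorted(Path(folder).glob("*.json"))
    ]


def failing_requests(records: list) -> list:
    """Keep only records whose response status signals an error (>= 400)."""
    return [r for r in records if r.get("status", 0) >= 400]
```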
Things I've learned from this include:
- The failing request is the GET for the diary page itself (e.g. https://www.myfitnesspal.com/food/diary/USERNAME?date=2023-12-03). You can tell it's a problem because the returned status code is a 403.
- I tried swapping the session for a cloudscraper.CloudScraper instance (see https://github.com/VeNoMouS/cloudscraper), but that does not appear to solve the problem, and instead raises its own by causing the 002 request to fail with a 500 error. I have not investigated why that's happening in any depth, though, and it might be that there's some kind of conflict between the session setup that CloudScraper is doing and the session setup I'm doing manually by using cookie information.

I'm afraid I have a really, really busy schedule over the next few months, so I honestly can't say when I'll personally have the time to dig into this further, but I'd love it if somebody else could try out their own ideas toward finding a solution on this. At the moment, it might be literally March or April before I'll be able to put too much time toward this personally. I'll be following along with this thread, though, if others have questions or find anything interesting!
FYI, my scripts that call this library have recently started working again. It's not stable, though; sometimes it works, sometimes it fails.
Not sure if this is actually related or a coincidence, but a recent update changed my urllib3 version from 1.26.11 to 1.25.11. After forcing the version back with pip install urllib3==1.26.11 (and ignoring the compatibility warning), get_day started working again.
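To check which urllib3 version an environment actually resolved to, a tiny comparison helper (mine, not from the library) avoids the trap of comparing version strings lexicographically:

```python
def version_tuple(v: str) -> tuple:
    """Parse '1.26.11' into (1, 26, 11) so versions compare numerically;
    plain string comparison would wrongly rank '1.9' above '1.26'."""
    return tuple(int(part) for part in v.split(".")[:3])


# The report above: 1.25.11 failed, 1.26.11 worked (a correlation, not a
# proven cause).
assert version_tuple("1.26.11") > version_tuple("1.25.11")
```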
First, thanks @coddingtonbear for this project; I've been using it for months, really useful stuff. I gave your hint a try (the cloudscraper one) and it seems to fix the issue on my end.
Using the cloudscraper session directly didn't work, but building it from the original request session seems to work. I made a very simple PR #172 if others want to test as well.
I pip installed cloudscraper and added the two changes in the PR to my local install and things appear to be working again. Thanks, I can now see how much I overate over the holidays!
Awesome; thanks for looking into it further, @qmirioni , and confirming that the fix works, @dbsqp -- I've also double-checked @qmirioni's solution and triple-checked things. That fix went out as part of 2.1.0; so I think things are now sorted.
day = client.get_date(y, m, d)
is just returning an empty dict rather than macro and calorie info. This started a day or two ago. See https://python-myfitnesspal.readthedocs.io/en/latest/how_to/diary.html
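Until the root cause is confirmed, a defensive check can at least make the failure loud instead of silently yielding empty data. Per the linked docs the day object exposes a dict of nutrient totals; the helper name is mine:

```python
def require_totals(totals: dict) -> dict:
    """Fail loudly when the diary came back empty, which in this thread
    usually means the request was blocked (e.g. a Cloudflare 403)."""
    if not totals:
        raise RuntimeError(
            "Empty diary totals: the /food/diary request was likely blocked"
        )
    return totals
```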