Closed ColorfulQuark closed 9 months ago
Debugging shows a 403 Forbidden on the /food/diary request; exercise, water, and notes still work.
It does not look like an authentication failure; it seems to be a server-side restriction.
myfitnesspal --loglevel DEBUG day 2023-09-15
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): www.myfitnesspal.com:443
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /user/auth_token?refresh=true HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): api.myfitnesspal.com:443
DEBUG:urllib3.connectionpool:https://api.myfitnesspal.com:443 "GET /v2/users/xxxxxxxxxx?fields%5B%5D=diary_preferences&fields%5B%5D=goal_preferences&fields%5B%5D=unit_preferences&fields%5B%5D=paid_subscriptions&fields%5B%5D=account&fields%5B%5D=goal_displays&fields%5B%5D=location_preferences&fields%5B%5D=system_data&fields%5B%5D=profiles&fields%5B%5D=step_sources HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /food/diary/xxxxxxxxxx?date=2023-09-15 HTTP/1.1" 403 None
2023-09-15
Meals
Totals
DEBUG:urllib3.connectionpool:Resetting dropped connection: www.myfitnesspal.com
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /food/water?date=2023-09-15 HTTP/1.1" 200 None
Water: 3000.0
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /food/note?date=2023-09-15 HTTP/1.1" 200 None
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /food/note?date=2023-09-15 HTTP/1.1" 200 None
Test Note
Exercises
DEBUG:urllib3.connectionpool:https://www.myfitnesspal.com:443 "GET /exercise/diary/xxxxxxxxxx?date=2023-09-15 HTTP/1.1" 200 None
Cardiovascular
Strength Training
When I download https://myfitnesspal.com/food/diary/xxxxxx using client._get_content_for_url(url), the response includes <span id="challenge-error-text">Enable JavaScript and cookies to continue
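A small heuristic for spotting that challenge page programmatically; the marker strings come from the snippet above and from the window._cf_chl_opt script mentioned later in the thread, and the helper name is mine, not part of the library:

```python
def looks_like_cloudflare_challenge(html: str) -> bool:
    """Return True if the HTML looks like a Cloudflare JS challenge page
    rather than the actual diary page."""
    markers = (
        'id="challenge-error-text"',      # seen in the response quoted above
        "Enable JavaScript and cookies",  # challenge body text
        "_cf_chl_opt",                    # Cloudflare challenge script global
    )
    return any(marker in html for marker in markers)
```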
As you note, other metrics work.
It seems the page is rendered from JS.
Yes, the question is what to do about the JS. Cookies should be covered by browser_cookie3.
This is probably good news, actually -- if the page is being rendered in JS, it means that there may be APIs that we can use for getting direct access to the data now instead of needing to scrape the page.
It'll be a while before I can find the time to look into this, unforch, though!
I don't think they have an API for the community, only for business-to-business.
I had a quick look at the JS; it executes function(){window._cf_chl_opt={ ... }}, which seems to be a component of Cloudflare Bot Management.
I'm aware of the official API situation -- I'm the person who wrote this library. But if you're rendering data in a UI in Javascript, that data has to be coming from somewhere, and those interfaces are still an API, even if they're an unofficial one.
I've finally had a little bit of time to have a look at this, and at the moment I'm not sure I can figure this out in the time I have available, but I can at least post a few things I learned here so folks can use them as a jumping-off point toward finding a solution.
I have a branch with some changes that add additional logging around the requests that are made and their responses, in 168__diary_entries_cloudflare. You can use it to run a simple request from the command line using the --log-requests-to command-line argument, sending a request like:
myfitnesspal --log-requests-to='./temp' day 2023-12-03
Inside the ./temp/ folder you'll find sequentially-named .json documents with information about the outgoing requests and incoming responses.
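Once you have those JSON documents, something like the following can pull out the failing entries. The "status" field name is an assumption about the log schema; adjust it to whatever the files actually contain:

```python
import json
from pathlib import Path


def load_request_logs(folder: str) -> list:
    """Load the sequentially-named .json request/response records in order."""
    return [
        json.loads(path.read_text())
        for path in sorted(Path(folder).glob("*.json"))
    ]


def failing_requests(records: list) -> list:
    """Keep only records whose response status signals an error (>= 400)."""
    return [r for r in records if r.get("status", 0) >= 400]
```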
Things I've learned from this include:
- The failing request is the GET for the diary page itself (e.g. https://www.myfitnesspal.com/food/diary/USERNAME?date=2023-12-03). You can tell it's a problem because the returned status code is a 403.
- I tried swapping the session for a cloudscraper.CloudScraper instance (see https://github.com/VeNoMouS/cloudscraper), but that does not appear to solve the problem, and instead raises its own by causing the 002 request to fail with a 500 error. I have not investigated why that's happening in any depth, though, and it might be that there's some kind of conflict between the session setup that CloudScraper is doing and the session setup I'm doing manually by using cookie information.

I'm afraid I have a really, really busy schedule over the next few months, so I honestly can't say when I'll personally have the time to dig into this further, but I'd love it if somebody else could try out their own ideas toward finding a solution on this. At the moment, it might be literally March or April before I'll be able to put too much time toward this personally. I'll be following along with this thread, though, if others have questions or find anything interesting!
FYI, my scripts that call this library have recently started working again. It's not stable, though; sometimes it works, sometimes it fails.
Not sure if this is actually related or a coincidence, but a recent update changed my urllib3 version from 1.26.11 to 1.25.11. After forcing the version back with pip install urllib3==1.26.11 (and ignoring the compatibility warning), get_day started working again.
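To check which urllib3 version an environment actually resolved to, a tiny comparison helper (mine, not from the library) avoids the trap of comparing version strings lexicographically:

```python
def version_tuple(v: str) -> tuple:
    """Parse '1.26.11' into (1, 26, 11) so versions compare numerically;
    plain string comparison would wrongly rank '1.9' above '1.26'."""
    return tuple(int(part) for part in v.split(".")[:3])


# The report above: 1.25.11 failed, 1.26.11 worked (a correlation, not a
# proven cause).
assert version_tuple("1.26.11") > version_tuple("1.25.11")
```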
First, thanks @coddingtonbear for this project; I've been using it for months, really useful stuff. I gave your hint a try (the cloudscraper one) and it seems to fix the issue on my end.
Using the cloudscraper session directly didn't work, but building it from the original request session seems to work. I made a very simple PR #172 if others want to test as well.
I pip installed cloudscraper and added the two changes in the PR to my local install and things appear to be working again. Thanks, I can now see how much I overate over the holidays!
Awesome; thanks for looking into it further, @qmirioni , and confirming that the fix works, @dbsqp -- I've also double-checked @qmirioni's solution and triple-checked things. That fix went out as part of 2.1.0; so I think things are now sorted.
day = client.get_date(y, m, d)
is just returning an empty dict rather than macro and calorie info. This started a day or two ago. See https://python-myfitnesspal.readthedocs.io/en/latest/how_to/diary.html
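Until the root cause is confirmed, a defensive check can at least make the failure loud instead of silently yielding empty data. Per the linked docs the day object exposes a dict of nutrient totals; the helper name is mine:

```python
def require_totals(totals: dict) -> dict:
    """Fail loudly when the diary came back empty, which in this thread
    usually means the request was blocked (e.g. a Cloudflare 403)."""
    if not totals:
        raise RuntimeError(
            "Empty diary totals: the /food/diary request was likely blocked"
        )
    return totals
```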