Open JakeR-26 opened 4 months ago
Running into this issue as well. When using get_date, the HTML returned for the URL appears to be different from what I see when I visit it manually...
Alright, I dug a little deeper into the problem. It appears that the URL fetched by the cloudscraper function returns a blank page asking you to enable JavaScript and cookies. To work around this, I updated some of the functions in this library to use a headless Selenium driver instead when requests are made to the get_date endpoint. Before fetching the particular page, I send a request to myfitnesspal.com with the obtained cookies. This got me to a captcha page.
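To tell these cases apart when debugging, it can help to check the fetched HTML for signs of a challenge page before trying to parse it. The following is only a minimal sketch: the function name and the marker phrases are my own assumptions based on the description above, and the real pages may be worded differently.

```python
import re

# Hypothetical marker phrases, taken from the symptoms described in this thread.
# The actual challenge pages may use different wording.
CHALLENGE_MARKERS = (
    "enable javascript and cookies",
    "captcha",
)


def looks_like_challenge_page(html: str) -> bool:
    """Return True if the HTML resembles a bot-challenge page (heuristic only)."""
    # Collapse whitespace and lowercase so line breaks and casing do not matter.
    text = re.sub(r"\s+", " ", html).lower()
    return any(marker in text for marker in CHALLENGE_MARKERS)
```

Running this on the saved response body (for example, the request log files) should show quickly whether the request was blocked or whether the diary page itself came back without data.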
The cookies I am setting come from calling the load() function from browser_cookie3 on the domain name. However, the mfp function that does this with cloudscraper uses a 'cookiejar'. If this is None, it also just uses the load function from browser_cookie3. I did not add that because the 'cookiejar' always appears to be empty...
I don't yet fully understand how to use that 'cookiejar'; this is the only loose end I currently have. I might pick this up again later.
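One way to check whether a cookie jar really is empty for the site is to filter it by domain and count what is left. This is a sketch using only the standard library's http.cookiejar; the helper names cookies_for_domain and make_cookie are hypothetical and not part of this library or of browser_cookie3.

```python
from http.cookiejar import Cookie, CookieJar


def cookies_for_domain(jar: CookieJar, domain: str) -> CookieJar:
    """Return a new jar holding only the cookies that belong to `domain`."""
    filtered = CookieJar()
    for cookie in jar:
        host = cookie.domain.lstrip(".")
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            filtered.set_cookie(cookie)
    return filtered


def make_cookie(name: str, value: str, domain: str) -> Cookie:
    """Build a simple session cookie, for experimenting without a browser."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )
```

If len(cookies_for_domain(jar, "myfitnesspal.com")) is 0 for the jar returned by browser_cookie3's load(), the browser session cookies are not being picked up at all. Whether passing a non-empty jar as the 'cookiejar' gets past the captcha is something I have not verified.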
@ANNwind so, did the modifications fix the problem for you? Can you correctly retrieve the data?
Hi @uccollab,
TL;DR No, I am getting a captcha page.
I see now that the last paragraph of my comment is quite vague. The end result is that I get to a captcha page. I suspect this would not be the case if the cookies were set correctly, since this behaviour does not occur if you go to the pages manually.
In the last paragraph of my previous comment I tried to explain that I do see cookies being loaded with the browser_cookie3 library, unless the 'cookiejar' variable is not None. In that case it uses a similar function from the cloudscraper library.
I suspect we need to update the 'cookiejar' variable with the right cookies in order for us to circumvent the captcha page. But I don't yet fully understand how this all works.
I could also be totally wrong, and the website just detects bot behavior regardless of cookies and shows a captcha page because of this.
I am a Python novice so I may just be making a mistake, but I have been trying to get my data from myfitnesspal and started with the standard how-to for a day's data. The code I have is as follows:
When I run this, I get
05/21/24 {}
as a result. I have checked the files created by the log and it appears that no nutritional data is present in the 3 log files generated by the requests. I am, however, able to access my water intake with print(day.water)
which returns the correct value. I have visited the url that is generated in the request and all of my data is on that page, and my profile is set to public so it is not being hidden from the request (checked in incognito while not signed in, and on another browser). Is this something I am doing wrong or is it a change to the UI on the site that is not working?