coddingtonbear / python-myfitnesspal

Access your meal tracking data stored in MyFitnessPal programmatically
MIT License
789 stars, 138 forks

client.get_measurements problem #151

Open ColorfulQuark opened 1 year ago

ColorfulQuark commented 1 year ago

Late yesterday (18 Jan), client.get_measurements stopped working for me. Logging in with client = myfitnesspal.Client() and calling client.get_date still work.

I did notice the design of the https://www.myfitnesspal.com/measurements/check-in page changed, so perhaps that's related.

TimOgden commented 1 year ago

Same thing is happening to me: no measurements are found on that page. It looks like the web scraper for this part will have to be redone. They seem to have removed any easily identifiable ids, and I don't have experience parsing HTML/XML, so hopefully someone can find a fix for this.

ColorfulQuark commented 1 year ago

What's the function for fetching an MFP page? Pending a fix, I'd like to retrieve https://www.myfitnesspal.com/measurements/check-in, which contains the data I need most, and scrape it myself. I seem to be missing something obvious.

TimOgden commented 1 year ago

@ColorfulQuark You can see the process in Client.get_measurements() in client.py, line 528. self._get_url_for_measurements() returns 'https://www.myfitnesspal.com/measurements/edit?page=1&type=1', which I believe needs to be changed.

Then, on line 531, it calls self._get_measurement_ids(document) on the loaded document and uses XPath to find the measurements on the page. That scraping is also broken, because it relies on matching id attributes, which don't seem to exist in the new page.

It'd be great if you or someone else could figure out the scraping. I tried for about an hour, but without ids it's really hard to find the information I'm looking for, especially since I've never worked with XPath.

ColorfulQuark commented 1 year ago

@TimOgden I was looking for the function that returns the contents of a page given its URL: something like requests.get(url), but one that also sends the cookies and whatever else is needed for authentication. There are several likely-looking function names in client.py, but I can't figure out how to get the contents of a page.
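
Outside the library, what is being described is a cookie-aware HTTP opener. A minimal standard-library sketch, assuming the session cookies are already known: `HTTPCookieProcessor` attaches the cookies held in a `CookieJar` to every request made through the opener. This is not how python-myfitnesspal itself loads its cookies, and the cookie name and value below are hypothetical placeholders.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain=".myfitnesspal.com"):
    """Build a plain session cookie for `domain` (name/value are placeholders)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def make_opener(jar):
    """Return an opener that sends (and updates) the jar's cookies on each request."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def fetch(opener, url):
    """Roughly requests.get(url).text, but with the jar's cookies attached."""
    with opener.open(url) as resp:
        return resp.read().decode("utf-8")


jar = http.cookiejar.CookieJar()
# Hypothetical cookie: the real name/value would come from a logged-in session.
jar.set_cookie(make_cookie("example_session", "abc123"))
opener = make_opener(jar)
# page = fetch(opener, "https://www.myfitnesspal.com/measurements/check-in")
```
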

https://www.myfitnesspal.com/measurements/edit?page=1&type=1 contains weight information, so it should be possible to extract it. The same page is also available at https://www.myfitnesspal.com/measurements/edit?type=Weight&page=1, and other measurements can be fetched by substituting the measurement name in the URL, e.g., type=Neck.
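
The substitution above fits in a small helper. This is a hypothetical sketch: `measurement_url` is not part of python-myfitnesspal, and it assumes only the `type` and `page` query parameters seen in these URLs.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.myfitnesspal.com/measurements/edit"


def measurement_url(measurement_type="Weight", page=1):
    """Build the edit-page URL for one measurement type (e.g. "Weight", "Neck") and page."""
    # urlencode keeps the dict's order and escapes any characters that aren't URL-safe.
    query = urlencode({"type": measurement_type, "page": page})
    return f"{BASE_URL}?{query}"


print(measurement_url("Neck"))  # https://www.myfitnesspal.com/measurements/edit?type=Neck&page=1
```
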

The data is embedded in the page as a list of dicts: [{"id":"12345678901234","date":"2023-01-20","unit":"pounds","type":"Weight","updated_at":"2023-01-20T13:23:38Z","value":123}, ...] If nothing else, it should be possible to fetch the list and parse the dicts, provided I can figure out how to download the contents of the relevant pages. Alas, this only gets recent entries, so we need to figure out how to get the pages with older entries.
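
Turning that list into a date-to-value mapping takes only a few lines. The sketch below parses the sample entry quoted above; how the JSON text is cut out of the page is left aside, and `items_to_series` is a hypothetical name.

```python
import json

# The sample entry quoted above, as a JSON string.
SAMPLE = ('[{"id":"12345678901234","date":"2023-01-20","unit":"pounds",'
          '"type":"Weight","updated_at":"2023-01-20T13:23:38Z","value":123}]')


def items_to_series(items_json):
    """Map each entry's date (YYYY-MM-DD) to its (value, unit) pair."""
    return {item["date"]: (item["value"], item["unit"]) for item in json.loads(items_json)}


print(items_to_series(SAMPLE))  # {'2023-01-20': (123, 'pounds')}
```
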

TimOgden commented 1 year ago

Can't you use client._get_document_for_url(url)? I'm not sure where you found that list of dicts, but it seems perfect! As for older entries, it seems we'll have to increment the page number in the URL until the table says "No measurements found".
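
The paging idea can be sketched independently of the site. Here `fetch_page` is a hypothetical callable that returns the list of entries on a given page, with an empty list standing in for "No measurements found"; `max_pages` is an arbitrary safety limit. The sample pages are made up so the loop can run offline.

```python
def fetch_all_pages(fetch_page, max_pages=1000):
    """Collect entries page by page until a page comes back empty."""
    items = []
    for page_num in range(1, max_pages + 1):
        page_items = fetch_page(page_num)
        if not page_items:  # empty page: we've run past the last entry
            break
        items.extend(page_items)
    return items


# Made-up offline stand-in for the site: two pages of entries, then nothing.
fake_pages = {
    1: [{"date": "2023-01-20", "value": 123}],
    2: [{"date": "2023-01-13", "value": 124}],
}
print(fetch_all_pages(lambda n: fake_pages.get(n, [])))
```
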

TimOgden commented 1 year ago

Oh, I just found them. Good catch; that should be perfect.

ColorfulQuark commented 1 year ago
```python
def _get_document_for_url(self, url):
    content = self._get_content_for_url(url)

    return lxml.html.document_fromstring(content)
```

That parses the page into an lxml HTML document. I had thought content = self._get_content_for_url(url) would give me the raw page, but for some reason it doesn't return the page I see when logged in.

TimOgden commented 1 year ago

Weird, I seem to get the page and stay logged in just fine using self._get_content_for_url(url). I can write a parser, but I'd probably have to use BeautifulSoup, and it probably won't be until after this weekend. So it's up to you whether you want to dig into the issue you're facing; maybe try clearing your cookies in Chrome, restarting Chrome, restarting the Python script, etc.

ColorfulQuark commented 1 year ago

EDIT: this is now working:

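```python
import datetime
import json
import re

import myfitnesspal

# Log in and confirm that get_date still works.
client = myfitnesspal.Client()
day = client.get_date(datetime.date.today())
print(day)

# Fetch the raw HTML of the first page of weight entries.
url = "https://www.myfitnesspal.com/measurements/edit?type=Weight&page=1"
data = client._get_content_for_url(url)
print(len(data))

# The entries are embedded in the page as a JSON list. The lazy (.*?) stops at
# the first "]" and drops it, so it is appended back before parsing.
# Note: the := operator requires Python 3.8+.
if res := re.search(r'\[\\"idm-user-with-consents\\"]"},{"state":{"data":{"items":(.*?)]', data):
    for item in json.loads(res[1] + ']'):
        print(item['date'], item['value'])
else:
    print('oops')  # pattern not found: page layout or login state differs
```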
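
The lazy (.*?)] in that regex stops at the first "]" it meets, which works as long as no entry contains a "]" of its own. A slightly sturdier alternative, sketched below, uses the regex only to find where the list starts and hands the rest of the page to `json.JSONDecoder.raw_decode`, which parses exactly one complete JSON value and ignores whatever follows. The page text in the example is synthetic (including its "note" field); only the anchor is copied from the regex above.

```python
import json
import re

# Anchor copied from the regex above; the JSON list starts right after "items":
ANCHOR = re.compile(r'\[\\"idm-user-with-consents\\"]"},{"state":{"data":{"items":')


def extract_items(page):
    """Return the JSON list that follows the anchor, or None if it isn't found."""
    match = ANCHOR.search(page)
    if match is None:
        return None
    # raw_decode parses one complete value starting at the given index and returns
    # (value, end_index), so brackets inside strings or nested objects are handled.
    items, _end = json.JSONDecoder().raw_decode(page, match.end())
    return items


# Synthetic page text; note the "]" inside a string value.
page = (r'<script>[\"idm-user-with-consents\"]"},{"state":{"data":{"items":'
        r'[{"date":"2023-01-20","value":123,"note":"a]b"}],"has_more":false}}</script>')
print(extract_items(page))  # [{'date': '2023-01-20', 'value': 123, 'note': 'a]b'}]
```
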
ColorfulQuark commented 1 year ago
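```python
import datetime
import json
import re
from itertools import count

import myfitnesspal

# Matches the JSON "items" list embedded in each measurements/edit page.
ITEMS_RE = r'\[\\"idm-user-with-consents\\"]"},{"state":{"data":{"items":(.*?)]'


def get_day(client):
    day = client.get_date(datetime.date.today())
    print(day)


def get_measures(client, measurement_type, lower_date):
    """Return {date: value} for one measurement type, back to lower_date (YYYY-MM-DD)."""
    data = {}
    stop = False
    for page_num in count(1):
        url = f"https://www.myfitnesspal.com/measurements/edit?type={measurement_type}&page={page_num}"
        page = client._get_content_for_url(url)

        if res := re.search(ITEMS_RE, page):
            # Assumes entries arrive newest first; ISO date strings compare correctly as text.
            for item in json.loads(res[1] + ']'):
                if item['date'] < lower_date:
                    stop = True
                    break
                data[item['date']] = item['value']
        else:
            print('oops', len(page))
            break  # nothing parseable on this page; stop instead of looping forever

        # Stop once we've passed lower_date or the page reports no further pages.
        more = re.search(r'"has_more":(.*?),', page)
        if stop or more is None or more[1] == 'false':
            break

    return data


def latest_measures(client):
    """Return {measurement type: most recent value} from the check-in page."""
    url = "https://www.myfitnesspal.com/measurements/check-in"
    page = client._get_content_for_url(url)
    res = re.search(r'{"mutations":\[\],"queries":\[{"state":{"data":{"items":(.*?)]', page)
    data = {}
    if res is None:
        return data  # pattern not found on the page
    for item in json.loads(res[1] + ']'):
        data[item['type']] = item['value']
    return data


client = myfitnesspal.Client()

data = latest_measures(client)
print(data)

print(data.keys())  # measurement types, usable as type= in the edit URL

data = get_measures(client, 'Weight', '2023-01-02')
for dt, value in data.items():
    print(dt, value)
```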
TimOgden commented 1 year ago

Sorry @ColorfulQuark, I was away for the weekend. I just ran your script, and it works perfectly; it also grabs the whole dataset instead of just the first page. I can integrate this into the actual code and open a PR so it's fixed for everyone.

ColorfulQuark commented 1 year ago

@TimOgden Sounds good. Glad you like it. With luck it will fit in with a bit of tweaking: fetching data between two dates (rather than everything back to a specified date), adding annotations, etc. I don't think the mainline has a latest_measures function, but I find it useful.
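
The between-two-dates variant could look roughly like this. It assumes entries arrive newest first (the same assumption behind the early break in get_measures), so it skips entries newer than the upper bound and stops at the first one older than the lower bound. `between_dates` is a hypothetical name, and the sample entries are made up; it works on item dicts shaped like the ones in the page's JSON.

```python
import datetime


def between_dates(items, lower, upper):
    """Return {date: value} for entries with lower <= date <= upper.

    `lower`/`upper` are datetime.date objects; item dates are 'YYYY-MM-DD'
    strings. Assumes `items` is ordered newest first.
    """
    result = {}
    for item in items:
        day = datetime.date.fromisoformat(item["date"])
        if day > upper:
            continue  # newer than the requested window
        if day < lower:
            break  # every later entry is older still
        result[day] = item["value"]
    return result


# Made-up sample entries, newest first.
items = [
    {"date": "2023-01-20", "value": 123},
    {"date": "2023-01-13", "value": 124},
    {"date": "2023-01-06", "value": 125},
]
print(between_dates(items, datetime.date(2023, 1, 10), datetime.date(2023, 1, 15)))
# {datetime.date(2023, 1, 13): 124}
```
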