davidhealey / waistline

Libre calorie counter app for Android. Built with Cordova.

Adding a scraper/external DB as data source #793

Open pascal-mueller opened 7 months ago

pascal-mueller commented 7 months ago

Hello,

I'd love to track my food but I don't want to use generic databases. I mostly care about macros anyway. What I'd like to have is an empty database that I can fill by pressing an "Add food" button and choosing the source: A) some generic database, or B) a website such as www.walmart.com. In case B the app opens the website, I search for "lentils" and import the result; the app scrapes the macros and adds the item as a new product.

I can write the scraper myself, but the other things might be a bit tricky; I would have to learn Electron first.

Another nice way would be to just use my own database that's automatically synced. So I can just provide the API endpoint to my database (whatever that is, I don't really care) and it syncs it.

I'm also fine if I can just hook up my own data source and just search like I would anyway. I can just prescrape everything.

Is anything like that easily implemented? If so, which one would it be?

davidhealey commented 7 months ago

easily implemented

I don't think so. I'm used to working with APIs; how does a scraper work?

pascal-mueller commented 7 months ago

What I mean is: I search for "lentils", and instead of querying the currently integrated database, the app sends that request to the website of a store like Walmart. I can then search for the product there. I might end up on a page like this: https://www.walmart.com/ip/Great-Value-Lentils-1-lb/545884744?athbdg=L1200&from=/search. The "scraper" then collects the image, the name, and the nutritional information from that page. How this information is collected differs for every shop website you want to use as a source.

I think an easier way would be to scrape all the products from the two shops I usually frequent, create a database, fork the project, and hook it up. Should be easy if I just use the same API interface no?
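To make the idea concrete, here is a minimal sketch of such a scraper using only the Python standard library. It assumes the product page exposes its nutrition facts as a two-column HTML table; that markup, the `SAMPLE_PAGE` string, the values in it, and the names `NutritionTableParser` and `scrape_nutrition` are all made up for illustration. A real shop page will be structured differently, so each source would need its own parser.

```python
from html.parser import HTMLParser


class NutritionTableParser(HTMLParser):
    """Collects label/value pairs from two-column table rows."""

    def __init__(self):
        super().__init__()
        self.facts = {}
        self._cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []          # start a fresh row
        elif tag == "td":
            self._in_cell = True
            self._cells.append("")    # start a fresh cell

    def handle_data(self, data):
        if self._in_cell:
            self._cells[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and len(self._cells) == 2:
            label, value = (cell.strip() for cell in self._cells)
            self.facts[label.lower()] = value


# Hypothetical markup with placeholder values, not taken from any real shop page.
SAMPLE_PAGE = """
<table>
<tr><td>Protein</td><td>9 g</td></tr>
<tr><td>Total Fat</td><td>0.5 g</td></tr>
</table>
"""


def scrape_nutrition(html):
    """Return a dict mapping lower-cased nutrient labels to raw value strings."""
    parser = NutritionTableParser()
    parser.feed(html)
    parser.close()
    return parser.facts
```

Calling `scrape_nutrition(SAMPLE_PAGE)` yields a dict keyed by the lower-cased labels. Fetching the page and extracting the image and product name are left out here, since they depend entirely on the individual shop.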

davidhealey commented 7 months ago

Should be easy if I just use the same API interface no?

If you can make a tool that pulls nutritional data from websites in that manner, I would really encourage you to make it a separate tool and to upload that data to Open Food Facts. This will benefit many projects, including Waistline. I'm not sure you'd be able to upload the images due to copyright, but you could always ask Walmart for permission.

One thing that might be possible to add to Waistline is a shop filter, so you will only see results from Walmart for example.

pascal-mueller commented 7 months ago

I think what I want to do here can be generalized to simply adding the possibility to A) disable data sources and B) add your own via a REST API (or whatever is currently used). That way I could build up my own one and just use it.

But it's probably too niche of a feature request, so I might check if I can just fork it and replace the data source myself. Do you think it's easy to hook up my own API if I just mirror the interface of Open Food Facts?

If I have all that data, I'm happy to upload it to Open Food Facts - would have to figure out how to make sure there are no duplicates I assume but that should be solvable somehow.

davidhealey commented 7 months ago

Yes, if you mirror OFF (Open Food Facts), it should just be a case of swapping out the endpoints.

would have to figure out how to make sure there are no duplicates I assume but that should be solvable somehow

You can check for duplicates using the barcode.
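As a rough sketch of both points, the snippet below shapes a scraped item like an Open Food Facts product record and then deduplicates a list of such records by barcode. The field names `code`, `product_name`, and `nutriments` with its `*_100g` keys follow the OFF product JSON; which of them Waistline actually reads is an assumption that would need checking against the source. The helper names `to_off_product` and `dedupe_by_barcode` are hypothetical.

```python
def to_off_product(barcode, name, kcal, protein_g, carbs_g, fat_g):
    """Shape one scraped item like an Open Food Facts product record.

    Assumes all nutrient values are given per 100 g.
    """
    return {
        "code": barcode,
        "product_name": name,
        "nutriments": {
            "energy-kcal_100g": kcal,
            "proteins_100g": protein_g,
            "carbohydrates_100g": carbs_g,
            "fat_100g": fat_g,
        },
    }


def dedupe_by_barcode(products):
    """Keep the first record per barcode.

    Records without a barcode cannot be compared this way, so they are
    returned separately instead of being dropped or merged.
    """
    unique = {}
    without_barcode = []
    for product in products:
        code = (product.get("code") or "").strip()
        if not code:
            without_barcode.append(product)
        elif code not in unique:
            unique[code] = product
    return list(unique.values()), without_barcode
```

Records that end up in the second list are the ones where the scraper could not find a barcode; those would need manual handling before any upload.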

pascal-mueller commented 7 months ago

Yep, I also thought about the barcode, but I'm not sure if it's readable from those online pictures. The shops I'll scrape do show it on their pictures. Will have to see.

Thanks, I'll try and scrape the data

davidhealey commented 7 months ago

Ah yeah, if you don't have the barcode you can't upload it to OFF. I was thinking the barcode would be published on the sites alongside other data but I think you're correct that it's not.

pascal-mueller commented 7 months ago

Yeah but I can hopefully just scan it from the picture. We'll see.

EmilJunker commented 7 months ago

@pascal-mueller I just wanted to let you know that it is also possible to import a custom curated list of food items into Waistline from a JSON file. So you could use your web scraper to make a JSON list of all the Walmart products and then import them into Waistline under a certain category (you can create categories under Settings > Foods, Meals, Recipes > Labels and Categories). Then you would be able to search through the Walmart items in your local database almost like any other item, even without an internet connection.

The steps to import food items from JSON are described here: https://github.com/davidhealey/waistline/blob/master/FAQ.md#do-you-plan-to-add-support-for-the-xxx-online-food-database-or-api-available-in-my-country