davidhealey / waistline

Libre calorie counter app for Android. Built with Cordova.

[Feature Request] Local Copy of the Open Food Facts (OFF) DB #604

Open simonj222 opened 2 years ago

simonj222 commented 2 years ago

I suspect one feature that is holding back adoption of Waistline is the search functionality. The OFF Search API has a few limitations:

We could solve these concerns through an offline version of the DB.

My main concern with doing this is the increase in APK size. I've written a simple Python script that takes the OFF CSV and populates a smaller CSV with only the necessary fields (product name, serving size, calories - only what's displayed in the search results view). It does this only for items that contain calories + serving size, resulting in ~500k items. Compressed, this is ~10MB, but it will grow over time as the DB grows.
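For illustration only, here is a minimal sketch of that kind of filtering step. It is not the script mentioned above. The column names (`product_name`, `serving_size`, `energy-kcal_100g`) and the tab delimiter are assumptions about the OFF export and should be checked against the real file header; the sample rows are made up.

```python
import csv
import io

# Assumed column names; verify against the header of the real OFF export.
KEEP = ["product_name", "serving_size", "energy-kcal_100g"]


def slim_rows(reader):
    """Yield only the KEEP fields for rows that have both calories and a serving size."""
    for row in reader:
        has_calories = (row.get("energy-kcal_100g") or "").strip()
        has_serving = (row.get("serving_size") or "").strip()
        if has_calories and has_serving:
            yield {key: (row.get(key) or "").strip() for key in KEEP}


def slim_csv(src, dst):
    """Copy the reduced rows from src to dst and return how many were kept."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(dst, fieldnames=KEEP, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in slim_rows(reader):
        writer.writerow(row)
        count += 1
    return count


# Made-up sample standing in for the full export.
sample = io.StringIO(
    "code\tproduct_name\tserving_size\tenergy-kcal_100g\tbrands\n"
    "1\tOat drink\t250 ml\t46\tAcme\n"
    "2\tMystery bar\t\t410\tAcme\n"
    "3\tSparkling water\t330 ml\t\tAcme\n"
)
out = io.StringIO()
kept = slim_csv(sample, out)
```

With the real export, `src` and `dst` would be file handles instead of `io.StringIO` objects, and the result could then be compressed separately.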

Detailed nutritional information could be populated through a separate call if the item detail view is displayed, or the item is added to the diary.

There are some downsides:

However, the improved search functionality seems worth those downsides.

If there's sufficient interest here, I can try putting together a PR when I can find some spare time (but I couldn't commit to it right now).

The work here is non-trivial and would appear to involve:

davidhealey commented 2 years ago

What's TF-IDF?

simonj222 commented 2 years ago

> What's TF-IDF?

Sorry, Term Frequency - Inverse Document Frequency, it's probably a good starting point for ranking in a situation like this. It'll give us a score for each item matching a search query, and incorporate a notion of how valuable each search term is (if multiple search terms are provided).

This may not play so nicely with typeahead, but is probably a good initial direction to investigate.
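As a rough sketch of the idea (not a proposal for the actual implementation), TF-IDF ranking over product names could look like the following. The whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the sample product names are all simplifying assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(names):
    """Return per-document term counts and the document frequency of each term."""
    docs = [Counter(tokenize(name)) for name in names]
    df = Counter()
    for doc in docs:
        df.update(doc.keys())
    return docs, df


def score(query, docs, df):
    """Sum tf * idf over the query terms for every document."""
    n = len(docs)
    scores = []
    for doc in docs:
        length = sum(doc.values()) or 1
        total = 0.0
        for term in tokenize(query):
            if term in doc:
                tf = doc[term] / length        # how prominent the term is in this name
                idf = math.log(n / df[term])   # rarer terms are worth more
                total += tf * idf
        scores.append(total)
    return scores


# Made-up product names for illustration.
names = ["oat milk", "oat bar chocolate", "whole milk", "oat flakes"]
docs, df = build_index(names)
scores = score("oat milk", docs, df)
best = names[max(range(len(names)), key=scores.__getitem__)]
```

Here "milk" appears in fewer names than "oat", so it carries more weight: "whole milk" outranks "oat flakes" for the query "oat milk", and the exact match ranks first.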

davidhealey commented 2 years ago

I don't like the idea of bloating the APK. I'd rather any offline DB was part of a separate download that the user can opt into. Apart from that it all sounds like a good idea to me.

simonj222 commented 2 years ago

Got it - I'd avoided that because then we'd need to host it.

However, perhaps the OFF data manipulation + output could live in a separate GitHub project. That way GitHub hosts the output, and the app could download it from that hosted URL.

I'll try to play around with this when I get a chance.

EmilJunker commented 2 years ago

To me, the biggest reason the OFF search in Waistline is so bad, and the reason I rarely use it, is its data quality and quantity issues. You only find what you're looking for if (a) the item exists in the OFF database at all, and (b) the product name or brand in OFF actually matches the one printed on the item (i.e. the search term you enter). All too often, these are not a given. That's why I always prefer to search for products via the barcode, and only resort to text-based search if it's absolutely necessary.

I really like your idea, but I don't think it would fix the underlying problem of the OFF database. Sorry for being so pessimistic, but I'm afraid this whole thing will just turn out to be a huge amount of work, but in the end lead to no substantial improvement in the user experience.

EmilJunker commented 2 years ago

By the way, it looks like the OFF search API has a sort_by parameter that allows sorting the results by popularity (most frequently scanned items first). Maybe this would be worth a look.

davidhealey commented 2 years ago

I use the search all the time, I don't think it's that bad. Sometimes the data isn't quite right but that's the same when you scan a barcode.

I agree that an offline database isn't going to make a huge difference overall, but if someone else wants to do it and it doesn't negatively impact my use of the app then I'm happy to include it.

We're already using sort_by: https://github.com/davidhealey/waistline/blob/master/www/activities/foodlist/js/open-food-facts.js#L29

EmilJunker commented 2 years ago

> if someone else wants to do it and it doesn't negatively impact my use of the app then I'm happy to include it

Sure, if someone else wants to implement it and it's an optional download, then it's fine. I'm not advocating against this being added. But I do think that it would require a lot of effort, and the result might not actually be too different from what we have now, so you have been warned ;)

Also, there are a few problems with the approach outlined in the original comment:

> We're already using sort_by

Oh, that's interesting. But it's currently set to last_modified_t. I think it would be worth experimenting with other values such as unique_scans_n and see if that improves the search output.
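To make the experiment concrete, here is a hedged sketch that builds two search URLs differing only in sort_by. The endpoint and every parameter other than sort_by are assumptions based on the legacy OFF search endpoint, and may not match what Waistline's open-food-facts.js actually sends. No request is made here.

```python
from urllib.parse import urlencode

# Assumed endpoint of the legacy OFF search API; verify before relying on it.
BASE = "https://world.openfoodfacts.org/cgi/search.pl"


def search_url(terms, sort_by="unique_scans_n"):
    """Build a search URL; only the sort_by values come from this thread."""
    params = {
        "search_terms": terms,  # assumed parameter name
        "search_simple": 1,     # assumed parameter name
        "action": "process",    # assumed parameter name
        "json": 1,              # assumed parameter name
        "sort_by": sort_by,
    }
    return BASE + "?" + urlencode(params)


by_scans = search_url("oat milk")
by_recency = search_url("oat milk", sort_by="last_modified_t")
```

Fetching both URLs for the same handful of queries and comparing the first page of results would be a cheap way to judge whether unique_scans_n is an improvement.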

davidhealey commented 2 years ago

> The typeahead feature could interfere with searching for local food items. When I'm just typing in the search field to look for an item from the local foodslist, it could be annoying to be presented with search suggestions from OFF.

Yes I think I would want an option to disable this feature, although it could also be used to typeahead in your local DB as well as the local OFF DB.

> I think it would be worth experimenting with other values such as unique_scans_n and see if that improves the search output.

Yeah we can play around with it.

simonj222 commented 2 years ago

> I'm afraid this whole thing will just turn out to be a huge amount of work, but in the end lead to no substantial improvement in the user experience.

That's a very valid concern, and I agree with the risk. I built an offline search functionality, and was able to see a real improvement for my queries. However, the technical complexity may not be worth it. I'm treating this as an experiment with a high chance of failure :)

> we also need the product brand (not just to display it, but also for searching), and the product image (for the thumbnail) ... There are some food items that have no calories, but should still be included in the search results, e.g. dietary supplements

Great points - given this would be an optional download, size becomes less of a concern. I'm investigating an alternative approach - having a much larger file that contains everything needed for the searching + item detail view (i.e. all nutritional information). This avoids the complexity of a second fetch.

The data grows to ~160MB (after stripping out unneeded fields + using parquet + gzip'ing). Large, but probably acceptable for people who want this functionality. I'll continue playing around with this and see how the integration would look.

> Oh, that's interesting. But it's currently set to last_modified_t. I think it would be worth experimenting with other values such as unique_scans_n and see if that improves the search output.

+1, I think that's a great idea.

simonj222 commented 2 years ago

Quick update here - I've been playing around with getting a local index, but Cordova doesn't make it easy. Since we want to do something smart, IndexedDB isn't really sufficient for our needs.

I'm instead looking into creating a new OFF API that would support this use case. It would also hopefully help other developers. I'll circle back here if I have any luck. However, for the moment I'll close this issue.

teolemon commented 2 years ago

ahaha :-) @simonj222 I'm now connecting the dots :-) I was retesting the latest version of Waistline, given the current MyFitnessPal apocalypse, and I found this issue :-P

simonj2 commented 2 years ago

@teolemon - ahaha, small world! :)

For anyone else following along - the new API is a WIP at: https://github.com/openfoodfacts/openfoodfacts-search

davidhealey commented 2 years ago

> MyFitnessPal apocalypse

@teolemon Tell me more

teolemon commented 2 years ago

https://www.theverge.com/2022/8/25/23321408/myfitnesspal-weight-loss-app-barcode-scanning-premium-paywall

teolemon commented 2 years ago

People are pissed, cf Twitter

davidhealey commented 2 years ago

That's a crazy move, I expect they'll make a u-turn on it. Hopefully more users will move to free alternatives like Waistline.

jncosideout commented 1 year ago

I will certainly tell my friends about the My Fitness Pal Apocalypse and use that to pitch Waistline to them 😄 @teolemon

Kallinteris-Andreas commented 6 months ago

Hey, this appears to be not actively developed, but I would like to add (as a user):

teolemon commented 6 months ago

FYI, the new search API is now live at https://search.openfoodfacts.org/docs

davidhealey commented 6 months ago

> FYI, the new search API is now live at https://search.openfoodfacts.org/docs

Is this a breaking change for existing apps?

teolemon commented 6 months ago

The two APIs will co-exist, but it's recommended to update.