davesag / open-recipe

A Facebook app that allows people to share their favourite home recipes

seed the database with common ingredients #7

Open davesag opened 12 years ago

davesag commented 12 years ago

The system should start life with a good collection of ingredients and allow a simple mechanism for users to add more in a consistent manner. We will need the ability to merge ingredients if two ingredients turn out to be the same thing.

Also note that many ingredients go by more than one name. For example, Spanish onions are the same as red onions.
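As a rough illustration of the alias and merge requirements above, here is a minimal in-memory sketch. The class `IngredientCatalogue` and its methods (`add`, `resolve`, `merge`, `aliases_of`) are hypothetical names made up for this example, not part of the open-recipe codebase; a real implementation would presumably sit on the app's database.

```ruby
require "set"

# Hypothetical in-memory catalogue: each canonical ingredient owns a set of
# alternative names, and every known name resolves to one canonical entry.
class IngredientCatalogue
  def initialize
    @aliases = {} # canonical name => Set of alternative names
    @lookup = {}  # normalised name => canonical name
  end

  # Add a canonical ingredient with optional alternative names.
  def add(name, aliases: [])
    @aliases[name] ||= Set.new
    ([name] + aliases).each do |n|
      @lookup[normalise(n)] = name
      @aliases[name] << n unless n == name
    end
    name
  end

  # Resolve any known name to its canonical ingredient, or nil if unknown.
  def resolve(name)
    @lookup[normalise(name)]
  end

  # Merge `duplicate` into `keep`: the duplicate's canonical name and all of
  # its aliases become aliases of `keep`.
  def merge(keep, duplicate)
    unless @aliases.key?(keep) && @aliases.key?(duplicate)
      raise ArgumentError, "unknown ingredient"
    end
    return keep if keep == duplicate

    moved = @aliases.delete(duplicate) << duplicate
    moved.each do |n|
      @aliases[keep] << n
      @lookup[normalise(n)] = keep
    end
    keep
  end

  def aliases_of(name)
    @aliases.fetch(name).to_a.sort
  end

  private

  # Case- and whitespace-insensitive comparison key.
  def normalise(name)
    name.strip.downcase.gsub(/\s+/, " ")
  end
end

catalogue = IngredientCatalogue.new
catalogue.add("Red Onion", aliases: ["Red Onions"])
catalogue.add("Spanish Onion", aliases: ["Spanish Onions"])
catalogue.merge("Red Onion", "Spanish Onion")
puts catalogue.resolve("spanish onions")
```

Keeping the merged name as an alias means anything a user already typed as "Spanish Onion" still resolves after the merge.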

I've kicked this off with the following small list for testing purposes.

dmontherun commented 12 years ago

Might be worth investigating GS1, the global administrator for barcodes.

See e.g. http://www.gs1au.org/services/gs1_trusted_data_services.asp (and fees for the data: http://www.gs1au.org/assets/documents/services/trusted-data/GS1-Trusted-Data-Service-Fee-Schedule.pdf).

GS1 themselves are putting the data to use in an iPhone app: http://www.gs1au.org/services/goscan/gs1_goscan_iphone_application_overview.asp

(As a side note, if you squint at the iPhone screenshot you might notice the symbol they've chosen for 'ingredients' within their app. It looks suspiciously like a mortar and pestle.)

dmontherun commented 12 years ago

Dave, I was thinking more about our discussion yesterday.

In essence what I think we want to cater for is 3 lists: A. purchased ingredients, B. (core) ingredients, and C. prepared ingredients.

GS1 (previous comment) will give us an excellent start on A. Cookbooks will give us an excellent start on C. If we could get access to both (without the GS1 fees in the first instance, by one of the means discussed) we should be able to come up with a good list for B. The objective is that logical relationships are available for all items between A and B (many-to-one or many-to-many?) and between B and C (many-to-many).

User free text (with autocomplete) would be fine for C. (with occasional admin cleanup for duplicates, spelling, etc?). Similarly, GS1 updated data plus user and/or supplier data could be accepted for A. The core list, B., would be the list we maintain for the app with tight QA and no user editing?
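To make the relationships between the three lists concrete, here is a minimal sketch. All names in it (`Purchased`, `Core`, `Prepared`, `IngredientLinks`, `shopping_options`) and the sample values are assumptions for illustration only. It models A to B and C to B as many-to-many, which also covers the many-to-one case.

```ruby
require "set"

# Hypothetical records for the three lists; field names are illustrative only.
Purchased = Struct.new(:description, :brand) # list A, e.g. a GS1 trade item
Core      = Struct.new(:name)                # list B, tightly curated
Prepared  = Struct.new(:name)                # list C, user free text

# Holds the A|B and B|C links as sets of core ingredients per item.
class IngredientLinks
  def initialize
    @purchased_to_core = Hash.new { |h, k| h[k] = Set.new }
    @prepared_to_core = Hash.new { |h, k| h[k] = Set.new }
  end

  def link_purchased(purchased, core)
    @purchased_to_core[purchased] << core
  end

  def link_prepared(prepared, core)
    @prepared_to_core[prepared] << core
  end

  # Purchased items that share at least one core ingredient with the
  # given prepared ingredient.
  def shopping_options(prepared)
    cores = @prepared_to_core.fetch(prepared, Set.new)
    @purchased_to_core.select { |_, linked| linked.intersect?(cores) }.keys
  end
end

# Made-up sample data, not taken from any real dataset.
onion = Core.new("onion")
bag = Purchased.new("Brown Onions 1kg", "ExampleBrand")
caramelised = Prepared.new("caramelised onions")

links = IngredientLinks.new
links.link_purchased(bag, onion)
links.link_prepared(caramelised, onion)
puts links.shopping_options(caramelised).map(&:description).inspect
```

Routing everything through the core list B means A and C never need to reference each other directly, so user edits to C or supplier updates to A cannot break the curated list.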

(My thinking then also strayed into other lists to maintain, e.g. ingredients that go well together on flavour, substitute ingredients, etc., but that's a second-order issue ... and my boarding's just been called.)

dmontherun commented 12 years ago

See also issue #16 (and #5).

Two possible sources for a seed list of (core) ingredients are FSANZ and the USDA. Both agencies collect data for nutritionally-focused public health and public safety purposes, and both have a list of "foods" which is freely available:

Both datasets would need significant refinement (simplification) for our purpose, but it's still probably a quicker way to start than going from scratch.

Lists to supplement specific areas might be found in Wikipedia, e.g. http://en.wikipedia.org/wiki/List_of_culinary_vegetables (although Wikipedia is probably more useful for seed lists of recipe names; see http://en.wikipedia.org/wiki/Category:Lists_of_foods).

dmontherun commented 11 years ago

Email sent to GS1 (see comments above) regarding access to their Trusted Data Services (TDS) for evaluation/development purposes.


Attn: Sean Sloan

Sean,

We have been pointed to GS1's Trusted Data Services as of potential interest/use to us.

I am a Director of Open Recipe Pty Ltd. We are in development for the initial launch of our system early in 2013 (recipe sharing) and currently also evaluating and planning for subsequent development phases (including online grocery shopping, for example).

I have had a look on your website at the TDS information and fees. While the data services look promising and the fees seem quite reasonable as an ongoing operating expense, we would need to access data for evaluation and development purposes initially and the fees are prohibitive for us in that context.

Are you able to provide details of arrangements that GS1 may have to cater for restricted (non-commercial etc) access to data for evaluation/trial/development purposes? Alternatively, can we discuss?

Regards,


David Murchland
+61 411 058 336
david.murchland@openrecipe.com.au
Director
Open Recipe Pty Ltd
ABN 55160856799

dmontherun commented 11 years ago

Received email back from GS1 with a data sample and some information on services.

Reviewed with Dave, then called Sean to discuss. Sean agreed to send the GPC data hierarchy, an XML format data sample, and (probably) an export of product descriptions for food and beverage to help us assess database coverage.

Learned from review and discussion:

davesag commented 11 years ago

GS1 provide standard product classification documents in Microsoft DOCX format. These files can be parsed in Ruby using the docx gem.

I have placed a copy of the latest raw food and beverage files in the Dropbox.

dmontherun commented 11 years ago

GS1 has now provided an export of their Trade Item Description and Brand fields, for Supplier data, in the Food and Beverage categories (see Dropbox).

I propose to use this GS1 dataset, combined with the FSANZ/USDA public datasets (see comments above) and any other readily available sources, to generate a first draft of the "core" internal managed dataset for the OR system.
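One possible way to produce that first draft is to reduce each source description to a crude comparison key and group rows that share a key, keeping track of which sources mention it. The sketch below is an assumption about the approach, not an agreed design: the sample rows are invented, the helpers `draft_key` and `draft_core_list` are hypothetical, and the real GS1, FSANZ and USDA exports have their own formats that would need parsing first.

```ruby
# Invented sample rows for illustration; not real GS1, FSANZ or USDA data.
SOURCES = {
  "GS1"   => ["Onions, Red", "Garlic"],
  "FSANZ" => ["Onion, red", "Carrot"],
  "USDA"  => ["garlic", "Carrot"]
}

# Reduce a description to a crude comparison key: lowercase, punctuation
# removed, a trailing "s" dropped from each word, words sorted.
# Deliberately naive; a real pass would need proper singularisation.
def draft_key(description)
  description.downcase
             .gsub(/[^a-z\s]/, " ")
             .split
             .map { |word| word.sub(/s\z/, "") }
             .sort
             .join(" ")
end

# Group all rows by key and record which sources contributed each one.
def draft_core_list(sources)
  draft = Hash.new { |h, k| h[k] = [] }
  sources.each do |source, rows|
    rows.each { |row| draft[draft_key(row)] << source }
  end
  draft.transform_values(&:uniq)
end

draft_core_list(SOURCES).each do |key, found_in|
  puts "#{key}: #{found_in.join(', ')}"
end
```

Entries found in more than one source are likely candidates for the core list, while single-source entries could be queued for manual review.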