Chobbes / org-chef

A package for making a cookbook and managing recipes with org-mode.
MIT License

Add recipe json-ld extraction #50

Closed egh closed 4 years ago

egh commented 4 years ago

Many (most?) recipe sites include recipe data in the JSON-LD format (see https://developers.google.com/search/docs/data-types/recipe).

This adds a recipe extractor for JSON-LD format.
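For readers unfamiliar with the format, here is a minimal sketch of what JSON-LD recipe extraction involves. It is written in Python with only the standard library, purely as an illustration; it is not the Emacs Lisp code in this PR, and the names `JsonLdScriptCollector`, `is_recipe`, and `find_recipe` are hypothetical. It assumes the page embeds its data in `<script type="application/ld+json">` elements and that the recipe node has `@type` set to `Recipe`, either at the top level, inside a list, or inside an `@graph` array.

```python
import json
from html.parser import HTMLParser


class JsonLdScriptCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []          # one raw JSON string per JSON-LD script
        self._in_json_ld = False  # True while inside a JSON-LD script
        self._buffer = []         # script text may arrive in several chunks

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def is_recipe(node):
    """Return True if the node's @type is "Recipe" (string or list form)."""
    node_type = node.get("@type")
    if isinstance(node_type, list):
        return "Recipe" in node_type
    return node_type == "Recipe"


def find_recipe(html):
    """Return the first JSON-LD Recipe node in the page as a dict, or None."""
    collector = JsonLdScriptCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the page
        # The top level may be a single object or a list of objects.
        candidates = data if isinstance(data, list) else [data]
        for item in candidates:
            if not isinstance(item, dict):
                continue
            graph = item.get("@graph", [])
            nodes = [item] + (graph if isinstance(graph, list) else [])
            for node in nodes:
                if isinstance(node, dict) and is_recipe(node):
                    return node
    return None
```

The returned dict carries the schema.org Recipe properties the page chose to publish, such as `name`, `recipeIngredient`, and `recipeInstructions`. Real sites vary in which properties they include and in how they shape the values, so a real extractor would need to normalize them.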

Many of the built-in sites that org-chef supports also provide JSON-LD (Fine Cooking, Serious Eats, Allrecipes, NYT, etc.). I think the JSON-LD extractor would be easier to maintain, because JSON-LD should be a more stable data format than each site's HTML.

To test this extractor against the site-specific custom extractors, I defined a custom variable, org-chef-prefer-json-ld. If it is t, org-chef prefers the JSON-LD extractor. Otherwise, the JSON-LD extractor is used only as a last resort.
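The selection rule described above can be sketched as follows. This is an illustrative Python sketch, not org-chef's Emacs Lisp code: `order_extractors` and `fetch_recipe` are hypothetical names, the `prefer_json_ld` flag stands in for org-chef-prefer-json-ld, and falling back to the site-specific extractor when the preferred JSON-LD extractor finds nothing is an assumption that the description above does not confirm.

```python
def order_extractors(site_extractor, json_ld_extractor, prefer_json_ld):
    """Return the extractors in the order they should be tried.

    site_extractor may be None when no custom extractor matches the URL.
    """
    if prefer_json_ld:
        ordered = [json_ld_extractor, site_extractor]
    else:
        # JSON-LD is only the last resort.
        ordered = [site_extractor, json_ld_extractor]
    return [extractor for extractor in ordered if extractor is not None]


def fetch_recipe(url, site_extractor, json_ld_extractor, prefer_json_ld=False):
    """Try each extractor in order; return the first non-None recipe."""
    for extractor in order_extractors(
        site_extractor, json_ld_extractor, prefer_json_ld
    ):
        recipe = extractor(url)
        if recipe is not None:
            return recipe
    return None
```

Each extractor here is assumed to be a callable that takes a URL and returns a recipe or `None`. Toggling the flag makes it possible to compare both extractors on the same URL.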

This addresses #16 and #48 and (I think) #49.

Chobbes commented 4 years ago

Thanks for looking into this! This looks great, I wasn't aware of this, and I'm excited to have this functionality added.

One thing that I'm noticing, though, is that this uses dom-search, which I don't seem to have in Emacs 26.3. It seems like it only exists in Emacs development branches? Is it possible to modify this to work with earlier Emacs versions?

Thanks again :).

egh commented 4 years ago

@Chobbes Yes, I just learned about it. It's pretty amazing. The whole recipe is right there, all structured :)

Thanks for the info on dom-search. Should be fixed in ec51cd11167da2d5fd4bf490fe0387e29f7a712e

Chobbes commented 4 years ago

Seems to be working now!

I do get a "Bad string format" error with the Weber recipes, though.

https://www.weber.com/US/en/recipes/red-meat/the-ultimate-burger/weber-2008421.html

But we can figure this out later :). Thanks for submitting this!

egh commented 4 years ago

Thank you!