DanRoscigno / Recipes


Consider replacing the Docsearch Docker crawler #20

Open DanRoscigno opened 2 weeks ago

DanRoscigno commented 2 weeks ago

This is one record from my Algolia index:

{
  "anchor": null,
  "content": "Zucchini Bread Zucchini Bread \n Gifts   Breads   Summer \n Ingredients \n 2 cups shredded zucchini with skins on \n 2 cups sugar \n 1 cup vegetable oil \n 3 eggs \n 1 tsp vanilla extract \n 3 cups all purpose flour \n 1/4 tsp baking powder \n 1 tsp baking soda \n 1/2 tsp salt \n 1 tsp ground cloves \n 1 tsp ginger \n 1 tsp cinnamon \n 1 cup chopped walnuts \n Mix zucchini, sugar and add oil. Add eggs one at a time and mix well. Add vanilla. Mix dry ingredients together and add to wet ingredients. Add nuts. Pour into 2 greased loaf pans. Bake 325 degrees for 1 hour. \n Servings \n 2 loaves Edit this page",
  "hierarchy": {
    "lvl0": "Zucchini Bread",
    "lvl1": null,
    "lvl2": null,
    "lvl3": null,
    "lvl4": null,
    "lvl5": null,
    "lvl6": null
  },
  "objectID": "ffea186be276eb3405b86e54970fbd4eddf918bd",
  "type": "content",
  "url": "https://danroscigno.github.io/Recipes/Zucchini_Bread/",
  "url_without_anchor": "https://danroscigno.github.io/Recipes/Zucchini_Bread/"
}
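If the replacement tooling ends up in Go, a record like the one above could be decoded into a typed value before being re-published. The following is a minimal sketch using only the standard library; the `Record` type and the `parseRecord` helper are hypothetical names introduced here, and the sample `content` string is abbreviated for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Record mirrors the fields of the exported record above (hypothetical type).
// Pointer types are used where the export contains null values.
type Record struct {
	Anchor           *string            `json:"anchor"`
	Content          string             `json:"content"`
	Hierarchy        map[string]*string `json:"hierarchy"`
	ObjectID         string             `json:"objectID"`
	Type             string             `json:"type"`
	URL              string             `json:"url"`
	URLWithoutAnchor string             `json:"url_without_anchor"`
}

// parseRecord decodes a single JSON record into a Record.
func parseRecord(data []byte) (Record, error) {
	var r Record
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	// Abbreviated copy of the record above; only the content text is shortened.
	sample := []byte(`{
  "anchor": null,
  "content": "Zucchini Bread (abbreviated)",
  "hierarchy": {"lvl0": "Zucchini Bread", "lvl1": null},
  "objectID": "ffea186be276eb3405b86e54970fbd4eddf918bd",
  "type": "content",
  "url": "https://danroscigno.github.io/Recipes/Zucchini_Bread/",
  "url_without_anchor": "https://danroscigno.github.io/Recipes/Zucchini_Bread/"
}`)

	rec, err := parseRecord(sample)
	if err != nil {
		panic(err)
	}
	// lvl0 is non-null in the sample, so dereferencing it is safe here.
	fmt.Println(*rec.Hierarchy["lvl0"], rec.URL)
}
```

The `hierarchy` levels are kept as a map of nullable strings so that `null` levels survive a round trip instead of turning into empty strings.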

This was exported with the Algolia CLI:

# recipes is the name of the index
algolia objects browse recipes

I am thinking about generating the index with a scraper other than the old DocSearch Docker crawler and publishing the records through the Algolia API, for example by scraping with Colly and uploading with the Algolia Go API client.

A good blog post on scraping with Colly and Goquery: https://benjamincongdon.me/blog/2018/03/01/Scraping-the-Web-in-Golang-with-Colly-and-Goquery/
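Whichever scraper is used, the step between scraping and publishing is building records in the same shape as the one exported above. Here is a minimal standard-library sketch of that step only; the Colly scraping and the Algolia upload are deliberately left out. `newRecord` and `objectIDFor` are hypothetical helpers, and hashing the URL with SHA-1 is an assumption made for illustration (it yields a 40-character hex ID like the one above, but the scheme the existing crawler uses is not known here). Collapsing whitespace in the content is likewise a choice of this sketch, not something the current index does.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// Record has the same JSON shape as the exported record (hypothetical type).
type Record struct {
	Anchor           *string            `json:"anchor"`
	Content          string             `json:"content"`
	Hierarchy        map[string]*string `json:"hierarchy"`
	ObjectID         string             `json:"objectID"`
	Type             string             `json:"type"`
	URL              string             `json:"url"`
	URLWithoutAnchor string             `json:"url_without_anchor"`
}

// objectIDFor derives a stable ID from the page URL.
// Assumption: SHA-1 of the URL; the original crawler may use another scheme.
func objectIDFor(url string) string {
	sum := sha1.Sum([]byte(url))
	return hex.EncodeToString(sum[:])
}

// newRecord builds one page-level record from a title, the page text, and its URL.
func newRecord(title, text, url string) Record {
	// Create lvl0 through lvl6 so unused levels are emitted as null.
	hierarchy := make(map[string]*string, 7)
	for i := 0; i <= 6; i++ {
		hierarchy[fmt.Sprintf("lvl%d", i)] = nil
	}
	hierarchy["lvl0"] = &title

	return Record{
		Anchor:           nil,
		Content:          strings.Join(strings.Fields(text), " "), // collapse runs of whitespace
		Hierarchy:        hierarchy,
		ObjectID:         objectIDFor(url),
		Type:             "content",
		URL:              url,
		URLWithoutAnchor: url,
	}
}

func main() {
	rec := newRecord(
		"Zucchini Bread",
		"Ingredients \n 2 cups shredded zucchini with skins on \n 2 cups sugar",
		"https://danroscigno.github.io/Recipes/Zucchini_Bread/",
	)
	out, err := json.MarshalIndent(rec, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Deriving the ID from the URL means re-running the scraper overwrites the existing record for a page rather than creating a duplicate, provided the upload step saves objects by `objectID`.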