Feature: Add the HTML extractor

naydav commented 3 years ago

We need to add an HTML extractor feature that creates a separate record for each p, li, td and code tag. The set of tags could be customized through a nodes_to_index option.
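The per-tag splitting described above can be sketched in plain JavaScript. This is a naive, regex-based illustration, not the html-extractor library's API. `extractRecords` and `stripTags` are hypothetical names. The sketch does not handle nested elements with the same tag name or HTML entities, so a real implementation should rely on a proper HTML parser.

```javascript
// Naive sketch: split an HTML string into one record per matching element.
// Assumptions: tag names are plain alphanumerics; nested elements with the
// same tag name and HTML entities are NOT handled (use a real HTML parser
// in production).

// Remove any inner markup and normalize whitespace.
function stripTags(html) {
  return html
    .replace(/<[^>]*>/g, " ") // replace tags with spaces so words don't merge
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
}

// Return [{ tag, content, order }] for every <tag>...</tag> listed in `tags`.
function extractRecords(html, tags = ["p", "li", "td", "code"]) {
  // \b stops "p" from matching "<pre>"; \1 requires the matching close tag.
  const pattern = new RegExp(
    `<(${tags.join("|")})\\b[^>]*>([\\s\\S]*?)<\\/\\1>`,
    "gi"
  );
  const records = [];
  for (const match of html.matchAll(pattern)) {
    const content = stripTags(match[2]);
    if (content) {
      // `order` keeps the element's position among the extracted records.
      records.push({ tag: match[1].toLowerCase(), content, order: records.length });
    }
  }
  return records;
}

const page =
  "<h1>Intro</h1><p>Hello <strong>world</strong></p>" +
  "<ul><li>One</li><li>Two</li></ul><pre><code>npm install</code></pre>";
console.log(extractRecords(page));
```

Each returned record would still need page-level fields (such as a title or URL) and a unique objectID before being sent to Algolia.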

naydav commented 3 years ago

Hi @Haroenv,

According to the recommendation in https://github.com/algolia/gatsby-plugin-algolia/pull/134#issuecomment-844412917, it should be a function that is called directly in the transformer.

But I think we can make it more reusable by making it part of the plugin configuration:

1. Add one more parameter, nodes_to_index: 'p,li,td,code', to https://github.com/algolia/gatsby-plugin-algolia/blob/master/gatsby-node.js#L42 (as was done for the dry-run option).
2. Add the data-transformation logic (using the algolia-html-extractor library) to https://github.com/algolia/gatsby-plugin-algolia/blob/master/gatsby-node.js#L421. The HTML extraction should run before the custom transformers are called.
3. This means the gatsby-plugin-algolia library will take a dependency on html-extractor (https://github.com/mansona/html-extractor).
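The configuration-based flow proposed above can be sketched as follows, assuming the query already yields an array of items and that the extractor is passed in as a function. `parseNodesToIndex`, `buildRecords`, and the `extract(item, tags)` signature are hypothetical; they are not part of gatsby-plugin-algolia or html-extractor. The point of the sketch is the ordering: extraction runs first, and the custom transformer receives the already-split records.

```javascript
// Hypothetical sketch of a plugin-level `nodes_to_index` option. None of these
// names exist in gatsby-plugin-algolia; `extract(item, tags)` stands in for a
// call into an HTML extractor and must return an array of records.

// "p, li,td" -> ["p", "li", "td"]; a missing option yields [] (feature off).
function parseNodesToIndex(option) {
  if (!option) return [];
  return option
    .split(",")
    .map((tag) => tag.trim().toLowerCase())
    .filter(Boolean);
}

// Run HTML extraction first, then hand the split records to the custom transformer.
function buildRecords(items, { nodesToIndex, extract, transformer = (records) => records }) {
  const tags = parseNodesToIndex(nodesToIndex);
  const records =
    tags.length === 0
      ? items // option not set: keep the current behaviour
      : items.flatMap((item) => extract(item, tags));
  return transformer(records);
}

// Usage: the option string would come from the plugin options in gatsby-config.js.
console.log(parseNodesToIndex("p,li,td,code"));
```

Treating a missing option as an empty tag list keeps existing setups unchanged, so the extraction stays strictly opt-in.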

@Haroenv, could you please share your thoughts on the approach described above, especially on the proper place in the code for the HTML extraction?

Thanks in advance

Haroenv commented 3 years ago

Have you tried running the code manually in the transformer? I'm not sure what's missing with that approach. If it's urgent, you can put the plugin's code locally as a first step (like in the example). Hope that makes sense; I'm still interested in seeing which approach you take.

naydav commented 3 years ago

@Haroenv Yes, the transformer approach works great, but then you have to duplicate the code in every new transformer declaration.

The main idea was that if it were part of the main library, everyone could use it simply through the configuration.

But we're OK with the transformer approach. Please decide as you see fit, and we will start working on it. Thanks!
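If the transformer approach is kept, the duplication concern raised above could be reduced with a small factory that every query reuses. This is a sketch under assumptions: `htmlSplittingTransformer` and `demoExtract` are hypothetical, the `({ data }) => records` transformer shape is assumed from the plugin's examples, and the GraphQL queries are placeholders (real ones would add filters).

```javascript
// Hypothetical helper (not part of gatsby-plugin-algolia): build a transformer
// that selects items from the GraphQL result and splits their HTML, so the
// splitting logic is written once and shared by every query.
function htmlSplittingTransformer(selectItems, extract, tags = ["p", "li", "td", "code"]) {
  return ({ data }) =>
    selectItems(data).flatMap(({ html, ...rest }) =>
      extract(html, tags).map((piece, i) => ({
        ...rest,  // page-level fields such as title or slug
        ...piece, // per-element fields produced by the extractor
        objectID: `${rest.objectID}-${i}`, // unique ID per split record
      }))
    );
}

// Stand-in extractor for this demo only: one record with all tags stripped.
const demoExtract = (html) => [{ content: html.replace(/<[^>]*>/g, "").trim() }];

// Two queries reuse the same helper instead of duplicating the splitting code
// (placeholder queries; filters omitted).
const queries = [
  {
    query: `{ posts: allMarkdownRemark { nodes { objectID: id html } } }`,
    transformer: htmlSplittingTransformer((data) => data.posts.nodes, demoExtract),
  },
  {
    query: `{ docs: allMarkdownRemark { nodes { objectID: id html } } }`,
    transformer: htmlSplittingTransformer((data) => data.docs.nodes, demoExtract),
  },
];
```

Swapping demoExtract for a real extractor would change only one line per project rather than every transformer.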

Haroenv commented 3 years ago

As a starter solution, the most useful thing would be to share the code you need for this on your side. Do you have a working example with split and stripped HTML?

Feature: Add the HTML extractor #137