bsubedi26 closed this issue 5 years ago
Hello and thanks for the comments,
I think there is some confusion about what the plugin does with regard to `objectID`s, so let me clarify that first. The plugin does not create one record per page; it creates several records per page (by default, one record per paragraph of text). You will have many more records in your Algolia index than you have pages (I did some estimates here).
In addition, the plugin goes one step further than our API clients. The `objectID` of each record is an MD5 hash of the record content. This means that whenever you fix a typo in a paragraph, the `objectID` for that record changes. The indexing done by the plugin takes that into account: instead of pushing all records to the Algolia index, it first does a diff between the about-to-be-pushed records and the ones already in the remote index. It deletes old records and adds new ones, but leaves untouched the records that are the same on both sides. I did that to dramatically cut down on the number of operations required.
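To make the hash-and-diff idea concrete, here is a minimal sketch in Ruby. It is not the plugin's actual code: `with_object_id` and `diff_records` are hypothetical helper names, and hashing the record as key-sorted JSON is an assumption made for illustration.

```ruby
require 'digest'
require 'json'

# Hypothetical helper: derive the objectID from the record content,
# so identical content always yields the same ID.
def with_object_id(record)
  # Sort keys so the hash does not depend on key insertion order.
  canonical = record.sort_by { |key, _| key.to_s }.to_h
  record.merge(objectID: Digest::MD5.hexdigest(JSON.generate(canonical)))
end

# Hypothetical helper: compare local records with the IDs already in
# the remote index and return only the operations that are needed.
def diff_records(local_records, remote_ids)
  local_ids = local_records.map { |record| record[:objectID] }
  {
    to_delete: remote_ids - local_ids,
    to_add: local_records.reject { |record| remote_ids.include?(record[:objectID]) }
  }
end
```

With this scheme, fixing a typo in one paragraph produces one deletion (the old hash) and one addition (the new hash), while every unchanged paragraph is skipped.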
So, to get back to your initial problem, I think you will be OK with the default behavior. Whenever you import your Drupal content into your Jekyll website, it should only update a few pages, and running `jekyll algolia` will in turn only update the records that changed.
You shouldn't have any duplicated content, but if you do, could you post a link to a repo where I could reproduce the issue?
@pixelastic Thanks for the feedback. `jekyll algolia` with its default behavior is working great right now and there isn't any duplicate content.
Closing this issue!
Hello, is it possible to specify the `objectID` value in the front matter header of the markdown files being published to Algolia, instead of automatically generating a UUID as the `objectID`?
One reason, as stated in the Algolia documentation (https://www.algolia.com/doc/guides/indexing/structuring-your-data/#unique-identifier---objectid):
If you don’t provide an objectID, Algolia will generate one automatically. However, it will be easier to remove or update records if you have stored a unique identifier in the objectID attribute.
Another reason is that we're using jekyll-import (https://github.com/jekyll/jekyll-import) to import articles from Drupal to Jekyll, then publishing the Jekyll markdown files to Algolia using this plugin. While we're working on removing the Drupal system, we need a way to keep the old articles in sync with the new system. It happens this way: `drupal -> jekyll -> algolia`. One issue: if an article is updated in Drupal, imported into Jekyll, and then synced to Algolia, there would be duplicates of that article because the `objectID` values differ.
What is the current behavior?
The current behavior is that the Algolia plugin generates a UUID for each record automatically. https://github.com/algolia/jekyll-algolia/blob/a5bf8f6089d9cfb0e133b7f071f172e877ba3254/lib/jekyll/algolia/extractor.rb#L47
What is your expected behavior?
If the `objectID` field is specified in the front matter header, this plugin ignores that value and generates a UUID for that record. Is there a way to use the ID value specified in the front matter header instead of generating a UUID when a markdown file has an `objectID` field value?
Can it work similarly to the npm package `algoliasearch`? With `algoliasearch`, if you push data to Algolia (using the `addObjects()` method), it uses the `objectID` field in the data instead of generating a random UUID.
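For illustration, the requested fallback could look roughly like the sketch below. This is not how the plugin behaves; `record_object_id` is a hypothetical helper. Because one page yields several records, the sketch assumes a paragraph index is appended so that IDs within a page stay unique.

```ruby
require 'securerandom'

# Hypothetical helper, not part of jekyll-algolia: prefer the objectID
# from the front matter and fall back to a random UUID when it is absent.
def record_object_id(front_matter, paragraph_index)
  custom = front_matter['objectID'].to_s.strip
  return SecureRandom.uuid if custom.empty?

  # Assumption: suffix the paragraph index so the several records
  # extracted from a single page do not share one objectID.
  "#{custom}-#{paragraph_index}"
end
```

A page with `objectID: node-42` in its front matter would then produce records `node-42-0`, `node-42-1`, and so on, which stay stable across re-imports as long as the paragraph order does not change.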