doctrine / doctrine-website

Source code for the doctrine-project.org website and documentation.
https://www.doctrine-project.org
MIT License

Create cronjob workflow to index website docs in Algolia #417

Open SenseException opened 3 years ago

SenseException commented 3 years ago

There is currently no automated build step to index the docs in Algolia for the website search bar. This has the following reasons:

To update the docs regularly and keep the search results up to date, a workflow should be created that builds the indexes shortly before the monthly Algolia limit resets. This way it should be possible to prioritize the users of the search and keep the search available. Because projects like ORM and DBAL are visited more often than, e.g., Annotations, we can also schedule separate runs for each Doctrine project to save Algolia requests.

greg0ire commented 3 years ago

After reading the code, it seems to me that we only make one call to addObjects per project… how low is that limit? A single call to that method only translates into several requests if there are more than 1000 objects (assuming we are using the default batch size: https://github.com/algolia/algoliasearch-client-php/blob/1c9440d8151cc4c9363128145b898946baffcd42/src/Config/SearchConfig.php#L31).
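
To make the arithmetic concrete, here is a minimal sketch of how the number of batch requests grows with the number of objects. It assumes the batch size of 1000 mentioned above and that each batch costs exactly one request. The function name `request_count` is hypothetical and not part of the Algolia client.

```python
import math


def request_count(num_objects: int, batch_size: int = 1000) -> int:
    """Estimate how many batch requests are needed for num_objects records.

    Assumes one request per batch of at most batch_size objects.
    """
    if num_objects <= 0:
        return 0
    return math.ceil(num_objects / batch_size)


# A project with up to 1000 records fits in one request; 1001 records need two.
small_project = request_count(1000)
large_project = request_count(1001)
```

Under these assumptions, a project only costs more than one request per run once it exceeds the batch size.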

morozov commented 2 years ago

Given that all the website contents are versioned in Git, instead of building the search index via a cron job, would it make sense to build it based on the diff between the previous and the new website version?
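
A rough sketch of what a diff-based update could look like, assuming the search records of the previous and the new website version are available as dictionaries keyed by a stable ID. The record shape, the `objectID` and `content` fields, and the function name `diff_records` are illustrative assumptions, not the website's actual code.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


def diff_records(
    old: Dict[str, Record], new: Dict[str, Record]
) -> Tuple[List[Record], List[str]]:
    """Return (records to add or update, IDs to delete) between two versions."""
    # IDs present before but missing now must be removed from the index.
    to_delete = sorted(old.keys() - new.keys())
    # Records that are new or whose content changed must be (re)sent.
    to_save = [
        new[key]
        for key in sorted(new)
        if key not in old or old[key] != new[key]
    ]
    return to_save, to_delete


# Hypothetical records for illustration only.
previous = {
    "orm/a": {"objectID": "orm/a", "content": "old"},
    "orm/b": {"objectID": "orm/b", "content": "same"},
    "orm/c": {"objectID": "orm/c", "content": "gone"},
}
current = {
    "orm/a": {"objectID": "orm/a", "content": "new"},
    "orm/b": {"objectID": "orm/b", "content": "same"},
    "orm/d": {"objectID": "orm/d", "content": "added"},
}

to_save, to_delete = diff_records(previous, current)
```

In this example only the changed and added records would be sent, and only the removed ID would be deleted, so unchanged records cost no indexing operations.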

SenseException commented 2 years ago

I haven't looked into the search index itself, but not every change in the docs would affect it. One of my first thoughts was to build the index whenever a diff shows a change, but there are usually not that many changes, which is why I thought of cron jobs as a first step.

The website code is currently flawed when it comes to indexing a specific project and version: it always deletes the whole index. This needs to be fixed before projects can be reindexed separately.
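
As an illustration of the behaviour that would be needed, here is a small in-memory sketch that replaces only the records of one project and version and leaves everything else untouched. It models the index as a plain list and does not call Algolia. The `project` and `version` fields and the function name `reindex_project` are assumptions made for the example.

```python
from typing import Dict, List

Record = Dict[str, str]


def reindex_project(
    index: List[Record], project: str, version: str, new_records: List[Record]
) -> List[Record]:
    """Replace the records of one project/version, keeping all others."""
    kept = [
        record
        for record in index
        if not (record["project"] == project and record["version"] == version)
    ]
    return kept + new_records


# Hypothetical index contents for illustration only.
index = [
    {"project": "orm", "version": "2.9", "title": "Old ORM page"},
    {"project": "orm", "version": "2.8", "title": "ORM 2.8 page"},
    {"project": "dbal", "version": "3.1", "title": "DBAL page"},
]

updated = reindex_project(
    index,
    "orm",
    "2.9",
    [{"project": "orm", "version": "2.9", "title": "New ORM page"}],
)
```

Here the ORM 2.8 and DBAL records survive the reindex, which is the property the current whole-index deletion lacks.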