Closed clnt closed 2 years ago
Hi all,
The initial v3 Algolia search index was built using a custom DocSearch configuration, as the public config has not been updated to reflect v3 (I'm not sure whether they are now using a different implementation of Algolia on the main site). It needs more modifications to fix some missing entries and errors currently being generated by the scraper. I have posted my current DocSearch config below, which is very much a modified version of the v2 one. (The current v3 index has ~470 entries, unlike the 10k entries in the v2 index.)
The main difference I have seen is, for example, that there is no longer a #content-wrapper, which resulted in me targeting the .max-w-3xl class.
{
  "index_name": "v3_tailwindcss",
  "start_urls": [
    "https://tailwindcss.com"
  ],
  "stop_urls": [],
  "selectors": {
    "default": {
      "lvl0": {
        "selector": "//nav[contains(@id, 'nav')]//li//a[contains(@class, 'text-sky-500')]/preceding::h5[1]",
        "type": "xpath",
        "global": true,
        "default_value": "Documentation"
      },
      "lvl1": "#header h1",
      "lvl2": "#content h2",
      "lvl3": "#content h3.group",
      "lvl4": ".max-w-3xl, td:first-child, .text-violet-600",
      "lvl5": ".max-w-3xl h5",
      "text": "#header > h1, #content h2, #content h3.group, .align-baseline td:not(:first-child)"
    }
  },
  "selectors_exclude": [
    "p.text-base",
    ".list-inside",
    ".bg-gray-200",
    "[data-docsearch-ignore]"
  ],
  "custom_settings": {
    "attributesForFaceting": [
      "version",
      "type",
      "tags"
    ],
    "attributesToRetrieve": [
      "hierarchy",
      "content",
      "anchor",
      "url",
      "url_without_anchor",
      "type"
    ],
    "separatorsToIndex": "@_"
  },
  "js_render": true,
  "conversation_id": [
    "459164857"
  ],
  "nb_hits": 470
}
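Since the scraper is still producing errors and missing entries, a quick structural check of the config before a crawl may help rule out simple mistakes. The following is only a rough sketch in Python: `check_config` and `REQUIRED_KEYS` are hypothetical names, the set of "required" keys is my assumption rather than the scraper's actual schema, and the sample is a trimmed copy of the config above with a shortened lvl0 selector.

```python
import json

# Keys this sketch treats as mandatory; an assumption, not the scraper's own schema.
REQUIRED_KEYS = ("index_name", "start_urls", "selectors")


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems found in a DocSearch-style config."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]

    for name, group in config.get("selectors", {}).items():
        # Collect the numeric part of every "lvlN" key in this selector group.
        levels = sorted(
            int(key[3:]) for key in group if key.startswith("lvl") and key[3:].isdigit()
        )
        # Hierarchy levels are expected to run lvl0, lvl1, ... with no gaps.
        if levels != list(range(len(levels))):
            problems.append(f"selectors.{name}: levels are not contiguous: {levels}")
        if "text" not in group:
            problems.append(f"selectors.{name}: no 'text' selector")
    return problems


# Trimmed copy of the config above (lvl0 selector shortened for readability).
SAMPLE = json.loads("""
{
  "index_name": "v3_tailwindcss",
  "start_urls": ["https://tailwindcss.com"],
  "selectors": {
    "default": {
      "lvl0": {"selector": "//nav//h5[1]", "type": "xpath", "global": true},
      "lvl1": "#header h1",
      "lvl2": "#content h2",
      "lvl3": "#content h3.group",
      "text": "#content h2, #content h3.group"
    }
  }
}
""")

if __name__ == "__main__":
    for problem in check_config(SAMPLE):
        print(problem)
```

This only catches structural slips such as a skipped hierarchy level; it says nothing about whether the selectors actually match elements on the v3 site.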
Here is a link to the current public DocSearch config, which shows what has been modified in the config above: https://github.com/algolia/docsearch-configs/blob/master/configs/tailwindcss.json (One thing to note is that I have a separate index config for each version, rather than the three combined in the linked config.)
Although it's not perfect right now, I think most of the issues are in the index/config, and hopefully I won't need to make any further changes to the workflow itself, so I would like to get a v3 release out and then work on fixing the index.
New release exported and tagged version 3.0.0: https://github.com/clnt/alfred-tailwindcss-docs/releases/tag/v3.0.0
Update:
Apologies for the delay in getting the index sorted. There is no public configuration available for v3 at the moment, as Algolia now use their DocSearch Crawler infrastructure rather than the now-deprecated self-hosted DocSearch Scraper, which I am using to generate the indexes for this workflow. TailwindCSS switched over to the crawler for v3.
I am currently in contact with Algolia to seek a solution to this issue.
I will update as soon as I know more, thanks for your patience.
I will have a working update tonight. The plan for now is to use the same credentials the Tailwind site is using; however, to support this I needed to make some changes to the code, so I will need to release a new version of the workflow.
Based on the information Algolia have given me, I will keep trying to build my own index, but for now I will use Tailwind's credentials for v3 until Algolia make the new configs public (however, that is currently not a priority for them).
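For anyone curious what querying an Algolia index with a given set of credentials looks like, here is a minimal sketch in Python using Algolia's REST search endpoint. It is not the workflow's actual code: the function names are hypothetical, and the application ID, API key and index name are placeholders, since the real values are not given in this thread.

```python
import json
import urllib.request

# Placeholders only: the real application ID, search-only key and index name
# are not stated in this thread and must be supplied by the reader.
APP_ID = "YOUR_APP_ID"
API_KEY = "YOUR_SEARCH_ONLY_API_KEY"
INDEX_NAME = "YOUR_INDEX_NAME"


def build_search_request(query: str, hits_per_page: int = 9) -> urllib.request.Request:
    """Build (but do not send) a search request for Algolia's REST API."""
    url = f"https://{APP_ID}-dsn.algolia.net/1/indexes/{INDEX_NAME}/query"
    body = json.dumps({"query": query, "hitsPerPage": hits_per_page}).encode("utf-8")
    headers = {
        "X-Algolia-Application-Id": APP_ID,
        "X-Algolia-API-Key": API_KEY,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def search(query: str) -> list:
    """Send the request and return the list of hits (needs real credentials)."""
    with urllib.request.urlopen(build_search_request(query), timeout=10) as response:
        return json.load(response)["hits"]
```

With real values filled in, `search("flex")` would return the raw hit records, each carrying whichever attributes the index is configured to retrieve.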
New release now available: https://github.com/clnt/alfred-tailwindcss-docs/releases/tag/v3.0.1 :)
A new version is required to support TailwindCSS v3 and v2, keeping older compatibility for now. v0 compatibility will be dropped in the next release.
New commands will be:
tw - TailwindCSS v3
tw2 - TailwindCSS v2
tw1 - Tailwind CSS v1
tw0 - Tailwind CSS v0
Will try and get an update done asap, currently the public docsearch config for TailwindCSS has not been updated to add a v3 (only the domain change to fix v2).
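The keyword scheme above could be wired to one index per version roughly as follows. This is a sketch in Python, not the workflow's real code: `KEYWORD_VERSIONS` and `index_for_keyword` are hypothetical names, and only the `v3_tailwindcss` index name appears in this thread, so the naming pattern for the other versions is an assumption.

```python
# Keyword -> Tailwind CSS major version, following the command list above.
KEYWORD_VERSIONS = {"tw": 3, "tw2": 2, "tw1": 1, "tw0": 0}


def index_for_keyword(keyword: str) -> str:
    """Return the index name for a keyword.

    The "v{N}_tailwindcss" pattern is assumed from the v3 index name;
    the other index names are not confirmed anywhere in this thread.
    """
    version = KEYWORD_VERSIONS[keyword]  # raises KeyError for unknown keywords
    return f"v{version}_tailwindcss"
```

Keeping the bare `tw` keyword pointed at the newest version means dropping v0 later only requires removing one entry from the mapping.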