MDAnalysis / MDAnalysis.github.io

MDAnalysis home page mdanalysis.org as GitHub pages.
https://mdanalysis.org

search does not find results for distopia (and others...) #202

Open orbeckst opened 3 years ago

orbeckst commented 3 years ago

I searched for "distopia" and "CalcBondsOrtho" and got no hits, even though its sitemap was added in #201 (Issue #200).

This is probably similar to #199, but it is not yet clear why the Algolia search does not seem to index these parts of the site. Perhaps the config options need to be adjusted.

EDIT: While investigating the issue, it became clear that other docs are also not being indexed, or the index is not being updated. I am making this issue a catch-all.

orbeckst commented 3 years ago

I fixed a missing tag in eb82c82 but I doubt that this will fix everything... let's give it 24h.

hmacdope commented 3 years ago

Thanks @orbeckst! I am not much help here ...

orbeckst commented 3 years ago

I raised https://github.com/algolia/docsearch-configs/issues/4699, perhaps we can get some help from the algolia folks.

I also created PR https://github.com/algolia/docsearch-configs/issues/4700 to include code samples (pre tag) and definition items in definition lists (dt tags) in the index.
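To illustrate why the choice of indexed tags matters, here is a minimal sketch that collects text only from a given set of HTML tags, using Python's standard `html.parser`. It does not reproduce the docsearch scraper's selector logic; `TagTextCollector`, `collect`, and the sample page (including its description text) are made up for illustration.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside a chosen set of tags (hypothetical helper)."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.depth = 0        # nesting depth inside collected tags
        self.records = []     # one text record per outermost collected tag
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            if self.depth == 0:
                self._buf = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Normalize whitespace and skip empty records.
                text = " ".join("".join(self._buf).split())
                if text:
                    self.records.append(text)

    def handle_data(self, data):
        if self.depth > 0:
            self._buf.append(data)


def collect(html, tags):
    """Return the text records found inside the given tags."""
    parser = TagTextCollector(tags)
    parser.feed(html)
    parser.close()
    return parser.records


# Made-up sample page: the function name lives in a <dt>, the import in a <pre>.
PAGE = """
<p>Fast distance calculations.</p>
<dl><dt>CalcBondsOrtho</dt><dd><p>Bond lengths in an orthorhombic box.</p></dd></dl>
<pre>import distopia</pre>
"""

narrow = collect(PAGE, ["p", "li"])
wide = collect(PAGE, ["p", "li", "pre", "dt"])
print(narrow)
print(wide)
```

With only `p` and `li` collected, neither "CalcBondsOrtho" nor "distopia" appears in any record; adding `pre` and `dt` makes both searchable.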

orbeckst commented 3 years ago

I installed the docsearch-scraper locally and I’m able to run it so I can now debug more easily.

orbeckst commented 3 years ago

Well... maybe not that simple:

$ ./docsearch run ../docsearch-configs/configs/mdanalysis.json
...
algoliasearch.exceptions.RequestException: Record quota exceeded. Change plan or delete records.

Nb hits: 10415
previous nb_hits: 85975

Will need to see how to work within these limitations.

hmacdope commented 3 years ago

I don’t know much about algolia but could a lack of quota be why it’s not being indexed?


orbeckst commented 3 years ago

I don't think this applies when you're running through their infrastructure as an approved open source documentation site.

What I am trying is to use their "commercial" analytics infrastructure on the free plan with our own scraped index. I am really only interested in running the scraper and seeing how changing the config file changes what it picks up. If I find some time I'll just try to disable the sending of results and retain the local index building.

orbeckst commented 3 years ago

To run docsearch without committing changes to the actual index, see https://github.com/orbeckst/docsearch-scraper/pull/1 for the required code change.

orbeckst commented 3 years ago

Sitemaps for multiple projects are now broken because a release string was inserted into the URL even though this does not reflect the deployment URL. My suspicion is that something changed in the sphinx_sitemap plugin or in our Sphinx configuration.

It seems unlikely that this is due to the GitHub Actions workflow because PMDA (which has been using Travis and not Actions) also has the same problem.

orbeckst commented 3 years ago

Looking at the docs for the sitemap plugin, the solution appears to be to set the following in conf.py

sitemap_url_scheme = "{link}"

so that the version is not included.
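For context, the sitemap plugin's documented default for this setting is `"{lang}{version}{link}"`, appended to the base URL. The sketch below only imitates that template expansion in plain Python to show where the release string comes from and how `"{link}"` removes it. `build_url` is a hypothetical helper, not the plugin's API, and the base URL, version, and page name are made-up values.

```python
# Illustration only: mimics how a URL scheme template expands.
# build_url is a hypothetical helper, not part of sphinx_sitemap.

def build_url(base, scheme, lang, version, link):
    """Expand a sitemap URL scheme template and append it to the base URL."""
    return base + scheme.format(lang=lang, version=version, link=link)


BASE = "https://docs.example.org/stable/"   # assumed deployment URL
DEFAULT_SCHEME = "{lang}{version}{link}"    # documented plugin default
LINK_ONLY_SCHEME = "{link}"                 # setting proposed above

# No language prefix is assumed here; the version is a made-up release string.
with_version = build_url(BASE, DEFAULT_SCHEME, "", "2.0.0/", "index.html")
link_only = build_url(BASE, LINK_ONLY_SCHEME, "", "2.0.0/", "index.html")

print(with_version)  # https://docs.example.org/stable/2.0.0/index.html
print(link_only)     # https://docs.example.org/stable/index.html
```

The first URL contains a release segment that does not exist in the deployment, which matches the broken sitemap entries described above.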

orbeckst commented 2 years ago

New PR https://github.com/algolia/docsearch-configs/pull/4751 but there are still many pages that are not showing up. See the PR for notes.

orbeckst commented 2 years ago

With #211, the changes to the config are done through the Crawler web interface. Initial testing indicates that even after switching to v3 (see PR #212) we still have the same issues that are noted here. Furthermore, I see multiple versions of the docs (1.0.1, 1.1.1, 2.0.0, ...) show up. Only the stable docs should really be indexed.
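As a rough way to check which versions a crawl picks up, one could filter the crawled URLs by their version path segment. This is a minimal sketch, assuming the version is the first path segment of each URL; the sample URLs and the helper names `version_segment` and `keep_stable` are hypothetical and not part of the Crawler.

```python
from urllib.parse import urlparse


def version_segment(url):
    """Return the first path segment, assumed to hold the docs version."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[0] if parts else ""


def keep_stable(urls, wanted="stable"):
    """Keep only URLs whose version segment matches the wanted one."""
    return [u for u in urls if version_segment(u) == wanted]


# Made-up sample of crawled URLs with mixed versions.
sample = [
    "https://docs.example.org/1.0.1/index.html",
    "https://docs.example.org/2.0.0/index.html",
    "https://docs.example.org/stable/index.html",
    "https://docs.example.org/stable/api/distances.html",
]

stable_only = keep_stable(sample)
print(stable_only)
```

The same idea could be expressed as a URL pattern in the crawler configuration, so that versioned copies of the docs are excluded from the index.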