deconst / content-service

An API for storing, indexing and retrieving documentation
MIT License

Staging content showing up in public search results #113

Open meker12 opened 8 years ago

meker12 commented 8 years ago

@smashwilson Content from the rpc-internal build on the rpc master branch is showing up in the search index with links to staging URLs (which are all broken). See: https://github.com/rackerlabs/docs-rpc/issues/536#issuecomment-244077310

Question: If this is a bug with no time to fix now, can we add the "no search" configuration to the project, and then re-index the content store to remove the results?

smashwilson commented 8 years ago

D'oh! That's right, we have a robots.txt that keeps Google from indexing staging content, but nothing that keeps it out of Elasticsearch.

> If this is a bug with no time to fix now [..]

I'll have to ask @kenperkins about that one. Basically it'd impact the Carina sprint.

> [..] can we add the "no search" configuration to the project, and then re-index the content store to remove the results.

You can indeed.

Triggering a rebuild is a cluster admin thing, though. Let me know (on this issue?) when the unsearchable content is built and I can take care of it.

meker12 commented 8 years ago

> To exclude all envelopes within a **content repository** from search indexing, set deconst_default_unsearchable to True:
>
>     deconst_default_unsearchable = True
>
> Notice that this may still be overridden by individual envelopes with per-page metadata.

Just clarifying: you set this in each project's conf.py to exclude that project's content, not once for the entire content repository.
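
For illustration, a minimal sketch of what that looks like in one project's conf.py. Only deconst_default_unsearchable comes from the documentation quoted above; the project name is hypothetical, and a real conf.py would carry the project's other Sphinx settings as well.

```python
# conf.py for a single Sphinx project (sketch, not a complete configuration)

# Hypothetical project name, used here only for illustration.
project = "example-internal-docs"

# Ask the build to mark every envelope from this project as unsearchable.
# Per the quoted documentation, individual pages may still override this
# with per-page metadata.
deconst_default_unsearchable = True
```

Each project that should stay out of search needs its own copy of this line, since the setting is read per conf.py.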

smashwilson commented 8 years ago

Correct. I think when I wrote that it was all one-to-one.

+1

meker12 commented 8 years ago

@smashwilson Content has been rebuilt with the "unsearchable" setting. After you re-index, we can close this issue, or possibly change it to:

Update search indexing function to ignore deconst.horse content by default.

That way, people wouldn't have to set "unsearchable" on every project deployed to deconst.horse.
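
A rough sketch of the idea, assuming the indexer can see the URL each envelope will be presented at. This is not the content service's actual code: the function names, the domain list, and the `unsearchable` parameter are all hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical list of staging domains; deconst.horse is the one named above.
STAGING_DOMAINS = ("deconst.horse",)


def is_staging_url(url, staging_domains=STAGING_DOMAINS):
    """Return True if the URL's host is a staging domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in staging_domains)


def should_index(url, unsearchable=False):
    """Index only content that is neither flagged unsearchable nor on staging."""
    return not unsearchable and not is_staging_url(url)
```

With a check like this in the indexing path, staging content would be skipped even when a project forgets to set "unsearchable":

```python
should_index("https://docs.example.com/guide/")          # True
should_index("https://deconst.horse/guide/")             # False
should_index("https://build-1.deconst.horse/guide/")     # False
```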

smashwilson commented 8 years ago

@meker12 The re-indexing is complete and I'm no longer seeing staging.horse URLs in the search results for "ceph."

We can keep the issue title as-is; the title's already pretty clear about the problem.

meker12 commented 8 years ago

😎