Open meker12 opened 8 years ago
D'oh! That's right, we have a robots.txt to keep Google from indexing staging content, but nothing equivalent for Elasticsearch.
If this is a bug with no time to fix now [..]
I'll have to ask @kenperkins about that one. Basically it'd impact the Carina sprint.
[..] can we add the "no search" configuration to the project, and then re-index the content store to remove the results.
You can indeed.
Triggering a rebuild is a cluster admin thing, though. Let me know (on this issue?) when the unsearchable content is built and I can take care of it.
To exclude all envelopes within a **content repository** from search indexing, set `deconst_default_unsearchable` to `True`:
deconst_default_unsearchable = True
Note that this default may still be overridden by individual envelopes with per-page metadata.
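As a rough illustration of how a repository-wide default and a per-page override might combine, here is a minimal sketch. The function name `is_unsearchable` and the `"unsearchable"` metadata key are hypothetical, chosen only for this example; this is not Deconst's actual implementation.

```python
# Hypothetical helper for illustration only; not the actual Deconst code.
def is_unsearchable(default_unsearchable, page_metadata):
    """Return True if an envelope should be left out of the search index.

    default_unsearchable: the value of deconst_default_unsearchable in conf.py.
    page_metadata: a dict of per-page metadata; the "unsearchable" key is
    an assumed name used only in this sketch.
    """
    # A per-page value, when present, wins over the conf.py default.
    if "unsearchable" in page_metadata:
        return bool(page_metadata["unsearchable"])
    # Otherwise fall back to the project-level default.
    return bool(default_unsearchable)
```

With the default set to `True`, a page with no metadata is excluded, while a page that explicitly sets the (assumed) key to `False` would still be indexed.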
Just clarifying: you set this in each project's conf.py to exclude that project's content, not the content of the entire content repository.
Correct. I think when I wrote that it was all one-to-one.
+1
@smashwilson Content has been rebuilt with the "unsearchable" setting. After you re-index, we can close this issue, or possibly change it to:
Update search indexing function to ignore deconst.horse content by default.
That way, people wouldn't have to set "unsearchable" on every project deployed to deconst.horse.
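For what it's worth, the proposed change could look something like the sketch below: a host-based filter applied before a document is sent to the index. The function name `should_index` and the `excluded_hosts` parameter are hypothetical and only illustrate the idea; the real indexing function may be structured quite differently.

```python
from urllib.parse import urlsplit


# Hypothetical filter for illustration; not the existing indexing function.
def should_index(url, excluded_hosts=("deconst.horse",)):
    """Return False for URLs served from an excluded host or its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    for excluded in excluded_hosts:
        # Match the host itself and any subdomain of it.
        if host == excluded or host.endswith("." + excluded):
            return False
    return True
```

The indexer would then skip any envelope whose URL fails this check, so individual projects would not need their own "unsearchable" setting for staging.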
@meker12 The re-indexing is complete and I'm no longer seeing staging.horse URLs in the search results for "ceph."
We can keep the issue title as-is; the title's already pretty clear about the problem.
😎
@smashwilson Content from the rpc-internal build on the rpc master branch is showing up in the search index with links to staging URLs (which are all broken). See: https://github.com/rackerlabs/docs-rpc/issues/536#issuecomment-244077310
Question: If this is a bug with no time to fix now, can we add the "no search" configuration to the project, and then re-index the content store to remove the results?