Open · aplhk opened this issue 3 years ago
Thanks, @aplhk! 🙇🏻 I can reproduce the error you're seeing.
Are you able to share the Google search and/or pages that directed you to that URL? They seem malformed in the first place, so in addition to fixing the behavior, I'd like to fix the URLs at the source if that's something we have control of.
I think this Google dork covers some of the URLs:
site:www.elastic.co/guide inurl:ref
https://www.google.com/search?q=site%3Awww.elastic.co%2Fguide+inurl%3Aref
Thanks again, @aplhk!
@AnneB-SEO Do you know where these URLs might be coming from? I don't think we use the ?ref= query parameter anywhere within the docs. Are we able to tell Google not to index these sorts of URLs? I can work on the underlying code that's causing the infinite loop.
> Do you know where these URLs might be coming from?
I'll need to look into it, but at a quick glance it looks like the links could be coming from 3rd-party sites, like hackermoon.co and driverlayer.com.
> I don't think we use the ?ref= query parameter anywhere within the docs.
Likely not
> Are we able to tell Google not to index these sorts of URLs?
Yes, but only when we are the ones adding the parameters. If they are coming from a 3rd party, we can't instruct Google to ignore them.
Let me look into it, and I'll also loop in @brianjolly for good measure :)
It looks like Google's URL Parameters tool might be able to help.
https://support.google.com/webmasters/answer/6080548
It says the requirements for using the tool are:
Would you say this issue falls in that category?
Thanks, @brianjolly, that looks promising. I'd want to first confirm that the equivalent pages are getting indexed without the ?ref parameter, but if so, I think we can tell it to ignore any pages with a ref query param.
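To spot-check which form of a page Google has indexed, a hypothetical helper like the one below maps a URL carrying a ref query parameter to the equivalent URL without it. This is a minimal sketch using the standard URL and URLSearchParams APIs; the name withoutRefParam and the sample ?ref= value are made up for illustration.

```javascript
// Hypothetical helper (not part of the docs code): returns the URL of the
// equivalent page with any "ref" query parameter removed.
function withoutRefParam(href) {
  const url = new URL(href);
  url.searchParams.delete('ref');
  // If "ref" was the only parameter, drop the now-empty "?" as well.
  if (url.searchParams.toString() === '') {
    url.search = '';
  }
  return url.href;
}

console.log(withoutRefParam(
  'https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?ref=example.com'
));
// https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html
```

Deleting a parameter makes the URL API re-serialize the remaining query string, so other parameters may come back percent-encoded; the helper is meant for comparing URLs, not for rewriting them.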
@brianjolly & @gtback - The parameter exclusion only applies to pages we create, not pages created by others. Even so, I added the ref parameter on 9/14.
This problem is more extensive and expanding. When this was originally raised there were ~7 URLs from 2 different sites (hackermoon.co and driverlayer.com). Today there are over 80, and more than just the docs are being targeted, including Elasticon.
We'll need to file a DMCA takedown notice with Google through Legal based on:
Thanks for finding and raising this, @aplhk. Let's leave this one open until we file. Thanks all!!!
I came across a few links from Google search and found that the presence of a slash (/) in the URL query string leads to a malformed, unresponsive documentation page. Example of a malformed page: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?example.com/a
I believe the root cause is in the TOC fetching script: https://github.com/elastic/docs/blob/5b6ac7928c141d9eebeb13d078501a5e77d64d13/resources/web/docs_js/index.js#L253-L260
In this case, location.href is https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?example.com/a, and after the string replacement the script fetches and appends https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?example.com/toc.html, which causes an infinite loop and an unresponsive page.
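A minimal sketch of the failure mode, assuming the TOC script derives its URL by replacing the trailing run of non-slash characters in location.href with a regex along the lines of /[^\/]+$/ (the exact code at the linked lines may differ, and buggyTocUrl and safeTocUrl are hypothetical names). It also shows one possible fix: resolving toc.html against the page URL with the standard URL constructor, which uses only the base URL's path and drops its query string and fragment.

```javascript
// Assumed shape of the current logic: replace everything after the last "/"
// in the full href. [^\/]+$ only matches the file name when the query string
// and fragment contain no "/".
function buggyTocUrl(href) {
  return href.replace(/[^\/]+$/, 'toc.html');
}

// Possible fix: resolve "toc.html" as a relative reference. Relative
// resolution keeps the base path's directory and discards the base's
// query string and fragment.
function safeTocUrl(href) {
  return new URL('toc.html', href).href;
}

const page =
  'https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?example.com/a';

console.log(buggyTocUrl(page));
// https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html?example.com/toc.html
console.log(safeTocUrl(page));
// https://www.elastic.co/guide/en/elasticsearch/reference/current/toc.html
```

For a URL whose query string contains no slash (for example ?ref=foo), both functions return .../current/toc.html, which is consistent with only slash-containing query strings triggering the problem.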