Block or Allow crawlers to access URLs to create sitemaps

jpradocueva commented 1 day ago

@nathan-omaorg (Issue: #407) From the list "Resources that crawlers should not be allowed to index", the following ones cannot be resolved with robots.txt and need to be addressed by the web server's routing rules:

Also, we should be able to access the content in the folders listed below, but the message I am getting is "site can't be reached." The intention is to create a sitemap so that the content can be indexed. Are these folders public? From the list "Resources to expose to crawlers so they can be indexed", the following require user authorization:

The above ones cannot be resolved using sitemap.xml or robots.txt and should be excluded from this issue.

Evadon-Nathan commented 7 hours ago

Please can you open these as separate issues or the discussion on them will get very confusing.

There are 2 issues here:

1. Implement a way to serve separate, domain-dependent robots.txt and sitemap.xml files for: https://oma-knowledge-based.openmobilealliance.org/? https://temp.openmobilealliance.org/?
2. Figure out why these don't work: http://www.member.openmobilealliance.org/FTP/Public_documents/? http://www.openmobilealliance.org/WorkProgram https://www.openmobilealliance.org/WorkProgram/?
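
For the first item, here is a rough sketch of what "domain-dependent" could mean in practice: choose the robots.txt body from the request's Host header. This is only an illustration, not how the site is actually configured. The function names, the fallback policy, and which host allows or blocks crawling are all assumptions; only the two hostnames come from this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-host policies. The real rules are not stated in this thread;
# which host allows and which blocks crawling is an assumption for illustration.
ROBOTS_BY_HOST = {
    "oma-knowledge-based.openmobilealliance.org": "User-agent: *\nAllow: /\n",
    "temp.openmobilealliance.org": "User-agent: *\nDisallow: /\n",
}

# Assumed fallback: block crawling on any host that has no explicit policy.
DEFAULT_ROBOTS = "User-agent: *\nDisallow: /\n"


def robots_for_host(host: str) -> str:
    """Return the robots.txt body for a Host header value (port is ignored)."""
    hostname = host.split(":", 1)[0].strip().lower()
    return ROBOTS_BY_HOST.get(hostname, DEFAULT_ROBOTS)


def can_crawl(host: str, path: str, agent: str = "*") -> bool:
    """Check a path against the policy selected for the given host."""
    parser = RobotFileParser()
    parser.parse(robots_for_host(host).splitlines())
    return parser.can_fetch(agent, path)
```

A web server or application route for /robots.txt could call something like `robots_for_host` with the incoming Host header. The same lookup idea would apply to sitemap.xml. Note that robots.txt is only advisory to crawlers, so it does not replace authorization on protected folders.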

Thank you.

Evadon-Nathan commented 7 hours ago

I have opened another issue to separate these. We can continue using this issue for the robots.txt and sitemap.xml work.

Here's the issue for the bad URLs: https://github.com/OpenMobileAlliance/oma-knowledge-base/issues/424

Evadon-Nathan commented 6 hours ago

Do we actually need anything else from this issue? It seems the indexing issue is this one here: https://github.com/OpenMobileAlliance/oma-knowledge-base/issues/407

So I will reply to that issue over there.

Please close this one if it's no longer needed. Thank you

jpradocueva commented 1 hour ago

Problem resolved in #407 and #424.

Block or Allow crawlers to access URLs to create sitemaps #420