backdrop / backdrop-issues

Issue tracker for Backdrop core.

Remove rules for non-existent directories and files in robots.txt file #4683

Open alanmels opened 4 years ago

alanmels commented 4 years ago

Description of the bug

I believe that entries in the robots.txt file such as:

Disallow: /profiles/
Disallow: /web.config

are Drupal-7 remnants (https://git.drupalcode.org/project/drupal/-/tree/7.73/profiles, https://git.drupalcode.org/project/drupal/-/raw/7.73/web.config) as they do not exist in Backdrop:

drwxr-xr-x 15 docker dialout   480 Oct  4 21:29 core
drwxr-xr-x  4 docker dialout   128 Oct  4 21:29 files
-rw-rw-rw-  1 docker dialout   578 Oct  4 21:29 index.php
drwxr-xr-x  3 docker dialout    96 Oct  4 21:29 layouts
-rw-rw-rw-  1 docker dialout 18092 Oct  4 21:29 LICENSE.txt
drwxr-xr-x  3 docker dialout    96 Oct  4 21:29 modules
-rw-rw-rw-  1 docker dialout  4063 Oct  4 21:29 README.md
-rw-rw-rw-  1 docker dialout  1216 Oct  4 21:29 robots.txt
-rw-rw-rw-  1 docker dialout 18494 Oct  4 21:29 settings.php
drwxr-xr-x  4 docker dialout   128 Oct  4 21:29 sites
drwxr-xr-x  3 docker dialout    96 Oct  4 21:29 themes

At the same time, the corresponding robots.txt file for Drupal 7 has disallow directives for:

# Directories
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /profiles/
Disallow: /scripts/
Disallow: /themes/

whereas Backdrop's robots.txt file has only:

# Directories
Disallow: /core/
Disallow: /profiles/

leaving several directories in the root directory unprotected.
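
To make that gap concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. It assumes the two Disallow rules quoted above sit under a `User-agent: *` group, and it probes each top-level directory from the listing with a hypothetical file name (`example.txt`); it is only an illustration, not a test against a real site.

```python
from urllib.robotparser import RobotFileParser

# Directory rules as quoted above for Backdrop's robots.txt.
# The "User-agent: *" line is an assumption about the surrounding file.
BACKDROP_RULES = """\
User-agent: *
Disallow: /core/
Disallow: /profiles/
"""

# Top-level directories taken from the directory listing in this issue.
TOP_LEVEL_DIRS = ["core", "files", "layouts", "modules", "sites", "themes"]


def uncovered_dirs(rules: str, dirs: list[str]) -> list[str]:
    """Return the directories whose contents a crawler may still fetch."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # "example.txt" is a hypothetical file used only to probe each directory.
    return [d for d in dirs if parser.can_fetch("*", f"/{d}/example.txt")]


print(uncovered_dirs(BACKDROP_RULES, TOP_LEVEL_DIRS))
# ['files', 'layouts', 'modules', 'sites', 'themes']
```

Whether each of those directories should actually be blocked is a separate question; the sketch only shows which ones the current rules do not cover.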

Expected behavior

It doesn't hurt to leave those two entries in the robots.txt file; however, I believe Backdrop's codebase should eventually be cleaned of Drupal 7 traces that won't be used at all. I also believe the robots.txt file needs to disallow crawling of Backdrop-specific directories such as files, layouts, modules, sites, and themes.

alanmels commented 4 years ago

Since Yahoo! is already history, I've replaced it with Bing in my PR. I also didn't have time to investigate why Drupal didn't have an entry for the /sites/ directory, so I followed suit, but I introduced new disallow rules for /layouts/, /modules/ and /themes/.

alanmels commented 4 years ago

I was also not sure whether the /files/ directory in Backdrop's root should be included in the disallowed section.

indigoxela commented 4 years ago

@alanmels Many thanks for providing a PR.

I do have concerns, though:

Disallow: /themes/

This means that, for instance, the logo provided by a (custom) theme is also forbidden. I'm pretty sure this is not as intended.

The same concern would apply to the files directory. Why prevent indexing of all images, pdf, ...?

ghost commented 4 years ago

/profiles/ still exists and is used in Backdrop if you create it:

image

alanmels commented 4 years ago

Guys, thanks for the comments!

I do have concerns, though:

Disallow: /themes/

This means that, for instance, the logo provided by a (custom) theme is also forbidden. I'm pretty sure this is not as intended.

The same concern would apply to the files directory. Why prevent indexing of all images, pdf, ...?

@indigoxela, I am not sure why Drupal's robots.txt listed /themes/ among the disallowed directories. Probably the rationale was that the /themes/ directory contains more or less static files rather than content. And the /files/ directory was probably left out by the same logic: it does contain dynamically changing content that can be crawled.

I'd like to hear more opinions on that. If the consensus is to follow Drupal's lead, I'll remove the /files/ directory from the PR and leave /themes/ in the disallow rules. What do you guys say?

/profiles/ still exists and is used in Backdrop if you create it:

image

@BWPanda, thanks for pointing this out. I'm ready to change the PR after hearing more opinions on why Drupal's robots.txt file adds the directory to the disallow rules, while adding only files with certain extensions within that directory to the allow rules:

Allow: /profiles/*.css$
Allow: /profiles/*.css?
Allow: /profiles/*.js$
Allow: /profiles/*.js?
Allow: /profiles/*.gif
Allow: /profiles/*.jpg
Allow: /profiles/*.jpeg
Allow: /profiles/*.png

# Directories
Disallow: /includes/
Disallow: /misc/
Disallow: /modules/
Disallow: /profiles/
Disallow: /scripts/
Disallow: /themes/

Should we do the same?
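
For anyone weighing that question, here is a rough sketch of how the Allow/Disallow combination above would be evaluated by a crawler that follows the RFC 9309 / Google-style conventions: `*` matches any run of characters, a trailing `$` anchors the end of the path, the longest matching pattern wins, and Allow wins ties. Those conventions are an assumption about the crawler, and the paths under `/profiles/standard/` are hypothetical. Python's built-in `urllib.robotparser` only does plain prefix matching, so the matcher is written by hand.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules: list[tuple[str, str]], path: str) -> bool:
    """Longest matching pattern wins; Allow wins ties; no match means allowed."""
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


# The /profiles/ rules quoted above from Drupal 7's robots.txt.
RULES = [
    ("allow", "/profiles/*.css$"),
    ("allow", "/profiles/*.css?"),
    ("allow", "/profiles/*.js$"),
    ("allow", "/profiles/*.js?"),
    ("allow", "/profiles/*.gif"),
    ("allow", "/profiles/*.jpg"),
    ("allow", "/profiles/*.jpeg"),
    ("allow", "/profiles/*.png"),
    ("disallow", "/profiles/"),
]

print(is_allowed(RULES, "/profiles/standard/logo.png"))          # True
print(is_allowed(RULES, "/profiles/standard/style.css?v=1"))     # True
print(is_allowed(RULES, "/profiles/standard/standard.install"))  # False
```

Read this way, the directory stays closed to crawlers except for the CSS, JavaScript, and image assets they need to render pages, which would also cover the logo concern raised earlier if the same pattern were applied to /themes/.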

ghost commented 4 years ago

In comparing D7 and Backdrop, remember that Drupal's core files are in the root directory, while Backdrop's are in the /core directory.

So, for example, if Drupal is excluding /modules/ but not modules/ (i.e. only module directories in the root directory), then they're excluding core modules but not contributed/custom ones.
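
A small sketch of that point, again with Python's standard `urllib.robotparser`: a rule such as `Disallow: /modules/` is matched against the start of the URL path, so it only covers the root-level directory. The `User-agent: *` line and the file paths below are hypothetical examples, not taken from either project's robots.txt.

```python
from urllib.robotparser import RobotFileParser

# A single root-anchored rule; the user-agent group is an assumption.
MODULES_RULE = """\
User-agent: *
Disallow: /modules/
"""


def is_blocked(rules: str, path: str) -> bool:
    """Return True if the rules forbid any crawler from fetching the path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch("*", path)


# Hypothetical paths: a root-level module, a module below /sites/,
# and a module below /core/.
print(is_blocked(MODULES_RULE, "/modules/node/node.css"))                  # True
print(is_blocked(MODULES_RULE, "/sites/all/modules/example/example.css"))  # False
print(is_blocked(MODULES_RULE, "/core/modules/node/node.css"))             # False
```

So the same rule that hides core modules in Drupal 7 would hide contributed and custom modules in Backdrop's layout, while Backdrop's own core modules are already covered by `Disallow: /core/`.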

klonos commented 4 years ago

I just realized that, unless I'm missing something, our change records do not mention that (as in Drupal 8) all core files and folders have been moved under the /core directory, and that the top-level /modules, /themes, and /layouts folders are to be used to hold custom and contrib projects instead of core.