backdrop / backdrop-issues

Issue tracker for Backdrop core.
144 stars 39 forks source link

[SR][D8] Views: Replace md5 calls with sha256 #5785

Open klonos opened 1 year ago

klonos commented 1 year ago

This is the respective issue as https://www.drupal.org/project/views/issues/1884828, which will be a backport of what has already been done in the core version of Views in Drupal.

Per Drupal secure coding standards at http://drupal.org/node/845876

md5 and sha1 should not be used any place in Drupal core since 7.0, but are re-introduced in the Views module for 8.x: ...

The linked article states:

For Drupal 7 and later core and contributed modules, the md5() and sha1() hash functions should never be used in any code, since they are considered obsolete and potentially insecure for some applications. This is a settled policy for Drupal core. For a normal hash function use sha-256 by calling hash('sha256', $data).

Even if the use of such functions are not for security purposes, any use of them at all can cause third party security audits of the codebase to raise flags. This can be a problem if, for example, Government entities require such audits - which would then require additional documentation to verify that they are indeed, not a security issue.

The article further links to https://www.drupal.org/project/drupal/issues/723802 which states:

FIPS 180-3 and FIPS 198-1 require transition to sha-2 family functions for use in US government applications in 2010. Let's update Drupal 7 to use these more secure algorithms so that Drupal is not excluded from being used for federal sites.

klonos commented 1 year ago

PR that updates any file that belongs to the Views module: https://github.com/backdrop/backdrop/pull/4208

Questions:

  1. should we be changing md5() to hash('sha256', ...) everywhere in core? ...and if so, should we do it here, in a single PR, or separate issue about that? 🤔
  2. If md5() is not to be used, and sha256 preferred in its place, should we be including this in our linting (see #3213, #5296 etc.)?
  3. Shouldn't we include https://www.drupal.org/node/845876 (whatever applies to us) in https://docs.backdropcms.org/documentation/writing-secure-code
avpaderno commented 1 year ago
  1. I would use different issues, since the change should be considered case by case. For example, if the hash is stored in a database table, the database schema could need to be updated; if the stored hash is used as "identifier" for data, the code still needs to use the old hash function to load data previously saved.
  2. Since both md5() and hash('sha256') return a cryptographic hash, misuses of those functions should be easily catch from reviewers. When I see md5() used in code my thoughts are It's broken. and Is it really necessary? (I personally find harder to catch misuses of other functions.) I would add an automatic check for other cases, not to avoid md5() is used.
  3. What reported in the Use of hash functions and Drupal 7 sections could be helpful. The documentation on drupal.org was created on 2010, though. I am not sure that nowadays is still necessary to explain that MD5 is broken and it should not be used when a cryptographic hash is necessary. Still, on documentation about writing secure code, I would expect a section about cryptography (which could also explain that a database should store password hashes, not an encrypted password).
avpaderno commented 1 year ago

I will make an example to explain my second point in the previous comment, using view code.

// Allow hook_views_pre_view() to set the dom_id, then ensure it is set.
$this->dom_id = !empty($this->dom_id) ? $this->dom_id : md5($this->name . REQUEST_TIME . rand());

When I see that code, I think:

That code seems to just create an ID that is possibly unique. There isn't other code that tries to get the same ID starting from the same $this->name; if that were the case, rand() wouldn't be used. I would get a unique ID using with uniqid('', true) or combining the output of uniqid('', true) with the output of other functions.

avpaderno commented 1 year ago

In views, md5() is never used to get a cryptographic hash; it's rather used to get a 32-byte value that doesn't contain spaces, parentheses, and other characters from a string. In a case, it's used to obtain a possibly unique value, for which the alternative is to use uniqid('', TRUE); in the other cases, backdrop_base64_encode() could instead be used.

In the Drupal issue, merlinofcaos says:

An update should also clear caches and other things where we're using these MD5s for later lookups to avoid confusion.

Also changing those will unexpectedly change people's block deltas. While it seems highly unlikely that anyone is using a silly md5 for their CSS on their view blocks...and yet, I've seen people do weirder things. So we could potentially hurt those sites.

The PR replaces md5() with hash(), but it doesn't remove those MD5 hashes stored in the database (cache or other tables). I looked at the code, but I wasn't able to understand if those values are effectively stored in the database.

avpaderno commented 1 year ago

If an hash function is used because it always produces an output of the same length, independently from the input length, I would use:

If there isn't a non cryptographic hash that could be used, I would use a cryptographic hash.

HAVAL-160,4 and HAVAL-160,5 are cryptographic hashes that are unbroken and produce an output of 40 byte. They aren't vulnerable to length extension attach, contrary to SHA256. (That's what Hash function security summary reports.)
If we just need to avoid MD5 and SHA1, HAVAL-128,4 and HAVAL-128,5 are an alternative. They produce an output of 32 bytes, like MD5.

avpaderno commented 1 year ago

As for non-cryptographic hashes, PHP 5.6 supports fnv1a64, which outputs 64 bits. For example, changing the following line would be simple.

$this->dom_id = !empty($this->dom_id) ? $this->dom_id : md5($this->name . REQUEST_TIME . rand());

The new code would be the following.

$this->dom_id = !empty($this->dom_id) ? $this->dom_id : hash('fnv1a64', $this->name) . hash('fnv1a64', uniqid('', TRUE));

The double call to hash() is only necessary to get an hash that is long like the hash returned from md5(). Otherwise, the code could be simply be the following.

$this->dom_id = !empty($this->dom_id) ? $this->dom_id : hash('fnv1a64', uniqid($this->name, TRUE));
klonos commented 1 year ago

@kiamlaluno I would like to thank you for taking the time to provide feedback and elaborate. I wanted to acknowledge your responses here since it's been some time sine you've posted them, but I don't have any specific thoughts at the moment. I will need to rethink and carefully examine your points before I have anything useful to add.

In the meantime, feedback from others is welcomed as well.

avpaderno commented 1 year ago

I will add what I found out after further research.

To generate a 128-bit hash, it is better to use SHA3-256 and truncate the hash to 128 bits.

$hash = hash('sha3-256', $input);
$truncated_hash = substr($hash, 0, 16);

The truncated hash is still collision-resistant, contrary to non cryptographic hashes (although the probability of collisions with FNV1a128 is low).

That code requires PHP 7, though, ashash() does not support SHA3 in PHP 5.6.