ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0

hashlib.md5 used for security #383

Open · barronh opened this issue 5 months ago

Line 531 of cfgrib's messages.py fails on newer versions of Python running on systems with Federal Information Processing Standards (FIPS) mode enabled.[2] The failure can be avoided by passing usedforsecurity=False to hashlib.md5.

cfgrib does not use MD5 for security (the digest only names the index file), but the usedforsecurity keyword is only guaranteed to be available in Python 3.9+. The fix therefore needs to fall back to the call without the keyword.
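A quick sanity check, runnable on Python 3.9+ on a machine without FIPS mode, suggests the keyword changes only whether the call is permitted, not the digest it returns. Index file names computed before and after the fix should therefore be identical. The index_keys list here is hypothetical; cfgrib hashes repr(index_keys) in the same way.

```python
import hashlib

# Hypothetical index keys; cfgrib hashes repr(index_keys) the same way.
index_keys = ["edition", "centre", "shortName"]
keystr = repr(index_keys).encode("utf-8")

# Plain call: the form that raises ValueError when FIPS mode is enabled.
plain = hashlib.md5(keystr).hexdigest()

# Flagged call: per this issue, accepted under FIPS on Python 3.9+.
flagged = hashlib.md5(keystr, usedforsecurity=False).hexdigest()

# The keyword only affects whether the call is allowed, not the result.
print(plain == flagged)
```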

Backward-Compatible Fix

Below is a unified patch (produced with diff -up):

--- old/lib/python3.11/site-packages/cfgrib/messages.py        2023-11-29 13:52:50.921602000 -0500
+++ new/lib/python3.11/site-packages/cfgrib/messages.py    2024-06-21 10:31:58.943205473 -0400
@@ -528,7 +528,14 @@ class FileIndex(FieldsetIndex):
         if not indexpath:
             return cls.from_fieldset(filestream, index_keys, computed_keys)

-        hash = hashlib.md5(repr(index_keys).encode("utf-8")).hexdigest()
+        # hash = hashlib.md5(repr(index_keys).encode("utf-8")).hexdigest()
+        keystr = repr(index_keys).encode("utf-8")
+        try:
+            # try python3.9+ keyword, which is also supported on some earlier versions
+            hash = hashlib.md5(keystr, usedforsecurity=False).hexdigest()
+        except TypeError:
+            # unknown keywords trigger TypeError so default back to basic call
+            hash = hashlib.md5(keystr).hexdigest()
         indexpath = indexpath.format(path=filestream.path, hash=hash, short_hash=hash[:5])
         try:
             with compat_create_exclusive(indexpath) as new_index_file:

Instead of try/except, the fix could branch on the Python version. Fixes in other libraries have gone that direction, but people have since noted that some older Python versions also support (or even require) usedforsecurity, probably depending on the OpenSSL version. So I have opted for a try-it-and-see approach.
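The try-it-and-see approach can be factored into a small reusable helper. This is a sketch, not cfgrib's code: the helper name md5_not_for_security and the sample index_keys are hypothetical, and the "{path}.{short_hash}.idx" template is an assumption standing in for whatever indexpath the caller passes.

```python
import hashlib


def md5_not_for_security(data: bytes) -> str:
    """Return the hex MD5 digest of data for non-cryptographic use.

    Hypothetical helper mirroring the try/except approach of the patch.
    """
    try:
        # Python 3.9+ (and, reportedly, some older builds) accept the keyword.
        return hashlib.md5(data, usedforsecurity=False).hexdigest()
    except TypeError:
        # Interpreters that do not know the keyword raise TypeError.
        return hashlib.md5(data).hexdigest()


# Usage: derive an index file name the way the patched code does.
index_keys = ["edition", "centre", "shortName"]  # hypothetical keys
digest = md5_not_for_security(repr(index_keys).encode("utf-8"))
# Assumed template; unused format fields (here, hash) are simply ignored.
indexpath = "{path}.{short_hash}.idx".format(
    path="test.grib", hash=digest, short_hash=digest[:5]
)
print(indexpath)
```

Catching only TypeError keeps the fallback narrow. On an interpreter that rejects the keyword while running in FIPS mode, the fallback still raises the ValueError shown in the traceback, which seems preferable to masking the problem. A sys.version_info >= (3, 9) check would read more simply but, as noted above, could skip the keyword on older builds that accept it.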

Reproducing the Problem

Environment:

OS: Red Hat Enterprise Linux 8
Python: 3.11.5 (main, Sep 22 2023, 15:34:29) [GCC 8.5.0 20210514 (Red Hat 8.5.0-20)] on linux
cfgrib: 0.9.10.4
FIPS: enabled

Verify that FIPS mode is on

import _hashlib  # private CPython module that backs hashlib
_hashlib.get_fips_mode()
# 1  (non-zero means FIPS mode is enabled)
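The check above relies on the private _hashlib module. A complementary probe, using only the public hashlib API, attempts the same kind of call that fails in cfgrib and reports whether it is refused. This sketch assumes, based on the traceback in this issue, that a FIPS refusal surfaces as ValueError; the helper name md5_is_blocked is hypothetical.

```python
import hashlib


def md5_is_blocked() -> bool:
    """Return True if a plain hashlib.md5 call is refused (e.g. FIPS mode)."""
    try:
        hashlib.md5(b"")
    except ValueError:
        # Assumption from this issue's traceback: FIPS refusal is a ValueError.
        return True
    return False


print("plain md5 blocked:", md5_is_blocked())
```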

Test code and traceback

import cfgrib
import requests

# Any GRIB2 file will work; this one is publicly available from the NWS.
r = requests.get('https://tgftp.nws.noaa.gov/SL.us008001/ST.opnl/DF.gr2/DC.ndgd/GT.aq/AR.conus/ds.apm25h24.bin')
r.raise_for_status()  # stop early if the download failed
with open('test.grib', 'wb') as tmpf:
    tmpf.write(r.content)

# Opening the file builds its index, which is where hashlib.md5 is called.
f = cfgrib.dataset.open_file('test.grib')

Without the fix, you should get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/test/py311/lib64/python3.11/site-packages/cfgrib/dataset.py", line 782, in open_file
    index = open_fileindex(stream, indexpath, index_keys, filter_by_keys=filter_by_keys)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/test/py311/lib64/python3.11/site-packages/cfgrib/dataset.py", line 761, in open_fileindex
    index = messages.FileIndex.from_indexpath_or_filestream(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/test/py311/lib64/python3.11/site-packages/cfgrib/messages.py", line 531, in from_indexpath_or_filestream
    hash = hashlib.md5(repr(index_keys).encode("utf-8")).hexdigest()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: [digital envelope routines: EVP_DigestInit_ex] disabled for FIPS