chrismattmann / tika-python

Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Apache License 2.0

Use of hashlib.md5 on FIPS-configured installations #348

Closed: scarton closed this issue 1 year ago

scarton commented 3 years ago

Specifically, every use of hashlib.md5() is a problem on FIPS-enabled kernels, whose OpenSSL lacks MD5 support. Can hashlib.md5() be changed to use hashlib.new() with usedforsecurity set to False?

scarton commented 3 years ago

Specifically, around line 614 in tika.py, replace m = hashlib.md5() with m = hashlib.new('MD5', usedforsecurity=False).
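A minimal sketch of the suggested change, assuming Python 3.9 or later (where hashlib gained the usedforsecurity keyword). The helper name new_md5 is hypothetical and not part of tika-python; it only illustrates flagging MD5 as a non-security integrity checksum so that FIPS-mode OpenSSL builds will allow it.

```python
import hashlib
import sys


def new_md5():
    """Return an MD5 hash object flagged for non-security use (hypothetical helper).

    On FIPS-enabled OpenSSL builds, a plain hashlib.md5() call typically raises
    ValueError because MD5 is not an approved algorithm. Passing
    usedforsecurity=False (Python 3.9+) declares that the digest is only an
    integrity checksum, which FIPS mode permits.
    """
    if sys.version_info >= (3, 9):
        return hashlib.new("md5", usedforsecurity=False)
    # Older interpreters lack the keyword; this fallback still fails under FIPS.
    return hashlib.md5()


m = new_md5()
m.update(b"abc")
print(m.hexdigest())  # prints 900150983cd24fb0d6963f7d28e17f72
```

The version check keeps the call working on older interpreters outside FIPS mode, but only Python 3.9+ actually avoids the FIPS failure.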

chrismattmann commented 1 year ago

I would definitely be open to this, @scarton. That said, it would need a PR and a test that exposes the problem. It would also make sense to update .travis.yml to test for this specifically. When you have a PR ready, please open it and I'll review.

griffin-rickle commented 1 year ago

Hi, I am interested in picking up the work for this. However, I'd like to simply switch the md5 check to a sha1 check. If the Tika Maven repository provided another checksum I'd use that, but sha1 is the best we've got at the moment, assuming there's no FIPS-compliant way of verifying the .asc file I see in the repo (e.g. https://repo1.maven.org/maven2/org/apache/tika/tika-server-standard/2.6.0/tika-server-standard-2.6.0-bin.zip.asc). Does that sound reasonable?
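A minimal sketch of a SHA-1 check, assuming the repository serves a checksum file next to the jar (Maven repositories conventionally publish .sha1 alongside .md5) whose first whitespace-separated token is the hex digest. The function names are hypothetical and do not reflect tika-python's actual code.

```python
import hashlib


def file_digest_hex(path, algorithm="sha1", chunk_size=64 * 1024):
    """Hash a file in chunks so a large jar is never read into memory at once."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_checksum_file(path, checksum_text, algorithm="sha1"):
    """Compare a file's digest with the contents of a downloaded checksum file.

    Assumes the checksum file holds the hex digest, optionally followed by a
    file name (e.g. "<digest>  tika-server.jar"); that format is an assumption.
    """
    tokens = checksum_text.split()
    if not tokens:
        return False  # empty checksum file: treat as a mismatch
    expected = tokens[0].lower()  # hexdigest() is lowercase
    return file_digest_hex(path, algorithm) == expected
```

On Python 3.11+, hashlib.file_digest() could replace the manual chunked loop.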

chrismattmann commented 1 year ago

Hi @griffin-rickle, yes, that sounds reasonable. Could you also keep it backward compatible by providing an environment variable (maybe TIKA_JAR_HASH or something) that names the hash file type, defaulting to md5 but allowing it to be changed to asc?

griffin-rickle commented 1 year ago

Sure, I can do that! Just for awareness, TIKA_JAR_HASH will default to md5, with sha1 as an allowed value (not asc, since the .asc file provides a way to verify a signature, not a checksum).
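A minimal sketch of the environment-variable selection described in this comment. TIKA_JAR_HASH and its md5/sha1 values come from the thread; the parsing logic, the function names, and the convention of appending the algorithm name to the jar URL to locate the checksum file are assumptions for illustration.

```python
import os

# Values discussed in the thread: md5 (default, for backward compatibility) and sha1.
# "asc" is deliberately excluded: a .asc file is a signature, not a checksum.
ALLOWED_JAR_HASHES = ("md5", "sha1")


def jar_hash_algorithm(environ=None):
    """Read TIKA_JAR_HASH, defaulting to md5 (hypothetical parsing logic)."""
    env = os.environ if environ is None else environ
    value = env.get("TIKA_JAR_HASH", "md5").strip().lower()
    if value not in ALLOWED_JAR_HASHES:
        raise ValueError(
            "TIKA_JAR_HASH must be one of %s, got %r"
            % (", ".join(ALLOWED_JAR_HASHES), value)
        )
    return value


def checksum_url(jar_url, algorithm):
    """Build the checksum file URL by appending the algorithm as an extension."""
    return jar_url + "." + algorithm
```

Because the normalized value is also a valid hashlib algorithm name, it can be passed straight to hashlib.new(); when md5 is selected on a FIPS system, it would still need usedforsecurity=False.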