dask / hdfs3

A wrapper for libhdfs3 to interact with HDFS from Python
http://hdfs3.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

ReDoS Vulnerability in HDFileSystem.glob() #176

Open Alphadelta14 opened 4 years ago

Alphadelta14 commented 4 years ago

Hi. While auditing dependencies, I found a particularly nasty ReDoS issue that is fairly simple to trigger from client code.

Versions Affected: hdfs3<=0.3.1

I am disclosing this publicly so that users and package maintainers can choose how to safeguard themselves, as this repo is not actively developed.

Scenario

Given a properly instantiated client:

    hdfs = HDFileSystem()

where there exists some file (the HDFS file name limit is 255):

    /ababababababababababababababababababababababababababababababababababababababababa

the following expression will cause client code to seemingly hang:

    hdfs.glob("/*((ab)+)+")
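To illustrate why a pattern like this is expensive, here is a minimal sketch using only the standard library. The helper naive_glob_to_regex is hypothetical: it assumes the glob is turned into a regular expression by rewriting the wildcards and passing everything else through unescaped, which may differ in detail from what hdfs3 actually does. The nested quantifier ((ab)+)+ combined with a name that cannot fully match (note the trailing "a") forces the regex engine to try every way of grouping the "ab" pairs before giving up.

```python
import re
import time


def naive_glob_to_regex(path):
    # Hypothetical translation (an assumption, not hdfs3's code):
    # wildcards are rewritten, all other characters pass through unescaped.
    return "^" + path.replace("*", "[^/]*").replace("?", ".") + "$"


def time_match(pairs):
    # The trailing "a" guarantees the match fails, so the engine backtracks.
    name = "/" + "ab" * pairs + "a"
    pattern = re.compile(naive_glob_to_regex("/*((ab)+)+"))
    start = time.perf_counter()
    result = pattern.match(name)
    return result, time.perf_counter() - start


# Keep the sizes small: the cost grows very quickly with the number of pairs.
for pairs in (8, 10, 12, 14):
    result, elapsed = time_match(pairs)
    print(pairs, result, f"{elapsed:.4f}s")
```

The sizes here are deliberately tiny so the script finishes. With a file name near the 255 character limit, the same match would be expected to run for an impractically long time.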

Potential Resolutions

  1. Switch to the native JNI client / pyarrow, as this repo recommends.
  2. Ensure re.escape() is called during hdfs.glob() (do not allow patterns supplied by client code to be compiled directly into regular expressions).
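The second option could look roughly like the sketch below. This is not a patch against hdfs3; escaped_glob_to_regex is a hypothetical helper, and the choice to map "?" to a single non-slash character is my own assumption. The idea is to escape the whole pattern first and then re-enable only the glob wildcards, so that parentheses and "+" from the caller are matched literally.

```python
import re


def escaped_glob_to_regex(path):
    # Escape everything first so regex metacharacters become literals.
    escaped = re.escape(path)
    # Re-enable only the glob wildcards (hypothetical mapping):
    # "*" matches within one path component, "?" matches one character.
    escaped = escaped.replace(r"\*", "[^/]*").replace(r"\?", "[^/]")
    return "^" + escaped + "$"


pattern = re.compile(escaped_glob_to_regex("/*((ab)+)+"))

# The parentheses and "+" are now literal characters in the file name.
print(bool(pattern.match("/x((ab)+)+")))

# The long "abab...a" name no longer triggers nested-quantifier backtracking.
print(pattern.match("/" + "ab" * 40 + "a"))
```

With this translation the only repetition left in the compiled expression is the single [^/]* per wildcard, so the nested quantifier from the caller never reaches the regex engine.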