Pixee-Bot-Python / OLMo

Modeling, training, eval, and inference code for OLMo
https://allenai.org/olmo
Apache License 2.0

Harden `pickle.load()` against deserialization attacks #6

Closed pixeebot[bot] closed 5 months ago

pixeebot[bot] commented 6 months ago

Python's pickle module is notoriously insecure. While it is very useful for serializing and deserializing Python objects, it is not safe to use pickle to load data from untrusted sources, because unpickling can execute arbitrary code, and an attacker can exploit this to run code on your system. Unlike yaml, pickle has no concept of a "safe" loader. We therefore recommend avoiding pickle and using a different serialization format, such as json or yaml, when working with untrusted data.
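To make the risk concrete, here is a minimal, deliberately harmless sketch of how a pickle stream can name a callable that runs during loading. The `Payload` class and the arithmetic expression are hypothetical and exist only for illustration; a real attacker would pick something far more damaging than arithmetic.

```python
import pickle


class Payload:
    """Hypothetical object whose pickle form is 'call eval on this string'."""

    def __reduce__(self):
        # pickle records the callable and its arguments, not the object itself.
        # A harmless expression stands in for attacker-controlled code.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The expression is evaluated inside pickle.loads(); no Payload is rebuilt.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

Note that the loaded value is whatever the callable returned, so the caller has no chance to inspect the data before the code has already run.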

However, if you must use pickle to load data from an untrusted source, we recommend using the open-source fickling library. fickling is a drop-in replacement for pickle that validates the data before loading it and checks for the possibility of code execution. This makes it much safer (although still not entirely safe) to use pickle to load data from untrusted sources.
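For comparison, the standard library offers a narrower mitigation: subclassing `pickle.Unpickler` and overriding `find_class` to restrict which globals a stream may reference. The sketch below follows that pattern; it is not how fickling works and is not part of this codemod, and the `SAFE_BUILTINS` allowlist and `restricted_loads` helper are hypothetical names chosen for illustration.

```python
import builtins
import io
import pickle

# Hypothetical allowlist: the only globals a pickle stream may reference.
SAFE_BUILTINS = {"range", "complex", "set", "frozenset", "slice"}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream asks for a global such as builtins.eval.
        if module == "builtins" and name in SAFE_BUILTINS:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    """Like pickle.loads(), but rejects any global outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Plain containers, strings, and numbers load without ever reaching `find_class`, so `restricted_loads(pickle.dumps({"a": [1, 2, 3]}))` succeeds, while a stream that references `eval` or `os.system` raises `pickle.UnpicklingError`. As with fickling, this reduces the attack surface but does not make untrusted pickles safe.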

This codemod replaces calls to pickle.load with fickling.load in Python code. It also adds an import statement for fickling if it is not already present.

The changes look like the following:

- import pickle
+ import fickling

- data = pickle.load(file)
+ data = fickling.load(file)

Dependency Updates

This codemod relies on an external dependency. We have automatically added this dependency to your project's pyproject.toml file.

This package provides analysis of pickled data to help identify potential security vulnerabilities.
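As a rough illustration of what analyzing pickled data can mean, the sketch below uses the standard library's `pickletools` to list opcodes that import globals or invoke callables, without ever loading the stream. It is an assumption-laden stand-in, not the dependency's actual implementation; `SUSPICIOUS_OPCODES` and `suspicious_opcodes` are hypothetical names, and the opcode set is a simplification.

```python
import pickle
import pickletools

# Opcodes that import a global or construct/call something during unpickling.
# Hypothetical, simplified list for illustration only.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX",
}


def suspicious_opcodes(data: bytes):
    """Return the names of suspicious opcodes found in a pickle stream.

    The stream is only disassembled, never executed.
    """
    return [
        opcode.name
        for opcode, _arg, _pos in pickletools.genops(data)
        if opcode.name in SUSPICIOUS_OPCODES
    ]


plain = pickle.dumps({"a": [1, 2, 3]})
print(suspicious_opcodes(plain))
```

A stream made only of basic containers and scalars yields an empty list, whereas any stream that pickles a class instance or a `__reduce__` payload will report entries such as `STACK_GLOBAL` and `REDUCE`. Many legitimate pickles also use these opcodes, so a real analyzer has to look at which globals are referenced rather than flagging the opcodes alone.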

There are a number of places where Python project dependencies can be expressed, including setup.py, pyproject.toml, setup.cfg, and requirements.txt files. If this change is incorrect, or if you are using another packaging system such as poetry, it may be necessary for you to manually add the dependency to the proper location in your project.

More reading

* [https://docs.python.org/3/library/pickle.html](https://docs.python.org/3/library/pickle.html)
* [https://owasp.org/www-community/vulnerabilities/Deserialization_of_untrusted_data](https://owasp.org/www-community/vulnerabilities/Deserialization_of_untrusted_data)
* [https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html#clear-box-review_1](https://cheatsheetseries.owasp.org/cheatsheets/Deserialization_Cheat_Sheet.html#clear-box-review_1)
* [https://github.com/trailofbits/fickling](https://github.com/trailofbits/fickling)

I have additional improvements ready for this repo! If you want to see them, leave the comment:

@pixeebot next

... and I will open a new PR right away!

Powered by: pixeebot (codemod ID: pixee:python/harden-pickle-load)

pixeebot[bot] commented 6 months ago

I'm confident in this change, but I'm not a maintainer of this project. Do you see any reason not to merge it?

If this change was not helpful, or you have suggestions for improvements, please let me know!

pixeebot[bot] commented 6 months ago

Just a friendly ping to remind you about this change. If there are concerns about it, we'd love to hear about them!

pixeeai commented 5 months ago

@pixeebot next

pixeebot[bot] commented 5 months ago

@pixeeai, I opened PR #7, go check it out!