Open MaxwellDPS opened 1 month ago
Need to assess whether it is necessary to write the file to disk. cc @richard-julien
Same issue on ThreatFox: it isn't even using a directory, and it writes the lines to disk only to read them back and yield them. This would be easy to swap out for StringIO.
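The StringIO swap mentioned above might look roughly like the sketch below. The function name `iter_csv_lines` and the sample payload are made up for illustration and are not taken from the connector; the only assumption carried over is that the CSV arrives as UTF-8 bytes and that lines starting with `#` are comments.

```python
import io
from typing import Iterator


def iter_csv_lines(csv_data: bytes) -> Iterator[str]:
    """Yield non-comment lines from raw CSV bytes without touching disk."""
    # StringIO supports the same line-by-line iteration as an opened text file,
    # so the filtering expression can stay unchanged.
    with io.StringIO(csv_data.decode("utf-8")) as buffer:
        yield from (line for line in buffer if not line.startswith("#"))


# Hypothetical payload, only to show the call shape.
sample = b"# header comment\n1,example.com\n2,example.org\n"
lines = list(iter_csv_lines(sample))
```

Because nothing is written to the filesystem, this variant should work in a container with a read-only root filesystem and no extra volume.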
This is a solid example of why this is bad: assume a k8s pod with a read-only FS. An emptyDir is a perfect use case for caching a CSV, but since CSV_PATH is really just f"{BASE_PATH}/data.csv",
this is a security issue, as I can't lock down the file system. This gives me 2 options:
One of 2 solutions is needed:
This makes no sense
    try:
        zipped_file = io.BytesIO(data)
        with zipfile.ZipFile(zipped_file, "r") as zip_ref:
            with zip_ref.open("full.csv") as full_file:
                csv_data = full_file.read()
    except zipfile.BadZipFile:
        # Treat as an unzipped CSV from /recent/
        csv_data = data
    with open(CSV_PATH, "wb") as fd:
        fd.write(csv_data)
    with open(CSV_PATH, "r", encoding="utf-8") as fd:
        yield from (line for line in fd if not line.startswith("#"))
I recommend changing this to not use a generator: csv_data = full_file.read() already loads the whole file into memory, so the generator saves nothing.
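    try:
        zipped_file = io.BytesIO(data)
        with zipfile.ZipFile(zipped_file, "r") as zip_ref:
            with zip_ref.open("full.csv") as full_file:
                csv_data = full_file.read()
    except zipfile.BadZipFile:
        # Treat as an unzipped CSV from /recent/
        csv_data = data
    # csv_data is bytes: decode it first, then split on line boundaries
    # (split() would break on every whitespace character, not just newlines).
    return [
        line
        for line in csv_data.decode("utf-8").splitlines()
        if not line.startswith("#")
    ]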
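If keeping a generator is still preferred, one alternative is to decode the archive member lazily, so the decompressed CSV is never fully materialized. The sketch below is an assumption about how that could be done, not the connector's code: `stream_csv_lines` and its `member` parameter are hypothetical names, and the raw download is still held in memory as `data`.

```python
import io
import zipfile
from typing import Iterator


def stream_csv_lines(data: bytes, member: str = "full.csv") -> Iterator[str]:
    """Yield non-comment lines, decompressing the zip member incrementally."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data), "r")
    except zipfile.BadZipFile:
        # Not a zip archive: treat the payload as a plain CSV.
        with io.StringIO(data.decode("utf-8")) as text:
            yield from (line for line in text if not line.startswith("#"))
        return
    with archive, archive.open(member) as raw:
        # TextIOWrapper decodes the compressed stream as it is read,
        # so only a small buffer of decompressed text exists at a time.
        with io.TextIOWrapper(raw, encoding="utf-8") as text:
            yield from (line for line in text if not line.startswith("#"))
```

Unlike the list-returning version, the lines here keep their trailing newline, which matches what iterating over an opened file would produce.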
Yes @romain-filigran, I think @MaxwellDPS is right: there is no need to write the file in some connectors. That could be improved.
Description
In the import document connector, the location used by _download_import_file() needs to be defined as a volume. This poses a security issue if users, having no context on this, decide to simply turn off the read-only root filesystem for the container.
More broadly, k8s covers security contexts at runtime really well. I would recommend that all containers be able to run as non-root with a read-only filesystem.
Per Docker best practices, any location where files are created at run time should be a volume.
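On the application side, one way to make such a volume easy to declare is to route every runtime write through a single configurable directory. The sketch below is only an illustration under assumptions: the environment variable `CONNECTOR_TMP_DIR` and the helper `resolve_scratch_dir` are hypothetical and do not exist in the connector.

```python
import os
import tempfile
from pathlib import Path


def resolve_scratch_dir(env_var: str = "CONNECTOR_TMP_DIR") -> Path:
    """Return the one directory the process is expected to write to."""
    # Hypothetical variable name; falls back to the platform temp directory.
    scratch = Path(os.environ.get(env_var, tempfile.gettempdir()))
    scratch.mkdir(parents=True, exist_ok=True)
    if not os.access(scratch, os.W_OK):
        # Fail early with a clear message instead of a late runtime error.
        raise RuntimeError(f"{scratch} is not writable; mount a volume there")
    return scratch


# Every file created at run time would then live under this directory.
csv_path = resolve_scratch_dir() / "data.csv"
```

With a layout like this, the directory named by the variable is the only path that needs a volume or an emptyDir, and the rest of the filesystem can stay read-only.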
Environment
Reproducible Steps
Steps to create the smallest reproducible scenario:
Expected Output
Proper volume definitions
Actual Output
Runtime errors if no volume is mounted
Additional information
https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-the-security-context-for-a-pod
https://docs.docker.com/build/building/best-practices/#volume