Closed by dbnicholson 2 months ago
I went back and looked at #97 and I don't see a real requirement to filter environment variables. There is definitely logic in not leaking potentially sensitive data to other processes, but filtering environment variables is both not sufficient and potentially wrong as seen here.
I also don't totally get why it happens for the committing portion since it uses regular subprocess.check_call instead of the environment filtering helper. Anyways, I think clear_env can likely be dropped or only used when dumping environment variables to output.
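To make the "only used when dumping environment variables to output" option concrete, here is a minimal sketch. It assumes the goal is just to mask values in log output; redacted_env and SENSITIVE_MARKERS are hypothetical names, not part of the project. It builds a new plain dict and never touches the real environment, so child processes still receive every variable.

```python
import os

# Same substrings that clear_env matches on (taken from the diff in this thread).
SENSITIVE_MARKERS = ("pass", "token", "secret", "auth")


def redacted_env(environ=None):
    """Return a plain dict copy of the environment with sensitive values masked.

    Hypothetical helper meant for log output only. It reads from the mapping
    it is given and never deletes or modifies anything in it.
    """
    source = os.environ if environ is None else environ
    return {
        name: "<redacted>"
        if any(marker in name.lower() for marker in SENSITIVE_MARKERS)
        else value
        for name, value in source.items()
    }
```

Something like log.debug("env: %s", redacted_env()) could then replace dumping the raw environment, while subprocess calls keep inheriting the unfiltered one.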
I threw in a bunch of debug logging and this is quite strange. I'm running the checker with GIT_AUTHOR_NAME set to some value in the environment. I added a helper that dumps the value out if it's set. First the helper looked like this:
def dump_author():
    if 'GIT_AUTHOR_NAME' in os.environ:
        log.info(f'GIT_AUTHOR_NAME={os.environ["GIT_AUTHOR_NAME"]}')
    else:
        log.info('GIT_AUTHOR_NAME not in environment')
Then I added a bunch of calls to the helper. I ran the test tests.test_main.TestEntrypoint since it runs the whole thing including making a git commit. According to the output, GIT_AUTHOR_NAME remained set in the environment the whole time even though the git commit output showed it wasn't used!
So, then I changed the helper to this:
def dump_author():
    env = subprocess.check_output(['env'], text=True).splitlines()
    for line in env:
        if line.startswith('GIT_AUTHOR_NAME='):
            log.info(line)
            return
    log.info('GIT_AUTHOR_NAME not in environment')
And now it showed the environment variable not being set right after this. Uh, what? So, somehow the environment used by subprocess was being corrupted but os.environ/os.getenv() were not?
On a whim I decided to make this change since the environment variables weren't actually being altered:
diff --git a/src/lib/utils.py b/src/lib/utils.py
index 92ac90b..2aa3bec 100644
--- a/src/lib/utils.py
+++ b/src/lib/utils.py
@@ -273,6 +273,8 @@ def filter_versions(
 def clear_env(environ):
-    new_env = copy.deepcopy(environ)
+    new_env = environ.copy()
     for varname in new_env.keys():
         if any(i in varname.lower() for i in ["pass", "token", "secret", "auth"]):
This caused a crash with RuntimeError: dictionary changed size during iteration since the loop edits the dictionary in place. That seems obvious in retrospect. Why this succeeds with the dictionary created with copy.deepcopy, I have no idea. Making the required change:
diff --git a/src/lib/utils.py b/src/lib/utils.py
index 92ac90b..0beecfa 100644
--- a/src/lib/utils.py
+++ b/src/lib/utils.py
@@ -272,7 +272,9 @@ def filter_versions(
 def clear_env(environ):
-    new_env = copy.deepcopy(environ)
-    for varname in new_env.keys():
+    new_env = environ.copy()
+    for varname in list(new_env.keys()):
         if any(i in varname.lower() for i in ["pass", "token", "secret", "auth"]):
             log.debug("Removing env %s", varname)
And it works! Going back to copy.deepcopy but keeping the list(new_env.keys()) does not, though. I have no idea why, but presumably the environment filtering is not playing nice with all the async execution.
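A possible explanation, offered as an assumption and not something confirmed in this thread: in CPython, copy.deepcopy(os.environ) returns another os._Environ object, not a plain dict. Deleting a key from that object still calls unsetenv() on the real process environment, while os.environ keeps its own cached data and so still reports the variable. In recent CPython versions its iterator also walks a snapshot of the keys, which would account for the missing RuntimeError. The sketch below tries to reproduce that on a POSIX system, using env the same way the second dump_author helper does. GR_DEMO_AUTHOR, env_has and demo are made-up names for this demo only.

```python
import copy
import os
import subprocess

NAME = "GR_DEMO_AUTHOR"  # hypothetical variable, used only for this demo


def env_has(name):
    """Run `env` in a child process and report whether it lists name."""
    lines = subprocess.check_output(["env"]).splitlines()
    return any(line.startswith(name.encode() + b"=") for line in lines)


def demo():
    """Delete NAME from a deepcopy of os.environ and report who still sees it."""
    os.environ[NAME] = "someone"  # assignment also updates the process environment
    try:
        report = {"child_before": env_has(NAME)}

        clone = copy.deepcopy(os.environ)
        # The clone is the same mapping type as os.environ, not a plain dict.
        report["clone_is_environ"] = type(clone) is type(os.environ)

        # Same loop shape as clear_env: delete while iterating over keys().
        for key in clone.keys():
            if key == NAME:
                del clone[key]

        # os.environ has its own cached data, so the key is still listed there.
        report["in_os_environ_after"] = NAME in os.environ
        # A child that inherits the process environment is the real check.
        report["child_after"] = env_has(NAME)
        return report
    finally:
        # Remove the demo variable from os.environ again.
        os.environ.pop(NAME, None)


if __name__ == "__main__":
    print(demo())
```

If the assumption holds, the report shows the variable still present in os.environ after the deletion but no longer visible to env, which would match the two dump_author helpers disagreeing. environ.copy() avoids this because it returns a plain dict.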
However, while I think it would be possible to fix this, I think it should just be dropped entirely. This type of blind filtering is bound to break things (as it has here) by assuming that the processes that are being executed don't require those environment variables.
The documented way to run the checker and get the desired git author information is to set the GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL environment variables. Unfortunately, these are filtered out because they contain the phrase auth.
I think this is also why the documentation says to set EMAIL. It shouldn't be needed except that GIT_AUTHOR_EMAIL doesn't get through. In endless-key-flatpak, our commits are being authored with root <os@endlessos.org>. In other words, we get the email address because we also set EMAIL in the workflow, but we lose our desired GIT_AUTHOR_NAME.
Not sure the best way to handle it. The sensitive variable filtering seems useful, and it does make sense that environment variables containing auth are likely to be one of those. On the other hand, why is the environment being filtered? Because it's dumped to the output? I would think some environment variable containing credentials might actually be required to run f-e-d-c depending on what the checkers do or what the git push does. @gasinvein?
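The substring match is easy to check in isolation. This small sketch copies the marker list from the clear_env code quoted earlier in this thread; is_filtered is a hypothetical helper, and GITHUB_TOKEN is only there as an illustrative name that the filter is presumably meant to catch.

```python
# Marker substrings as they appear in clear_env.
SENSITIVE_MARKERS = ["pass", "token", "secret", "auth"]


def is_filtered(varname):
    """Mirror clear_env's test: True if any marker occurs in the lowercased name."""
    return any(marker in varname.lower() for marker in SENSITIVE_MARKERS)


for name in ["GIT_AUTHOR_NAME", "GIT_AUTHOR_EMAIL", "EMAIL", "GITHUB_TOKEN"]:
    print(name, is_filtered(name))
```

GIT_AUTHOR_NAME and GIT_AUTHOR_EMAIL match because "author" contains "auth", while EMAIL passes through. That is consistent with the commits keeping the email address but losing the author name.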