commercialhaskell / rio

A standard library for Haskell
Other
838 stars 54 forks source link

augmentPathMap assumes uppercase PATH on windows #234

Closed hasufell closed 2 years ago

hasufell commented 2 years ago

On windows, the Path variable may not be uppercase PATH, but Path and is frequently returned as such by e.g. getEnvironment and friends. Using it with augmentPathMap may then lead to confusing behavior, where a new key is added and gets ignored.

To make this more robust, on windows, it should:

  1. query the existing map for a case-insensitive PATH key
  2. use that key to insert the new values

If there are multiple PATH keys, we're screwed. There could be a normalizePathKeys :: EnvVars -> EnvVars function that forces it to be upper-case and merges duplicate PATHs? Sth. like:

normalizePathKeys  :: EnvVars -> EnvVars
normalizePathKeys = Map.foldrWithKey (\k v m -> case T.toLower k of
                                            "path" -> Map.insert "PATH" v m
                                            _      -> Map.insert k v m) mempty
snoyberg commented 2 years ago

That's a good catch, and I'm surprised it (apparently) hasn't tripped anyone up yet. Is this a defensive issue, or has this popped up somewhere?

I think some of the assumptions here may be incorrect. For example, env vars on Windows are fully case insensitive. To wit:

Prelude System.Environment> lookupEnv "foo"
Nothing
Prelude System.Environment> setEnv "foo" "bar"
Prelude System.Environment> lookupEnv "foo"
Just "bar"
Prelude System.Environment> setEnv "FOO" "baz"
Prelude System.Environment> lookupEnv "foo"
Just "baz"

Note that the setEnv "FOO" overrides the previous setEnv "foo".

The main issue here seems to be that we're using Map, which is case sensitive. And then, if when reading in the env vars initially, we get path instead of PATH, the logic may fail. I'm not sure that Windows ever uses a casing for PATH besides allcaps. But if, during ingestion of the env vars, we force upper casing of PATH, I think it would address all cases.

hasufell commented 2 years ago

Is this a defensive issue, or has this popped up somewhere?

Only in one of my PRs, trying to expose msys2 paths for calling a process: https://github.com/commercialhaskell/stack/pull/5585/commits/ca7c24993f3bda02323c601f94df2781caf20a33#diff-68447e366b1d0973dd0cffdc20410a4fadc6e575fe8aeb460375d1f2676946fdR663

But if, during ingestion of the env vars, we force upper casing of PATH, I think it would address all cases.

Alright, sounds like you're agreeing to have a normalization function simply fold over the env var map. I can prepare a PR.

snoyberg commented 2 years ago

Overall sounds right thanks!

snoyberg commented 2 years ago

Copying comment from #235:

Now I understand. This isn't the intended workflow when using rio. augmentPathMap is intended to be used with functions like modifyEnvVars. I'd rather go with a doc fix than impact the performance of the library in this way. In your use case, if you want to use getEnvironment directly, you can implement the same fix as within the RIO.Process module itself, by upper casing the keys on Windows.

hasufell commented 2 years ago

Well, if it's just a documentation issue, then I'm confused why the existing stack code is affected by this bug.

See here: https://github.com/commercialhaskell/stack/blob/7564ac94daa439fdbd8aa9ed616542e4e26f8205/src/Stack/Config.hs#L273-L276

extra-path in stack.yaml doesn't propagate because of this if you happen to have a lowercase Path variable in your environment.

snoyberg commented 2 years ago

Because that code should be modified.

hasufell commented 2 years ago

Because that code should be modified.

Well, that's known code... what about proprietary code we cannot fix? It's subtly broken.

snoyberg commented 2 years ago

Please try to be more direct. This feels like word games currently.

You've pointed out that a function in this library is used incorrectly in Stack. It's taken many comments to get to that point. Now the question is whether the function should be modified to be less efficient in all cases, or if the documentation should be updated to indicate correct usage of the function. It's a fairly standard engineering trade-off. Adding in that the situation is "confusing" just encourages time wastage around explanations, which I don't feel like engaging in.

So: should the function be changed or not? I'm ambivalent.

One final comment on this issue/PR situation. I clearly spelled out above what I thought would be an acceptable PR. It involved upper casing the keys. It turns out that (1) that already happens in the library and (2) your PR did nothing close to what I'd mentioned, and instead did the thing you'd already described.

I'm giving one last chance for a productive dialogue here, otherwise I'm moving on.

hasufell commented 2 years ago

Please try to be more direct.

Projects (potentially proprietary ones we cannot fix with patches) may use the function in said "incorrect" way already. I'm pretty sure they don't regularly re-read the RIO documentation. The bug will go unnoticed. Documentation doesn't cause compile errors, doesn't show up in changelogs.

How do we make sure codebases become aware of this? Updating documentation won't be enough.

As such, I think the options are:

  1. take the complexity hit
  2. try to fix this in base
  3. make API breaking changes (such as making EnvVars a newtype)
snoyberg commented 2 years ago

I've given it a bit more thought. I'm on board with your initial recommendation. Would you be able to reopen the PR and include a ChangeLog entry/minor version bump?