A standard or better way to populate local environments with azd env variables

pamelafox commented 1 month ago

We currently have many templates that need access to azd environment variables in order to run hooks, scripts, or a local dev server.

There are two ways that templates often do that:

Write the full azd env into a .env file, and then load it with a language package like python-dotenv:

azd env get-values > .env

Use shell commands to write the env variables into the environment, and call programs from the shell script:

Write-Host "Loading azd .env file from current environment"
foreach ($line in (& azd env get-values)) {
    if ($line -match "([^=]+)=(.*)") {
        $key = $matches[1]
        $value = $matches[2] -replace '^"|"$'
        [Environment]::SetEnvironmentVariable($key, $value)
    }
}

Why those are bad

Both of these approaches are problematic as they can leak the azd environment variables into the global environment.

For example, the " > .env" approach leaks into the global environment when you're using the Python extension, as that extension (as a default behavior) automatically copies .env variables into the global environment. It took me months to figure out why my global env was getting tainted constantly.

The PowerShell code above can also leak into the Windows shell, depending on how the rest of the script issues commands.

It is very bad when the full azd env variables leak into a global environment, since they include AZURE_ENV_NAME, AZURE_LOCATION, AZURE_SUBSCRIPTION_ID. If you then try to switch environments, you will find azd constantly trying to deploy with the values of the old environment. It's very confusing and caused me days of work over the last year trying to figure out what was happening.
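
One way to avoid this kind of leak, sketched here in Python under stated assumptions, is to hand the values to a single child process instead of setting them in the parent environment. `run_with_values` is a hypothetical helper name and `demo-env` is a placeholder value; neither comes from azd.

```python
import os
import subprocess
import sys


def run_with_values(command: list[str], values: dict[str, str]) -> subprocess.CompletedProcess:
    """Run a command with extra variables visible to that child process only."""
    # Build a copy of the environment; os.environ itself is never modified.
    child_env = {**os.environ, **values}
    return subprocess.run(command, env=child_env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Placeholder value standing in for what the azd environment would hold.
    result = run_with_values(
        [sys.executable, "-c", "import os; print(os.environ['AZURE_ENV_NAME'])"],
        {"AZURE_ENV_NAME": "demo-env"},
    )
    print(result.stdout.strip())
```

Because the parent process never calls a "set environment variable" API, a later azd invocation from the same shell should not see the injected values.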

Better approaches

I am now taking one of two approaches:

1) Using a script to auto-write only the necessary variables, and making sure those variables aren't also inputs in main.parameters.json: https://github.com/Azure-Samples/azure-openai-keyless-python/pull/7/files#diff-129e0db6b0e28f105813de4b3029d708f8012191253104aaadc5086e69a51aa3

That's not super robust, since it has the constraint that you can't also have those variables as inputs, but it can work for some simple samples.
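
As a rough illustration of "only the necessary variables", the sketch below writes a chosen subset of values to a .env-style file. It is not the script from the linked PR: `write_selected` is a hypothetical helper, the values are placeholders, and the sketch assumes values contain no double quotes that would need escaping.

```python
import tempfile
from pathlib import Path


def write_selected(values: dict[str, str], needed: list[str], path: Path) -> None:
    """Write only the named variables to a .env-style file."""
    missing = [name for name in needed if name not in values]
    if missing:
        raise KeyError(f"not set in the azd environment: {', '.join(missing)}")
    lines = [f'{name}="{values[name]}"' for name in needed]
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Placeholder values; a real script would read them from azd.
    azd_values = {
        "AZURE_ENV_NAME": "dev",
        "AZURE_LOCATION": "eastus2",
        "AZURE_KEYVAULT_ENDPOINT": "https://example.vault.azure.net/",
    }
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / ".env"
        write_selected(azd_values, ["AZURE_KEYVAULT_ENDPOINT"], target)
        print(target.read_text(encoding="utf-8"))
```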

2) Using a Python script to dynamically load in the current azd environment, using python-dotenv, so that it only ever is used inside that Python program.

https://github.com/Azure-Samples/azure-search-openai-demo/pull/1986/files#diff-6099ee740b8b4a7f97ac1e1dfff11776df721ed84c635e510c9aba8f922ca612

That is my current preferred approach, though it has the drawback of feeling a little overly complex for samples that are designed as teaching samples.
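
A minimal sketch of the idea, not the code from the linked PR: the values are read into a plain dict inside the Python program and never written to a file or to `os.environ`. The linked approach uses python-dotenv; this version swaps in a small stdlib parser to stay self-contained, and it assumes `azd env get-values` prints one KEY="value" pair per line. `load_azd_values` and the injectable `fetch` parameter are hypothetical names.

```python
import subprocess
from typing import Callable


def _azd_get_values() -> str:
    """Return the raw output of `azd env get-values` for the current environment."""
    done = subprocess.run(
        ["azd", "env", "get-values"], capture_output=True, text=True, check=True
    )
    return done.stdout


def load_azd_values(fetch: Callable[[], str] = _azd_get_values) -> dict[str, str]:
    """Parse KEY="value" lines into a dict that lives only in this process."""
    values: dict[str, str] = {}
    for line in fetch().splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            # Drop one surrounding double quote on each side, if present.
            values[key.strip()] = value.strip().removeprefix('"').removesuffix('"')
    return values


if __name__ == "__main__":
    # Demo with canned output so the sketch runs without azd installed.
    print(load_azd_values(lambda: 'AZURE_LOCATION="eastus2"\n'))
```

Passing `fetch` explicitly keeps the parsing testable without the CLI; calling `load_azd_values()` with no argument shells out to azd.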

3) We provide vscode tasks that use the azd-provided dotenv as well, but that only works if you're running from VS Code, and we need to provide non-VS Code scripts as well.

EVEN better approaches??

These are related issues and PRs around this issue:

https://github.com/Azure/azure-dev/pull/4078

https://github.com/Azure/azure-dev/pull/4131

https://github.com/Azure/azure-dev/issues/1163

https://github.com/Azure/azure-dev/issues/4067

pamelafox commented 1 month ago

^^ Added related issues/PRs to the description.

rajeshkamal5050 commented 1 month ago

@vhvb1989 might be a good use-case for Named/Specialized hooks feature?

vhvb1989 commented 1 month ago

IIRC, at the top of the wish-list is to use the python app as the starting point.

For example, we currently support running hooks where azd automatically sets all the .env values as env vars for the script defined in the hook, but this requires folks to use azd as the starting point. Either by running the azd command that triggers the hook, or by running azd hooks run.

We also considered having something like azd launch python-app.py, but again, this means folks depending on azd to run their apps/scripts.

In a world where we want people to just run python foo.py and have it easy to pick azd environment values, I would probably aim for a python lib (or sdk) for azd. Folks would just add that lib to the list of requirements and adding it to the top of their app would handle pulling the azd env's values. Sounds like a good open source project to me :) LMK what you think @pamelafox

pamelafox commented 1 month ago

Yeah I tend to agree, for Python we would like to be able to say "python bla.py" or "py bla.py" or whatever works on an OS. Or even use other Python-specific runners like the new uv packager, which would mean "uv run bla.py".

I feel a little silly making a PyPI package out of my 10 line script, but I could do it! Or are you saying you'd do it? I wouldn't have it auto import, as we sometimes only want to pull them in when we're running locally. So for azure-search-openai-demo, I first check an env var like "RUNNING_IN_PRODUCTION" and if not, load from azd.
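
The "only load when running locally" check could look roughly like the sketch below. `local_settings` is a hypothetical helper, and treating any non-empty RUNNING_IN_PRODUCTION value as "in production" is an assumption, not necessarily how azure-search-openai-demo interprets it.

```python
from typing import Callable, Mapping


def local_settings(
    environ: Mapping[str, str], load_local: Callable[[], dict[str, str]]
) -> dict[str, str]:
    """Return azd-derived settings only when not running in production."""
    if environ.get("RUNNING_IN_PRODUCTION"):
        return {}  # production gets its configuration from the host instead
    return load_local()


if __name__ == "__main__":
    # Placeholder loader standing in for whatever reads the azd environment.
    def fake_loader() -> dict[str, str]:
        return {"AZURE_LOCATION": "eastus2"}

    print(local_settings({}, fake_loader))
    print(local_settings({"RUNNING_IN_PRODUCTION": "1"}, fake_loader))
```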

I'm curious what Yohan would want for JS environments though, I've asked him to comment.

richardpark-msft commented 1 month ago

In a world where we want people to just run python foo.py and have it easy to pick azd environment values, I would probably aim for a python lib (or sdk) for azd. Folks would just add that lib to the list of requirements and adding it to the top of their app would handle pulling the azd env's values.

I think this is problematic because now we're saying that 'azd' abstractions are leaking into the app, and we have an extra package that we have to maintain for every language. With .env files the benefit is that you don't know where they came from, they just exist.

vhvb1989 commented 1 month ago

Creating an sdk lib for azd would bring more things. We would probably start with getting the .env values, but I can think of more scenarios, like:

Listing environments, finding azd projects, even calling commands from your app.

I can definitely help/collaborate on a project like this, but it would require more than one happy developer XD.

I think an sdk lib could be added under the Azure SDK umbrella. Instead of an Azure service, the target would be a local azd service or just the CLI.

pamelafox commented 1 month ago

@richardpark-msft I like .env files, but then we have the issues I discussed above, where the variables leak into the global environment and get picked up by azd on the next run. Plus, when you switch environments, we need to remember to update the .env. So we'd need to address those issues if we wanted to keep using .env files as our standard approach.

richardpark-msft commented 1 month ago

In the .env libraries I've used, they usually have an option for specifying the actual .env file you load (with the default being .env!). Would this be solvable if we could just have built-in filtering in the env get-values command?

Looking at it now, it has --environment string. Allowing me to specify a list of variables, or even wildcard/regex would be enough for me to easily compose an .env file without too much trouble.

sinedied commented 1 month ago

In almost all contexts I've been working with (and not only JS/Node.js contexts), .env files are standard to set up local dev environments, and developers are used to working with them.

The fact that .env files leak into the global env is really not great and would probably need a separate issue sent to the Python extension, but the fact that azd env get-values produces unwanted extra env vars is not good either:

It makes the .env more complicated than needed (making it harder to understand how a project/sample is set up)
If for whatever reason the values are leaked into the current env as @pamelafox mentioned, it can mess up your AZD deployment. This caught me a few times, making me deploy to a different subscription that I did not want to deploy to!
Sometimes you want to start by manually filling in your .env file because you want to test the app before deploying or doing anything with AZD (like connecting to local DB in containers, using local OpenAI proxy...). After AZD deployment, having many more variables than what was needed to fill manually can be very confusing for folks (had this feedback regularly)

@richardpark-msft I'm not sure what you mean by having built-in filtering in env get-values, but having something like this would definitely help the issue:

azd env get-values > .env would ONLY output the variables set as outputs in the infra, or set manually with the azd env set command
azd env get-values --all > .env would output everything (if needed), as the current behavior. Introducing the extra flag would sure be a breaking change, but makes it less error-prone to unwanted scenarios.
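
The proposed default could be approximated today on the consumer side, as in this sketch. The set of "internal" names is an assumption taken from the three variables called out earlier in this thread; it is not azd's own definition, and `app_values` is a hypothetical helper.

```python
from typing import Mapping

# Assumption: the names this thread identifies as harmful when leaked.
AZD_INTERNAL = frozenset({"AZURE_ENV_NAME", "AZURE_LOCATION", "AZURE_SUBSCRIPTION_ID"})


def app_values(values: Mapping[str, str], include_all: bool = False) -> dict[str, str]:
    """Drop azd-internal names unless include_all mirrors the proposed --all flag."""
    if include_all:
        return dict(values)
    return {name: value for name, value in values.items() if name not in AZD_INTERNAL}


if __name__ == "__main__":
    # Placeholder values for illustration.
    everything = {
        "AZURE_ENV_NAME": "dev",
        "AZURE_SUBSCRIPTION_ID": "00000000-0000-0000-0000-000000000000",
        "OPENAI_API_KEY": "placeholder",
    }
    print(app_values(everything))
    print(app_values(everything, include_all=True))
```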

pamelafox commented 1 month ago

The filter could help, as AZURE_ENV_NAME is definitely the most problem-causing of the env variables. However, many people do currently have a flow where an outputted env variable is also an input env variable for main.parameters.json, so we would need to discourage that flow. Otherwise you'll have weird things leaking, like some customization for one environment leaking into an environment where you don't want it.

I'll file an issue with Python extension about the global leaking. I don't know if other language extensions also do that.

weikanglim commented 1 month ago

For the purpose of this specific issue, I wonder if, along the lines of what @richardpark-msft is proposing, azd could provide a filter mechanism with get-values. Either:

azd env get-values --filter APP_* - Glob expression filter
azd env get-values --filter APP_ - Regular expression filter

I'd lean towards regex here -- most times ^APP_ works just as well as APP_, and regex would be most flexible.
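
To make the difference concrete, here is a small Python comparison of the two matching styles. The `--filter` flag is only a proposal in this thread; the sketch assumes a glob would match the whole name and a regex would use search semantics. The variable names are made up.

```python
import fnmatch
import re

names = ["APP_URL", "APP_PORT", "MY_APP_MODE", "VITE_API"]

# Glob: the pattern must match the whole name.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, "APP_*")]

# Regex with search semantics: an unanchored pattern matches anywhere in the name.
regex_hits = [n for n in names if re.search("APP_", n)]

# Anchoring with ^ restricts the regex to names that start with the prefix.
anchored_hits = [n for n in names if re.search("^APP_", n)]

print(glob_hits)      # ['APP_URL', 'APP_PORT']
print(regex_hits)     # ['APP_URL', 'APP_PORT', 'MY_APP_MODE']
print(anchored_hits)  # ['APP_URL', 'APP_PORT']
```

The unanchored regex and the anchored one only differ when a name contains the prefix somewhere other than the start, which is why they behave the same most of the time.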

I also wonder if we may want to think about:

azd env get-values --filter APP_ --set-or-append .env - where --set-or-append would only append or set keys that are present in azd env.
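
One possible reading of the set-or-append behavior, sketched in Python: keys already in the file are updated in place, new keys are appended, and everything else is left alone. `--set-or-append` is only proposed here, so `set_or_append` and its exact semantics are assumptions, and the values are placeholders.

```python
def set_or_append(existing: str, updates: dict[str, str]) -> str:
    """Update matching KEY=... lines in place and append keys not yet present."""
    remaining = dict(updates)
    out: list[str] = []
    for line in existing.splitlines():
        key, sep, _ = line.partition("=")
        name = key.strip()
        if sep and name in remaining:
            out.append(f'{name}="{remaining.pop(name)}"')
        else:
            out.append(line)  # comments, blanks, and unrelated keys stay untouched
    out.extend(f'{name}="{value}"' for name, value in remaining.items())
    return "".join(f"{line}\n" for line in out)


if __name__ == "__main__":
    current = '# local settings\nLOCAL_ONLY="1"\nAPP_URL="http://localhost:8000"\n'
    print(set_or_append(current, {"APP_URL": "https://example.com", "APP_PORT": "443"}))
```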

In general, the get-values gesture needs to cater towards "easy app settings referencing needs". In the future, this should expand to more output formats, like appsettings.json for .NET developers.

What isn't covered here by this simplistic proposal, is that there are cases where users want environment values to "flow" into their client-side builds without the necessary exporting of .env -- this was summarized by @sinedied previously in #3456.

people do currently have a flow where an outputted env variable is also an input env variable for main.parameters.json, so we would need to discourage that flow.

I created #4387 since this is also a topic of interest that I think about a lot. Would love to hear your feedback here.

pamelafox commented 1 month ago

Should we be prefixing certain env variables with APP_ then? We don't have a convention currently, so I wouldn't have a regex that would work with my current templates, but I could move towards a convention.

sinedied commented 1 month ago

@weikanglim Adding a prefix to get the filtering we want would not work here: many frameworks and SDKs need specific env var names, and most Azure tools use AZURE_* for env vars, a prefix that AZD also uses.

weikanglim commented 1 month ago

@sinedied Happy to learn more from your example.

In my mind, with the simplistic model, you could simply rerun azd env get-values targeting all the variables you care about, for example:

azd env get-values --filter AZURE_CLIENT_ID --set-or-append web/.env
azd env get-values --filter ^VITE_ --set-or-append web/.env

We can also build towards something where "app referencing" is more of a first-party concept that can be expressed via some azure.yaml configuration -- happy to learn from any observations you have.

richardpark-msft commented 1 month ago

Should we be prefixing certain env variables with APP_ then? We don't have a convention currently, so I wouldn't have a regex that would work with my current templates, but I could move towards a convention.

If we're talking full regexes we can also do something like this:

(var1|var2|var3)

Using alternation, and that would also be valid. So just having that support, on its own, would be enough to specify all the variables we want to grab.
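
As a sketch of how such an alternation could be generated from a list of names, the Python helper below escapes each name and anchors the group. The anchoring is a design choice of this sketch, not part of the proposal: with search semantics, an unanchored `(var1|var2)` would also match `var10`. `exact_names_pattern` is a hypothetical helper.

```python
import re


def exact_names_pattern(names: list[str]) -> re.Pattern[str]:
    """Build an anchored alternation that matches only the listed names."""
    alternation = "|".join(re.escape(name) for name in names)
    return re.compile(f"^({alternation})$")


if __name__ == "__main__":
    pattern = exact_names_pattern(["AZURE_CLIENT_ID", "AZURE_KEYVAULT_ENDPOINT"])
    for candidate in ["AZURE_CLIENT_ID", "AZURE_CLIENT_ID_OLD", "AZURE_ENV_NAME"]:
        print(candidate, bool(pattern.search(candidate)))
```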

pamelafox commented 1 month ago

I likely would not use a complex regex and just build the file using individual azd env get-value calls as I'm doing now, but there's still the issue that azd environment variables can be tainted by the global env variables.

Stepping back a bit, what if we could be more specific about the env variable references in main.parameters.json? Right now, something like $AZURE_OPENAI_KEY can come from either .azure/CURRENT-ENV/.env or the global environment. That caused so many issues for developers with one of my templates, because I had mistakenly given my azd variable the same name as a commonly set environment variable without realizing it, and the value didn't have the same meaning.

What if we were instead explicit in main.parameters.json, like:

$azdenv:AZURE_OPENAI_KEY

And that value could only come from the current azd environment?

Then we'd have fewer accidental variable name collisions, fewer effects from global env variable tainting, etc.

And we could still have ones that are allowed to come from a global env, like $GITHUB_ACTIONS
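
Until something like that exists, the collision itself (the same name defined in both the azd environment and the global environment, with different values) can at least be detected. The following is a diagnostic sketch only; `find_collisions` is a hypothetical helper and the values are placeholders.

```python
from typing import Mapping


def find_collisions(
    azd_values: Mapping[str, str], global_env: Mapping[str, str]
) -> dict[str, tuple[str, str]]:
    """Return names defined in both places with different values."""
    return {
        name: (azd_values[name], global_env[name])
        for name in azd_values.keys() & global_env.keys()
        if azd_values[name] != global_env[name]
    }


if __name__ == "__main__":
    azd_values = {"AZURE_OPENAI_KEY": "from-azd", "AZURE_LOCATION": "eastus2"}
    global_env = {"AZURE_OPENAI_KEY": "from-shell-profile", "AZURE_LOCATION": "eastus2"}
    for name, (azd_value, global_value) in find_collisions(azd_values, global_env).items():
        print(f"{name}: azd={azd_value!r} global={global_value!r}")
```

In practice the second argument could be `os.environ`, which would flag a tainted shell before running a deployment.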

weikanglim commented 1 month ago

I likely would not use a complex regex and just build the file using individual azd env get-value calls as I'm doing now

Just wondering, how are you currently supporting users that already have an existing .env file?

For example, say I have RUNNING_IN_PRODUCTION already stored in .env and I run azd provision. After provisioning, I'd expect azd to update the AZURE_KEYVAULT_ENDPOINT variable but keep RUNNING_IN_PRODUCTION and the other variables intact.

Then we'd have fewer accidental variable name collisions

One thing that could help here: if azd encourages/supports a mapping of AZURE_VAR_xxx instead of AZURE_xxx. I think this makes it very intentional on the environment variable being present. Happy to discuss this further on #4404.

pamelafox commented 1 month ago

For situations where users start off with a .env file, I usually just tell them how to update that file after running up, so I say "Copy the value from azd env get-value AZURE_KEYVAULT_ENDPOINT into the .env file". That's slightly error-prone if they paste the wrong value, but it does mean I can support the "local development first" scenario. Another option would be to write a shell script that auto-updates a .env according to azd env get-value.

richardpark-msft commented 1 month ago

What if we were instead explicit in main.parameters.json, like:

$azdenv:AZURE_OPENAI_KEY

And that value could only come from the current azd environment?

Could the output variables also be given a standard prefix? If the variables also follow a specific naming format, they could easily be matched by regex in other spots as well.

sinedied commented 1 month ago

@sinedied Happy to learn more from your example.

In my mind, with the simplistic model, you could simply rerun azd env get-values targeting all the variables you care about, for example:

azd env get-values --filter AZURE_CLIENT_ID --set-or-append web/.env
azd env get-values --filter ^VITE_ --set-or-append web/.env

We can also build towards something where "app referencing" is more of a first-party concept that can be expressed via some azure.yaml configuration -- happy to learn from any observations you have.

I really don't think using regex or filters this way is user-friendly when you need to extract multiple values. It might work for our samples, which have a small number of env vars, but in real-world scenarios you have dozens of vars, and you don't always control their naming, as it comes from framework and library requirements.

The feedback I hear from customers is that they're looking for more control over what the tooling does (automatically), not more complexity, and I think this is one of those cases.

What @pamelafox is proposing for avoiding conflict seems more in the right direction, and I would even go further to explicitly "namespace" all input vars, similar to how you do it in GitHub Actions for example:

"value": "$azd.AZURE_LOCATION" for values generated/coming from AZD
"value": "$env.OPENAI_API_KEY" for env values

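A resolver for namespaced references of this kind might behave like the Python sketch below: `$azd.NAME` is looked up only in the azd environment and `$env.NAME` only in the process environment, with no fallback between the two. The syntax is the one proposed above, not something azd supports as far as this thread shows; `resolve` is a hypothetical helper and the values are placeholders.

```python
import re
from typing import Mapping

_REF = re.compile(r"\$(azd|env)\.([A-Za-z_][A-Za-z0-9_]*)")


def resolve(template: str, azd_values: Mapping[str, str], process_env: Mapping[str, str]) -> str:
    """Replace $azd.NAME and $env.NAME using only the named source."""
    sources = {"azd": azd_values, "env": process_env}

    def _substitute(match: re.Match[str]) -> str:
        namespace, name = match.group(1), match.group(2)
        try:
            return sources[namespace][name]
        except KeyError:
            raise KeyError(f"${namespace}.{name} is not set") from None

    return _REF.sub(_substitute, template)


if __name__ == "__main__":
    azd_values = {"AZURE_LOCATION": "eastus2"}
    process_env = {"AZURE_LOCATION": "westus", "OPENAI_API_KEY": "placeholder"}
    print(resolve("$azd.AZURE_LOCATION", azd_values, process_env))
    print(resolve("$env.OPENAI_API_KEY", azd_values, process_env))
```

Raising on a missing name, rather than falling back to the other source, is what keeps a tainted global environment from silently overriding the azd value.
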
cedricvidal commented 1 month ago

@pamelafox drew my attention to this thread after I mentioned to her that I created a small Python azd env loading library. While it doesn't cover the full scope of this discussion, it addresses some of the use cases for Python applications, so I figured it might be useful to share here: https://pypi.org/project/dotenv-azd/

I personally use that lib to switch environments using azd env select and just run my Python scripts knowing they're azd aware and will just pick up whatever env vars are in the currently selected environment.

cedricvidal commented 1 month ago

IIRC, at the top of the wish-list is to use the python app as the starting point.

For example, we currently support running hooks where azd automatically sets all the .env values as env vars for the script defined in the hook, but this requires folks to use azd as the starting point. Either by running the azd command that triggers the hook, or by running azd hooks run.

We also considered having something like azd launch python-app.py, but again, this means folks depending on azd to run their apps/scripts.

In a world where we want people to just run python foo.py and have it easy to pick azd environment values, I would probably aim for a python lib (or sdk) for azd. Folks would just add that lib to the list of requirements and adding it to the top of their app would handle pulling the azd env's values. Sounds like a good open source project to me :) LMK what you think @pamelafox

I like the azd launch <command> idea: it covers a lot of use cases and keeps the code azd-agnostic. I don't believe it makes people depend on azd. They can use whatever env loading technique they want, but this gives them the option of an easy path.

May I suggest run instead of launch? And maybe put it under azd env run, for consistency with other env-related commands.
