aboutcode-org / python-inspector

Inspect Python code and PyPI package manifests. Resolve Python dependencies.
22 stars 19 forks source link

Pip.conf settings are not parsed or respected #180

Open calizarr opened 3 months ago

calizarr commented 3 months ago

An example pip.conf with sensitive variables changed to look like BASH variables:

[global]
extra-index-url = https://aws:${CODEARTIFACT_AUTH_TOKEN}@${DOMAIN}-${ACCOUNT_ID}.d.codeartifact.us-east-1.amazonaws.com/pypi/public-pypi/simple/
index-url = https://aws:${CODEARTIFACT_AUTH_TOKEN}@${DOMAIN}-${ACCOUNT_ID}.d.codeartifact.us-east-1.amazonaws.com/pypi/private-releases/simple/

These are actually set using the pip config set global.extra-index-url so they are presumably following the correct convention.

The file is located at $HOME/.config/pip/pip.conf following the XDG_CONFIG_HOME even when it is located in the legacy location $HOME/.pip/pip.conf it is not respected either.

The pip documentation in general.

EDIT: It also does not respect the relevant environment variables either

pombredanne commented 3 months ago

After a chat with with @calizarr here is the suggestion of a way to go:

I want neither https://github.com/aboutcode-org/python-inspector/pull/170 from @heliocastro that requires installing pip nor https://github.com/aboutcode-org/python-inspector/pull/181 from @calizarr that requires installaing pip and further spawns a subprocess

Instead the approach could be:

heliocastro commented 3 months ago

HI guys

At this draft pr, I started to try to solve this one. Please feel free to go over it and tell me if it is somehow in the right direction.

I migrated initial global settings to a global pydantic_settings object, which allows to expects variables with PYTHON_INSPECTOR prefix. Example: PYTHON_INSPECTOR_INDEX_URL = <something>

I choose PYTHON_INSPECTOR because the clarity.

This move allows to have a .env local file with the variables, have it as an environment variable, and or use the usual command line.

To match the current pip spec, I changed a little the behavior on how index_url was treated. By default, the pip.conf can contains only one single url, and was always overrides by the default one, others are summarily ignored. So now, only the defined index_url is considered, if none is defined, pypi one is the default.

The additional EXTRA_INDEX_URLS and respective command is now the multiple option where any additional indexes could be added.

@pombredanne Little comment, i was able to pass the tests only with regen.

heliocastro commented 3 months ago

Btw, if this approach is ok, I will add the parsing from pip.conf to fit this model as the one of the resources in chain. Would then be something like this order of priority overrides top to bottom:

If none of the options are set, the pypi will always be the main.

calizarr commented 3 months ago

I've made this PR to add environment variables using Click's built-in mechanism for the index-url flag.

https://github.com/aboutcode-org/python-inspector/pull/187

This doesn't solve everything but it's a relatively easy built-in start.

heliocastro commented 3 months ago

That approach works too, but does not tackle all the entries where index is used, and it will cause another extra point to look at in the entire codebase. What i'm tried above, besides be more complex, is simply reduce the variable usage for not only the inspector indec, but for all code, in one single place.

calizarr commented 3 months ago

Oh I understand, however, given that Click is being used for CLI commands, I think it's best to start there and then go from your draft PR.

I still think we need to have an option to use pip.conf and not use the PyPi simple index. My PR is just a start.

I don't see a reason to not take advantage of Click's features.

heliocastro commented 3 months ago

Agreed with that, but again, solving a single point, and again, what i want to avoid is exactly this, we solve one small place, then "later" we will continue. "Later" never comes, someone else will have a similar issue in different place of code, different developer comes and solve in his way, and again, we end up with code that not easy to understand, not consistent.

What i want is try to solve it right, starting to cleanup the code, not just add another plaster over the code that will not be easy to understand for anyone beyond this post.

In other way, your idea of have option to ignore pip.conf is good, and i thinking in doing the same for .python-version files.

But exemplifying on how a global settings object make sense, if we set this disabling options in the settings object, it will be easily accessible through the code, and not need to be passing click contexts or even work, global imported variables always hard to track if something is not working.

calizarr commented 3 months ago

We can certainly attempt to make sure that later is now.

One can still gather all the settings in one place and then pass them along. In this case, Click is still handling flags and env vars from the CLI. However, the settings object can still import those and then use them along the way. They are not mutually exclusive.

This isn't quite another plaster over the code rather than adding one extra feature of Click's built-in functionality. If reading the code for resolve_cli.py then an understanding of Click and/or reading their documentation would be the best way to understand how the CLI flags are working and by extension the envvars.

The ability to ignore configuration files (as it does now) is good and should remain the default as that is what is expected at this point. However, a flag/envvar to enable configuration file parsing in addition to what is in your PR makes total sense to me.

I attempted to keep my PR as simple as possible without rewriting as much code. In the case of the resolve_dependencies function, Click parses either the CLI flags (-/--) or envvars (PYINSP_INDEX_URL) and then passes them as arguments to that function, so there isn't a global variable hanging around. With a global settings object, instead of directly going to resolving dependencies, the function can then receive the arguments and place them in the global settings object and then go from there.