conda / conda-lock

Lightweight lockfile for conda environments
https://conda.github.io/conda-lock/

Github pip dependencies: InvalidRequirement: Expected end or semicolon #392

Closed mjlbach closed 1 year ago

mjlbach commented 1 year ago

What happened?

Since https://github.com/conda/conda-lock/issues/4 is closed (and a feature request), I'm reopening this as a bug report:

  1. Make the following environment.yml:

name: test
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip:
    - xarray
    - git+https://github.com/pandas-dev/pandas.git@v1.4.4

  2. Run conda-lock -f environment.yml and see the following error:
    File "/home/michael/.condax/conda-lock/lib/python3.11/site-packages/pkg_resources/__init__.py", line 3211, in parse
    (req,) = parse_requirements(s)
    ^^^^^^
    File "/home/michael/.condax/conda-lock/lib/python3.11/site-packages/pkg_resources/__init__.py", line 3170, in __init__
    super(Requirement, self).__init__(requirement_string)
    File "/home/michael/.condax/conda-lock/lib/python3.11/site-packages/pkg_resources/_vendor/packaging/requirements.py", line 37, in __init__
    raise InvalidRequirement(str(e)) from e
    pkg_resources.extern.packaging.requirements.InvalidRequirement: Expected end or semicolon (after name and no valid version specifier)
    git+https://github.com/pandas-dev/pandas.git@v1.4.4

Seems like the issue is passing

(Pdb++) requirement_specifier
'git+https://github.com/pandas-dev/pandas.git@v1.4.4'

to parsed_req = Requirement.parse(requirement_specifier)

I'm not 100% sure how to solve this. I think everything should move to the newer from packaging.requirements import Requirement interface, since from pkg_resources import Requirement is deprecated, but that alone doesn't solve the parsing issue (I hacked up a WIP PR, but it's non-functional).
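For context, a bare git+https://... entry is a pip shorthand and not a PEP 508 requirement, since PEP 508 expects a distribution name first (optionally followed by @ and a URL). The following is a minimal, stdlib-only sketch of telling the two forms apart before anything reaches a requirement parser. The names PipEntry, classify_pip_entry, NAME_RE, and VCS_PREFIXES are hypothetical and do not come from conda-lock or packaging; extras and environment markers are deliberately not handled.

```python
import re
from typing import NamedTuple, Optional

# Assumed set of VCS prefixes for this sketch; not taken from conda-lock.
VCS_PREFIXES = ("git+", "hg+", "svn+", "bzr+")

# Loose match for a distribution name: a letter or digit, then
# letters, digits, ".", "_" or "-".
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


class PipEntry(NamedTuple):
    name: Optional[str]  # distribution name, if the entry states one
    url: Optional[str]   # direct-reference URL, if any
    specifier: str       # remaining text, e.g. ">=1.20"


def classify_pip_entry(entry: str) -> PipEntry:
    """Split a pip entry into name, URL, and remaining specifier text."""
    spec = entry.strip()
    if spec.startswith(VCS_PREFIXES):
        # Bare VCS URL: there is no name for a PEP 508 parser to find.
        return PipEntry(None, spec, "")
    match = NAME_RE.match(spec)
    if match is None:
        raise ValueError(f"cannot find a distribution name in {entry!r}")
    name = match.group(0)
    rest = spec[match.end():].strip()
    if rest.startswith("@"):
        # "name @ url" direct reference.
        return PipEntry(name, rest[1:].strip(), "")
    return PipEntry(name, None, rest)


if __name__ == "__main__":
    for line in (
        "xarray",
        "git+https://github.com/pandas-dev/pandas.git@v1.4.4",
        "pandas @ git+https://github.com/pandas-dev/pandas.git@v1.4.4",
    ):
        print(classify_pip_entry(line))
```

Only the entries that come back with a name could be handed to a PEP 508 parser as-is; the nameless one would need separate handling.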

Note that adding a name to the requirement solves this particular issue:

name: test
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pip:
    - xarray
    - pandas @ git+https://github.com/pandas-dev/pandas.git@v1.4.4

but causes another one:

  File "/home/michael/.condax/conda-lock/lib/python3.11/site-packages/conda_lock/_vendor/poetry/utils/helpers.py", line 98, in download_file
    with get(url, stream=True) as response:
         ^^^^^^^^^^^^^^^^^^^^^
  File "/home/michael/.condax/conda-lock/lib/python3.11/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michael/.condax/conda-lock/lib/python3.11/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michael/.condax/conda-lock/lib/python3.11/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michael/.condax/conda-lock/lib/python3.11/site-packages/requests/sessions.py", line 695, in send
    adapter = self.get_adapter(url=request.url)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/michael/.condax/conda-lock/lib/python3.11/site-packages/requests/sessions.py", line 792, in get_adapter
    raise InvalidSchema(f"No connection adapters were found for {url!r}")
requests.exceptions.InvalidSchema: No connection adapters were found for 'git+https://github.com/pandas-dev/pandas.git#'
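
The InvalidSchema error is what requests raises when no transport adapter is mounted for a URL's prefix. By default a session only mounts http:// and https://, so a git+https:// URL cannot be fetched as a plain download. A minimal stdlib sketch of separating the VCS tool from the transport scheme follows; split_vcs_scheme is a hypothetical helper, not a conda-lock or poetry function.

```python
from typing import Tuple
from urllib.parse import urlsplit


def split_vcs_scheme(url: str) -> Tuple[str, str]:
    """Return (vcs_tool, transport) for a URL such as 'git+https://...'."""
    scheme = urlsplit(url).scheme  # "+" is a legal scheme character
    vcs, plus, transport = scheme.partition("+")
    if not plus:
        # No "+" means an ordinary URL that an HTTP client can handle.
        raise ValueError(f"{url!r} is not a VCS-style URL")
    return vcs, transport


if __name__ == "__main__":
    print(split_vcs_scheme("git+https://github.com/pandas-dev/pandas.git#"))
```

A caller could use the first element to decide that the URL needs a git checkout instead of an HTTP download.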

maresb commented 1 year ago

Thanks a lot for the examples and the analysis! We should definitely try and support this.

knedlsepp commented 1 year ago

Sounds like this might be fixed by https://github.com/conda/conda-lock/pull/435

knedlsepp commented 1 year ago

> Sounds like this might be fixed by #435

I tried it out in the meantime, and #435 did resolve the issue with git+https://github.com/pandas-dev/pandas@v1.4.4. However, specifying a commit hash as the revision, as in git+https://github.com/pandas-dev/pandas@ca60aab7340d9989d9428e11a51467658190bb6b, still does not work.

Also, git+https://github.com/pandas-dev/pandas@v1.4.4 works, but git+https://github.com/pandas-dev/pandas.git@v1.4.4 (with the .git suffix) does not.

maresb commented 1 year ago

Hey @knedlsepp, thanks a lot for testing this!

It seems that the relevant code does a lot of really nasty manipulation of structured data as raw strings, and relies on the deprecated pkg_resources rather than established libraries like packaging.

I'm not so sure how difficult this is to fix. I made one very quick attempt in #457, but there are some substantial differences between pkg_resources and packaging. Let's see how it goes with the test suite.

We definitely need a bunch of tests for the various forms of URLs, and also I think this should break if the URL contains basic auth.
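
As one possible starting point for such tests, here is a sketch using only urllib.parse. It assumes the revision is whatever follows the last @ in the path component, which keeps basic-auth credentials in the host part from being mistaken for a revision. split_revision and the example.com URLs are hypothetical; the pandas URLs are the ones from this thread.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def split_revision(url: str) -> Tuple[str, Optional[str]]:
    """Split 'scheme://host/path@rev' into (url_without_rev, rev)."""
    parts = urlsplit(url)
    # Only look for "@" in the path, so "user:token@host" is left alone.
    path, at, rev = parts.path.rpartition("@")
    if not at:
        return url, None
    return urlunsplit(parts._replace(path=path)), rev


# (input, expected) pairs covering tag, commit hash, and basic auth.
CASES = [
    ("git+https://github.com/pandas-dev/pandas.git@v1.4.4",
     ("git+https://github.com/pandas-dev/pandas.git", "v1.4.4")),
    ("git+https://github.com/pandas-dev/pandas@ca60aab7340d9989d9428e11a51467658190bb6b",
     ("git+https://github.com/pandas-dev/pandas",
      "ca60aab7340d9989d9428e11a51467658190bb6b")),
    ("git+https://user:token@example.com/org/repo.git",
     ("git+https://user:token@example.com/org/repo.git", None)),
    ("git+https://user:token@example.com/org/repo.git@main",
     ("git+https://user:token@example.com/org/repo.git", "main")),
]

if __name__ == "__main__":
    for given, expected in CASES:
        assert split_revision(given) == expected, given
    print("all URL forms split as expected")
```

By contrast, a plain string split on the last @ of the whole URL would cut the third case at the credentials, which is the kind of edge case string manipulation tends to miss.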

knedlsepp commented 1 year ago

@maresb Thanks for such quick action on this! :) Yes, I also noticed in the meantime that a couple of things need improvement. For example, git+https://github.com/python-quantities/python-quantities.git@v0.14.1 fails on the grounds that "python-quantities" doesn't match the package name "quantities".
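
One way such a mismatch can arise (an assumption, not something confirmed from conda-lock's code) is when a package name is guessed from the repository URL. The sketch below does exactly that with the stdlib; guess_name_from_url is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlsplit


def guess_name_from_url(url: str) -> str:
    """Guess a package name from the last path segment of a VCS URL."""
    path = urlsplit(url).path
    # Drop a trailing "@revision" if present; otherwise keep the path.
    path = path.rpartition("@")[0] or path
    name = posixpath.basename(path)
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


if __name__ == "__main__":
    url = "git+https://github.com/python-quantities/python-quantities.git@v0.14.1"
    # The guess is the repository name, not the distribution name "quantities".
    print(guess_name_from_url(url))
```

Since a repository name and a distribution name are free to differ, an explicit name in the requirement is more reliable than any guess of this kind.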

> We definitely need a bunch of tests for the various forms of URLs

This sounds like a good task for a first time contributor. I'll see what I can come up with. :smiley:

maresb commented 1 year ago

@knedlsepp, exactly! That would be amazing.

maresb commented 1 year ago

Big bonus points if you can find well-established pre-existing or builtin libraries to do all this URL manipulation. Otherwise we're going to miss edge cases and increase our maintenance burden.

mjlbach commented 1 year ago

I'm going to close this issue because my original bug was resolved, but I'm filing a new one to track the ongoing issues with the pip + git integration.