Long-term plan for `parse_requirements` API

di commented 2 years ago

The parse_requirements API is currently a bit of a wart on this project as it doesn't adhere to the general goal of using the underlying pip CLI as much as possible, and instead is mostly a reimplementation of requirements parsing in pip.

As I see it we have four options:

1. we could continue to hack on it to support new features like #120 and continue to further diverge from pip;
2. we could use some other requirements parser that includes features like #120;
3. we could push for a CLI in pip that exposes this functionality;
4. we could move this into pypa/packaging as a utility library that could be reused by pip.

Short term, I think we should generally do 1 to unblock downstream work, with the understanding that this isn't the long-term goal.

For 2, @pombredanne mentioned working on a standalone requirements parser, but I'm not sure what the status of that is, perhaps we could get a summary here.

I think 3 would be interesting but it's longer-term work, and I'm not sure if the pip team would be receptive to it due to what I imagine is a limited use case.

Ultimately I think 4 is the right answer here. This could involve upstreaming either the parsing logic here, in pip, in some other requirements parser, or a combination of all of them.

pombredanne commented 2 years ago

> For 2, @pombredanne mentioned working on a standalone requirements parser, but I'm not sure what the status of that is, perhaps we could get a summary here.

@di fyi this is in https://github.com/nexB/pip-requirements

I started from the pip code and trimmed it down, keeping the filtered line/blame-level history. I removed all the parts that are not about parsing and all network-related code, plus anything that would require accessing the local file system, except for optional loading and parsing of nested files (-r/-c). I found this a bit easier than starting from your fork (even though that may be very similar initial pip code).

I kept and adapted all pip tests plus added tests from many requirements parsing tools (including yours ;) ) and eventually improved things such that:

- comments are preserved,
- all original lines are tracked and preserved during parsing,
- errors are captured and not silenced but tracked as invalid lines (which pip does sometimes),
- I added back parsing for old legacy pip options (to better report errors),
- finally I added a dumps() function to dump back the parsed results to a requirements file.
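To make the properties above concrete, here is a minimal sketch of a line-preserving parser. It is not the code or API of pip-requirements or pip: `ParsedLine`, `parse_lines`, and this `dumps` are hypothetical names chosen for illustration, and the name matching is deliberately simplistic (no line continuations, no URL or editable requirements).

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, simplified project-name pattern; real requirement syntax is richer.
NAME_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


@dataclass
class ParsedLine:
    line_number: int            # 1-based position in the original text
    raw: str                    # original line, kept verbatim
    kind: str                   # "requirement", "option", "comment", "blank" or "invalid"
    name: Optional[str] = None  # project name, for requirement lines only
    error: Optional[str] = None # reason, for invalid lines only


def parse_lines(text: str) -> List[ParsedLine]:
    """Classify every line; nothing is dropped and nothing raises."""
    parsed = []
    for number, raw in enumerate(text.splitlines(), start=1):
        stripped = raw.strip()
        if not stripped:
            parsed.append(ParsedLine(number, raw, "blank"))
        elif stripped.startswith("#"):
            parsed.append(ParsedLine(number, raw, "comment"))
        elif stripped.startswith("-"):
            parsed.append(ParsedLine(number, raw, "option"))
        else:
            match = NAME_RE.match(stripped)
            if match:
                parsed.append(
                    ParsedLine(number, raw, "requirement", name=match.group(0))
                )
            else:
                # Record the problem instead of silently skipping the line.
                parsed.append(
                    ParsedLine(number, raw, "invalid", error="no project name found")
                )
    return parsed


def dumps(parsed: List[ParsedLine]) -> str:
    """Write the parsed lines back out; comments and invalid lines survive."""
    return "\n".join(line.raw for line in parsed)
```

Because every line keeps its raw text and number, a caller can report errors with their exact location and still write the file back without losing comments.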

The code is a bit quirky because it is essentially pip's own code adapted and mildly improved.

> we could move this into pypa/packaging as a utility library that could be reused by pip.

Anything like this would be great (any code that could parse pip requirements correctly; I am not talking about my experiment in particular). But there is really a set of pip-only things that IMHO do not fit in packaging.

> we could push for a CLI in pip that exposes this functionality

That would work, but the original parsing code is seriously tangled with mixed-up CLI options handling, (random) file system access and network calls. Having done the work above, I'd say that some parser could be surgically implanted back, but I have a hard time figuring out whether and how pip's own parser could be seriously operated on and exposed without breaking other pip features.

FWIW, my use case is integration in ScanCode toolkit to detect, parse and collect any requirement files (it also handles installed wheels, setup.py and other manifests) and to replace dparse in https://github.com/nexB/scancode-toolkit/blob/0b578694740223524a489ce2fd46eb787b7e77db/src/packagedcode/pypi.py#L49

Beyond this, we are implementing a generic version range (which will feed a basic dependency resolver) with https://github.com/nexB/univers to resolve dependent ranges in deps and in vulnerable version ranges.

The whole goal of this is to support vulnerabilities data aggregation and vulnerability queries in https://github.com/nexb/vulnerablecode using the Package URLs found by scancode and possible resolved versions.

pombredanne commented 2 years ago

Just to clarify: in my use cases, the place where I parse requirements:

- may not have other files side-by-side and should not expect these to be there,
- should not require network access (potentially an air-gapped environment),
- is not running the OS/arch/Python version where these would typically be installed.
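One way to read these constraints is as an interface: parse from in-memory text, make nested -r/-c loading an optional callback, and keep environment markers as plain text instead of evaluating them against the current interpreter. The sketch below assumes exactly that. `Result`, `collect`, and the `loader` parameter are hypothetical names, not part of pip, pip-api, or pip-requirements, and splitting on the first `;` is a simplification that would mis-handle URLs containing semicolons.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple

# A loader maps a nested file name to its text, or None when it is unavailable.
Loader = Callable[[str], Optional[str]]

NESTED_FLAGS = ("-r", "--requirement", "-c", "--constraint")


@dataclass
class Result:
    # (source, requirement spec, unevaluated marker text or None)
    requirements: List[Tuple[str, str, Optional[str]]] = field(default_factory=list)
    # (source, nested file name) for files that could not be loaded
    unresolved: List[Tuple[str, str]] = field(default_factory=list)


def collect(
    text: str,
    source: str = "<memory>",
    loader: Optional[Loader] = None,
    result: Optional[Result] = None,
    seen: Optional[Set[str]] = None,
) -> Result:
    """Parse requirement text without touching the file system or network."""
    result = result if result is not None else Result()
    seen = seen if seen is not None else {source}
    for raw in text.splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if parts[0] in NESTED_FLAGS and len(parts) == 2:
            target = parts[1]
            if target in seen:
                continue  # avoid include cycles
            nested = loader(target) if loader else None
            if nested is None:
                # No sibling files expected: record the reference, do not fail.
                result.unresolved.append((source, target))
            else:
                seen.add(target)
                collect(nested, target, loader, result, seen)
            continue
        if line.startswith("-"):
            continue  # other options are ignored in this sketch
        spec, _, marker = line.partition(";")
        # The marker is kept as text; it is never evaluated for this machine.
        result.requirements.append((source, spec.strip(), marker.strip() or None))
    return result
```

A caller that does have the files available can pass something like a dictionary's `get` method as `loader`, while an air-gapped scanner passes nothing and gets the nested references back in `unresolved`.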

di / pip-api

Long-term plan for `parse_requirements` API #121