aswinnnn / pyscan

python dependency vulnerability scanner, written in Rust.
MIT License
184 stars 6 forks source link

prompt/default to when dependency conflict occurs #3

Closed anotherbridge closed 1 year ago

anotherbridge commented 1 year ago

Describe the bug Although #1 is now fixed, I still encountered a similar issue that I didn't recognize before (since I didn't test this case). When a minimum version is specified in the requirements.txt, then the tool will take the version that is specified instead of checking which version is actually used.

To Reproduce Steps to reproduce the behavior:

  1. Install the following requirements file into a separate environment:
    pandas>=1.5.0
    requests>=2.30.0
    beautifulsoup4>=4.12.2
    lxml>=4.9.2
  2. Run pip freeze to investigate the installed versions:
    pandas==2.0.1
    requests==2.31.0
    beautifulsoup4==4.12.2
    lxml==4.9.2
  3. Run pyscan and investigate the output:
    pyscan v0.1.3 | by Aswin (github.com/aswinnnn)
    Found 4 dependencies...
    |-| pandas [1.5.0] -> No vulnerabilities found.
    |-| requests [2.30.0] -> Found vulnerabilities!
    |-| beautifulsoup4 [4.12.2] -> No vulnerabilities found.
    |-| lxml [4.9.2] -> No vulnerabilities found.
    SUMMARY
    ...

Expected behavior The version number of the currently installed packages should be taken instead of the minimum required version specified in the requirements.txt file.

Desktop (please complete the following information):

pyscan version v0.1.3

aswinnnn commented 1 year ago

This is due to the order of precendence set for a source file. The tool parses the requirements.txt including the version specified so as to only rely on one source of information. And when a version is not detected in the source file, it defaults to pip.

aswinnnn commented 1 year ago

However i see the case. One of the reasons it does not directly get anything from pip is due to wether the user has installed the package or not. For example, it could be a git cloned repo and the user might be trying to find vulns before installing the specific versions mentioned in the requirements.txt. Therefore instead there could be an option for the user for the user to decide wether they want the mentioned versions or the ones installed with pip.

anotherbridge commented 1 year ago

This behavior makes sense, in case the operator in the requirements.txt is ==, but in case it is an operator that allows for a set of results the version that is actually used or potentially be installed should be detected.

An option which of the versions a user would want sounds great. Maybe this could be given in form of scanning either already installed packages and alternatively by making a pre-check of what would potentially be insdtalled.

aswinnnn commented 1 year ago

True, pyscan parses already parses different comparators so it shouldn't be a problem having a version >= or <=. The problem is not all packages use the same versioning notation. Most use major.minor.patch but some just use major.minor or major-alpha/beta. Accounting for all kinds of version types and enumerating potential ones would be a kind of hell. I wanna look into it more though, wonder if you have any ideas

aswinnnn commented 1 year ago

Its not unreasonable to have that capability for most popular ones like SemVer but you'd have to do checks such as finding the latest version released (We had an API for that and you already know how that went), and checking wether each potential version we come up with exists within the package.

anotherbridge commented 1 year ago

Hey @aswinnnn, I have a few suggestions.

In case that we want to just use the latest version I think everything could stay as it is, yet if a user wants to assess on the currently installed version in their environment one could grep from something like a pip freeze. I'm not sure whether it's possible to get that information in rust, but one could alternatively integrate that in the __main__.py as:

try: 
    from pip._internal.operations import freeze
except ImportError: 
    from pip.operations import freeze

packages = freeze.freeze()
for package in packages: 
    print(package)
    ...

Getting everything from pip freeze would have the advantage that we would also cover the dependencies of the requirements.

A different alternative which would give us this the same additional information about the depencies of the requirements would be to use the API https://pypi.org/pypi/<package>/json instead of https://api.deps.dev/v3alpha/systems/pypi/packages/<package>.

Having the package version that is desired we could just give that in the version parameter of the body of the query to the OSV API. What do you think about that?

aswinnnn commented 1 year ago

Thanks for the suggestion, it's valuable. If I understand you correctly, I think implementing a prompt for when a dependency version conflict is seen isn't a bad idea. For example, User has requests >= 2.9.2 in requirements.txt, User has requests installed with the version 2.20.2, There's a version on the internet that exceeds both of them. In this case, Pyscan can prompt the user for which version they would like to use. As to not interfere with CIs and other tools, there can be flags set (--pip, --source, --latest) which will default to the option provided.

anotherbridge commented 1 year ago

That sounds like a great idea!

aswinnnn commented 1 year ago

Also, there won't be a problem getting pip freeze info from Rust, in fact it's easier. I think having the pip data right before pyscan starts parsing dependencies, won't be a bad idea, would be a nice optimization as well.

aswinnnn commented 1 year ago

A slight change to the above plan, prompting every time a different version is detected between sources would be annoying, so pyscan will default to the version provided in the requirements.txt while still giving users the option to default to --pip or --pypi. This will put the burden of providing the right version on the user for sure, but I think there can be a small alert (display) when a different version is detected between requirements.txt and pip. There are more cases where a user wants to check against the provided versions in config files than the installed one. I think the fact that we provide the ability to change where the version is retrieved from should be enough for now.

aswinnnn commented 1 year ago

Hey, the update will soon be at pypi and cargo, you can test the fallbacks out yourself. Again, appreciate all that you have done so far, thanks!

anotherbridge commented 1 year ago

Awesome, thanks a lot. I will test it and let you know if anythings pops up.