Printing Report from `requirements.txt`

peterdsharpe commented 2 years ago

Hey there!

Had a potential feature to suggest:

I often find myself wanting to quickly make a Report from the contents of a project's `requirements.txt` file. Maybe we could add a shorthand for that: a new function in scooby with a call signature like `scooby.report.from_requirements(filename) -> scooby.Report`?

Such a function would return a scooby Report containing all the packages listed in the `requirements.txt` file.

Here's the minimal code I've been using for this, type-hinted for readability:


import re
from typing import List

import scooby


def clean_package_name(s: str) -> str:
    """
    Takes in a line from a `requirements.txt` file and returns only the package name associated with that line.

    E.g.: "numpy >= 1.20" -> "numpy"
    """
    # Split at the first version-specifier character and keep what precedes it.
    return re.split("<|>|=|!|~", s)[0].strip()


with open("requirements.txt", "r") as f:
    lines: List[str] = f.read().splitlines()

# Skip blank lines and comment lines.
packages = [
    clean_package_name(line)
    for line in lines
    if line.strip() and not line.strip().startswith("#")
]
print(scooby.Report(core=packages, optional=[]))

LMK if interested and I can PR it.

prisae commented 2 years ago

I like the idea behind it a lot. However, I don't think we should reinvent the wheel and parse the requirements.txt for this (and have to worry about all the special cases). We should rather use pip directly, if there is a reasonable way; see, e.g., https://stackoverflow.com/questions/11147667/is-there-a-way-to-list-pip-dependencies-requirements
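For reference, one route that avoids parsing the file at all is to ask the installed package metadata. This is not pip itself: the sketch below swaps in the standard library's `importlib.metadata` (Python 3.8+), and the helper name `declared_requirements` is hypothetical. It only works for distributions that are already installed, and it returns the raw requirement strings, not a resolved dependency tree.

```python
from importlib.metadata import PackageNotFoundError, requires
from typing import List


def declared_requirements(dist_name: str) -> List[str]:
    """Return the raw requirement strings an installed distribution declares.

    Hypothetical helper for illustration; not part of scooby.
    """
    try:
        # requires() returns None when no dependencies are declared.
        return list(requires(dist_name) or [])
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return []


# Prints an empty list if scooby is not installed.
print(declared_requirements("scooby"))
```

The returned strings still carry version specifiers and environment markers, so they would need further handling before being passed to a Report.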

peterdsharpe commented 2 years ago

Yep, I totally agree with the spirit of not re-inventing the wheel!

However, I'm unaware of any easy way to do that with pip, and a quick skim through that StackOverflow link didn't turn up anything applicable to modern pip versions (other than a comment listing a regex search with egrep, similar to what's above) - did any of the solutions jump out at you?

prisae commented 2 years ago

Here is a list of potential solutions: https://stackoverflow.com/a/67111193 -- Most seem not to work or are outdated. I think getting the dependencies right is a difficult task, and not one scooby should care about.

But then, your question is sort of a different one. You don't necessarily want to get the dependencies right, you just want to print the packages which are listed in the requirements. I still think that is interesting.

(For your own project, I would recommend the way specified in the readme: https://github.com/banesullivan/scooby#implementing-scooby-in-your-project).

@banesullivan , @akaszynski , what do you think?

prisae commented 2 years ago

I just had a brief look, and I am definitely not in favour of parsing a requirements.txt file in scooby. Have a look at the example in the requirements file documentation: https://pip.pypa.io/en/stable/reference/requirements-file-format/#example

A requirements file can, e.g., also contain wheels. It can include other requirements files.

I could foresee many issues to be opened because scooby fails to parse a particular requirements file...

prisae commented 2 years ago

(It can also contain `.`, referring to the local setup.py; it can contain GitHub repos; etc.)
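To make the concern concrete, here is a small sketch that runs the splitting rule from the snippet posted earlier in this thread over a few lines written in the style of pip's requirements-file documentation. The sample lines are illustrative assumptions, not taken from any real project.

```python
import re


def clean_package_name(s: str) -> str:
    # Same splitting rule as the snippet posted earlier in this thread.
    return re.split("<|>|=|!|~", s)[0].strip()


# Illustrative lines (assumed examples) that are not plain "name + version".
tricky_lines = [
    "-r other-requirements.txt",                    # nested requirements file
    "./downloads/numpy-1.9.2-cp34-none-win32.whl",  # local wheel
    ".",                                            # the current project
    "git+https://github.com/banesullivan/scooby.git",  # VCS URL
    "requests [security] >= 2.8.1 ; python_version < '3.8'",  # extras + marker
    "# just a comment",
]

results = {line: clean_package_name(line) for line in tricky_lines}
for line, name in results.items():
    print(f"{line!r} -> {name!r}")
```

None of these lines yields a usable package name: most pass through unchanged, and the `requests` line keeps its extras as `requests [security]`.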

banesullivan commented 2 years ago

I really like the idea behind this and would support work towards it, but I am also concerned about all the different ways in which a dependency can be specified in a requirements.txt file - too many edge cases to handle directly in scooby.

Another concern to list: the package name does not have to match the import name, e.g. `scikit-learn` is imported as `sklearn`. This can be worked around with `pkg_resources.get_distribution`, though: `get_distribution("scikit-learn").version`
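A minimal sketch of the same lookup, using the standard library's `importlib.metadata` (Python 3.8+) in place of `pkg_resources`. The helper name `distribution_version` is hypothetical; the point is that the lookup goes by distribution name, so no import name is needed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def distribution_version(dist_name: str) -> Optional[str]:
    """Look up an installed version by distribution name, e.g. "scikit-learn".

    Hypothetical helper for illustration; not part of scooby.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed in the current environment.
        return None


# Prints the installed version, or None if scikit-learn is absent.
print(distribution_version("scikit-learn"))
```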

This actually gave me another idea: what if we could generate the code for creating a report with the TrackedReport class in scooby? That should go in a separate issue, though.

banesullivan / scooby

Printing Report from `requirements.txt` #77