corgibytes / freshli-lib

A tool for collecting historical metrics about a project's dependencies
MIT License

Create Python Micro-Service for Python Resource Parsing #215

Open dan-hein opened 3 years ago

dan-hein commented 3 years ago

Problem

Several open issues relate to how we parse package names and versions from Python requirements.txt files. Our current system relies on regex for this. While that handles many cases, there are a good handful of situations it does not cover: package names given as URLs, repositories, or local files, and more unusual version specifiers that are harder for us to collect. If we keep trying to implement all of these nuances within Freshli, we could further complicate how we parse these files and slow down development in the future.

Proposition

Python already has a system for parsing these versions and files, with an extensive API for collecting and querying this information: the pkg_resources module from the setuptools library. To avoid the problems that come from interpreting and reimplementing this logic within Freshli, we could create a Python micro-service that wraps it and returns the needed information in a format that Freshli can consistently parse and use in its analysis. Setting this up as a decoupled micro-service could also cut down on the integration debt we would take on if Freshli executed the Python directly. In addition to being more modular, this would let us keep up with changes to pip's requirements.txt format simply by upgrading the setuptools dependency and its call within the micro-service.

I also hope this new interaction gives us an opportunity to see how this approach might play out with languages that have a more complex dependency management system (e.g. Java with Maven or Gradle), whose behavior might not be easy to query or recreate.

As for the Python implementation of this idea, we could potentially use Flask to create the micro-service. I am not the most familiar with this library itself, but I am definitely open to other ideas! 😄
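As a rough sketch of the shape such a service could take, the example below accepts the text of a requirements.txt file in a POST body and answers with JSON. It uses the standard library's http.server in place of Flask so that nothing beyond setuptools is required. The endpoint behavior, the JSON keys, the port, and every function name here are assumptions made for illustration; none of them come from Freshli.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Iterable, Tuple


def build_response(body: str, parse: Callable[[Iterable[str]], list]) -> Tuple[int, str]:
    """Turn the text of a requirements file into (HTTP status, JSON body)."""
    try:
        packages = parse(body.splitlines())
    except ValueError as error:
        # Assumption: the parser signals a malformed line with a ValueError
        # (or a subclass), which is reported as a 400 instead of a crash.
        return 400, json.dumps({"error": str(error)})
    return 200, json.dumps({"packages": packages})


def make_handler(parse):
    """Create a request handler class bound to the given parser function."""

    class ParseHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            text = self.rfile.read(length).decode("utf-8")
            status, payload = build_response(text, parse)
            data = payload.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return ParseHandler


def parse_with_pkg_resources(lines):
    """Parser backed by setuptools; imported lazily so only this call needs it."""
    from pkg_resources import parse_requirements

    return [
        {"name": req.project_name, "specs": req.specs}
        for req in parse_requirements(lines)
    ]


def serve(port=8000):
    """Run the service until interrupted (port 8000 is an arbitrary choice)."""
    handler = make_handler(parse_with_pkg_resources)
    HTTPServer(("127.0.0.1", port), handler).serve_forever()
```

Keeping build_response separate from the HTTP handler means the parsing contract can be exercised without starting a server, and the parser can be swapped out (for example, for another language's tooling) without touching the transport code.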

Related Issues

#163

#160

#118

There may be other related issues, but as far as I am aware these issues would be directly impacted by the proposed change.

dan-hein commented 3 years ago
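from pkg_resources import parse_requirements

# Read the requirements file and parse each line into a Requirement object
with open(r'/Users/danhein/PycharmProjects/spaCy/requirements.txt') as f:
    # list() consumes the generator returned by parse_requirements
    install_reqs = list(parse_requirements(f.readlines()))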

parse_requirements returns a generator that yields Requirement objects, and list() collects them here. These objects have all of the details we need for Freshli.
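To make "the details we need" a little more concrete, here is a minimal sketch of copying the fields off each Requirement into plain, JSON-ready types that Freshli could read. The attribute names (project_name, specs, extras, url, marker) are the ones pkg_resources exposes on Requirement; the output keys and the two function names are assumptions for illustration only. Note that recent setuptools releases mark pkg_resources as deprecated, so this would need revisiting over time.

```python
import json


def requirement_to_dict(req):
    """Copy the fields of a Requirement-like object into JSON-ready types."""
    return {
        "name": req.project_name,
        # specs is a list of (operator, version) pairs, e.g. [(">=", "1.0")]
        "specs": [{"operator": op, "version": version} for op, version in req.specs],
        "extras": sorted(req.extras),
        # url is set for "name @ https://..." style requirements, else None
        "url": req.url,
        # marker holds environment conditions such as python_version < "3.8"
        "marker": str(req.marker) if req.marker else None,
    }


def requirements_to_json(lines):
    """Parse requirement lines with pkg_resources and serialize them as JSON."""
    # Imported here so that only this function needs setuptools installed
    from pkg_resources import parse_requirements

    return json.dumps([requirement_to_dict(req) for req in parse_requirements(lines)])
```

With the install_reqs list from the snippet above, each entry could be passed to requirement_to_dict directly.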