SoftwareUnderstanding / inspect4py

Static code analysis package for Python repositories
https://inspect4py.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Create automatically the requirements.txt #28

Closed rosafilgueira closed 3 years ago

rosafilgueira commented 3 years ago

We could use the module dependency graph to work out which modules need to be installed, and build the requirements.txt automatically from that.
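A minimal sketch of that idea, assuming the dependency graph is available as a plain dict mapping each module to the modules it imports. That shape, the function name external_packages and the local_roots parameter are illustrative assumptions, not the actual inspect4py output format. Standard-library modules are filtered out with sys.stdlib_module_names, which only exists in Python 3.10+.

```python
import sys


def external_packages(graph, local_roots, stdlib=None):
    """Return the sorted top-level packages that are neither local nor stdlib.

    graph: dict of module name -> iterable of imported module names
           (hypothetical shape, for illustration only).
    local_roots: top-level package names that belong to the repository itself.
    """
    if stdlib is None:
        # Python 3.10+; on older versions nothing is treated as stdlib.
        stdlib = getattr(sys, "stdlib_module_names", frozenset())
    found = set()
    for imports in graph.values():
        for name in imports:
            top = name.split(".")[0]
            if top and top not in local_roots and top not in stdlib:
                found.add(top)
    return sorted(found)


graph = {
    "src.code_inspector": ["ast", "json", "click", "docstring_parser", "src.staticfg.model"],
    "src.staticfg.model": ["astor", "graphviz"],
}
print("\n".join(external_packages(graph, local_roots={"src"})))
```

Note that this yields import names, which are not always the names used on PyPI, so the result is a candidate list rather than a finished requirements.txt.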

rosafilgueira commented 3 years ago

We could use pydeps2requirements for that (I have to update it locally to fix a small error), and possibly improve it.

Note: It uses the output of pydeps (the module dependency tool discussed in #26).

TEST 1: pydeps test/code_inspector/src/ --max-bacon=0 --show-raw-deps --nodot --noshow | python pydeps2requirements.py

astor # from: src.staticfg.model
cdmcfparser # from: src.code_inspector
click # from: src.code_inspector
colorama # from: click._compat
docstring_parser # from: src.code_inspector
graphviz # from: src.staticfg.model

TEST 2: pydeps /Users/rosafilgueira/HW-Work/Research/somef/src/somef --max-bacon=0 --show-raw-deps --nodot --noshow | python pydeps2requirements.py

IPython # from: ipykernel.connect, ipykernel.displayhook, ipykernel.e...
OpenSSL # from: urllib3.contrib.pyopenssl
PIL # from: ipykernel.pylab.config, matplotlib.backend_bases, mat...
PyQt5 # from: IPython.external.qt_loaders, PIL.ImageQt, pandas.io.c...
_cffi_backend # from: cffi.api, cffi.commontypes, cffi.verifier
_distutils_hack # from: setuptools
_pytest # from: pytest
_scandir # from: scandir
appnope # from: ipykernel.eventloops
asn1crypto # from: cryptography.hazmat.backends.openssl.backend, cryptog...
atomicwrites # from: _pytest.assertion.rewrite
attr # from: _pytest._code.code, _pytest.fixtures, pytest.main, ...
backcall # from: IPython.core.events
bcrypt # from: paramiko.ed25519key, paramiko.pkey
botocore # from: pandas.io.s3
bs4 # from: pandas.io.html, somef.cli, somef.createExcerpts, soup...
certifi # from: requests.certs
cffi # from: PIL.Image, PIL.PyAccess, scipy._lib._ccallback, zmq.u...
chardet # from: bs4.dammit, html5lib._inputstream, pygments.lexer, re...
click # from: somef.main
click_option_group # from: somef.main
colorama # from: IPython.terminal.interactiveshell, click._compat, jed...
cryptography # from: OpenSSL.SSL, OpenSSL._util, OpenSSL.crypto, paramiko....
cycler # from: matplotlib.pyplot, matplotlib.rcsetup
dateutil # from: ipyparallel.util, jupyter_client.jsonutil, jupyter_cl...
decorator # from: IPython.core.formatters, IPython.core.history, IPytho...
defusedxml # from: nbconvert.filters.strings
dill # from: ipykernel.pickleutil, ipyparallel.client._joblib, ipy...
funcsigs # from: _pytest.compat
html5lib # from: bs4.builder._html5lib, rdflib.term
idna # from: cryptography.x509.general_name, jsonschema._format, r...
importlib_metadata # from: jsonschema, markdown.util
ipykernel # from: IPython, IPython.core.display, IPython.core.pylabtool...
ipyparallel # from: ipykernel.pickleutil, ipykernel.zmqshell
ipython_genutils # from: ipykernel.comm.manager, ipykernel.connect, ipykernel....
ipywidgets # from: tqdm.notebook
isodate # from: rdflib.term
jedi # from: IPython.core.completer
jinja2 # from: nbconvert.exporters.html, nbconvert.exporters.templat...
joblib # from: ipyparallel.client._joblib, ipyparallel.client.view, ...
jsonschema # from: nbformat.validator
jupyter_client # from: IPython.kernel, ipykernel.connect, ipykernel.displayh...
jupyter_core # from: ipykernel.kernelapp, ipykernel.zmqshell, jupyter_clie...
kiwisolver # from: matplotlib._layoutbox
lxml # from: bs4.builder._lxml, html5lib.treebuilders.etree_lxml, ...
markdown # from: somef.cli, somef.createExcerpts
markupsafe # from: jinja2, jinja2.Environment, jinja2.asyncsupport, jinj...
matplotlib # from: IPython.core.pylabtools, ipykernel.pylab.backend_inli...
mistune # from: nbconvert.filters.markdown_mistune
more_itertools # from: _pytest.python_api
mpi4py # from: ipyparallel.engine.engine
nacl # from: paramiko.ed25519key
nbconvert # from: IPython.utils.io, notebook.services.contents.filemana...
nbformat # from: IPython.core.interactiveshell, IPython.core.magics.ba...
networkx # from: nltk.parse.DependencyGraph, nltk.parse.dependencygraph
nltk # from: somef.configuration, textblob.base, textblob.blob, te...
nose # from: IPython.external.decorators._decorators, IPython.exte...
notebook # from: IPython.html, IPython.utils.io, nbconvert.preprocesso...
numpy # from: IPython.core.formatters, IPython.core.magics.namespac...
pandas # from: networkx.convert, networkx.convert_matrix, sklearn.ut...
paramiko # from: zmq.ssh.tunnel
parso # from: jedi.api, jedi.api.classes, jedi.api.completion, jedi...
pathlib2 # from: PIL.Image, _pytest.pathlib, importlib_metadata._compa...
pexpect # from: IPython.utils._process_posix, zmq.ssh.tunnel
pickleshare # from: IPython.core.interactiveshell
pluggy # from: _pytest._code.code, _pytest.config, _pytest.hookspec
prometheus_client # from: notebook.base.handlers, notebook.metrics
prompt_toolkit # from: IPython.terminal.debugger, IPython.terminal.interacti...
psutil # from: joblib.externals.loky.backend.utils, joblib.externals...
ptyprocess # from: pexpect.pty_spawn, terminado.management
pvectorc # from: pyrsistent._pvector
py # from: _pytest._code.code, _pytest._code.source, _pytest.ass...
pycparser # from: cffi.cparser
pygments # from: IPython.core.oinspect, IPython.lib.display, IPython.l...
pygraphviz # from: networkx.drawing.nx_agraph
pyrsistent # from: jsonschema._types
pytest # from: _pytest.compat, _pytest.config, _pytest.fixtures, _py...
pytz # from: pandas.core.arrays.datetimes, pandas.core.dtypes.dtyp...
rdflib # from: somef.data_to_graph, somef.schema.software_schema
regex # from: nltk.classify.textcat, nltk.tokenize.casual
requests # from: jsonschema.validators, nltk.parse.corenlp, somef.cli,...
scandir # from: pathlib2
scipy # from: networkx.algorithms.assortativity.correlation, networ...
send2trash # from: notebook.services.contents.filemanager
setuptools # from: cffi.ffiplatform, numpy.distutils.command.bdist_rpm, ...
sip # from: IPython.external.qt_loaders
six # from: OpenSSL.SSL, OpenSSL._util, OpenSSL.crypto, pytest....
sklearn # from: nltk.classify.scikitlearn, nltk.parse.transitionparser
soupsieve # from: bs4.element
sqlalchemy # from: pandas.io.sql
terminado # from: notebook.terminal, notebook.terminal.handlers
testpath # from: IPython.testing.plugin.ipdoctest, nbconvert.exporters...
textblob # from: somef.header_analysis
threadpoolctl # from: sklearn.cluster._kmeans
tornado # from: ipykernel.ipkernel, ipykernel.kernelapp, ipykernel.ke...
tqdm # from: nltk.util
traitlets # from: IPython.core.alias, IPython.core.application, IPython...
typing_extensions # from: sklearn.externals._arff
urllib3 # from: requests, requests.adapters, requests.exceptions, req...
wcwidth # from: IPython.terminal.ptutils, prompt_toolkit.utils
webencodings # from: html5lib._inputstream
yaml # from: networkx.readwrite.nx_yaml, nltk.data
zipp # from: importlib_metadata
zmq # from: ipykernel.eventloops, ipykernel.heartbeat, ipykernel....

dgarijo commented 3 years ago

Will we know which version should be used? Sometimes there are incompatibilities. I think this is a potentially interesting problem. We should definitely keep this issue open and revisit it later (e.g., when creating Dockerfiles that build and run a component).
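One possible starting point for the version question, as a sketch only: pin each detected package to the version currently installed in the analysis environment, using the standard-library importlib.metadata (Python 3.8+). The function name pin_installed and its lookup parameter are illustrative. This records what is installed, not what is known to be compatible, and it expects distribution names, which can differ from import names (for example, bs4 is installed as beautifulsoup4).

```python
from importlib import metadata


def pin_installed(names, lookup=metadata.version):
    """Return 'name==version' for installed distributions, bare names otherwise."""
    pinned = []
    for name in names:
        try:
            pinned.append(f"{name}=={lookup(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here (or the import name is not the distribution name).
            pinned.append(name)
    return pinned


print("\n".join(pin_installed(["click", "astor", "graphviz"])))
```

The lookup parameter exists only so the function can be exercised without depending on what happens to be installed.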

rosafilgueira commented 3 years ago

Another tool that I found for this is pipreqs.

pipreqs code_inspector

It automatically generates a file called requirements.txt.

requirements.txt generated by pipreqs:

cdmcfparser==2.3.2
Click==7.0
matplotlib==3.1.1
astor==0.8.1
graphviz==0.16
docstring_parser==0.7.3
networkx==2.5.1

Original requirements.txt, written by us manually:

cdmcfparser==2.3.2
docstring_parser==0.7
astor
graphviz
click

Note: matplotlib==3.1.1 and networkx are used by code_visualization.py (and it is true that those libraries are needed to run this script.)
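The comparison between the two files can be automated. The sketch below uses the two lists quoted in this comment; parse_requirements and compare are illustrative helper names, and the parser only understands plain name and name==version lines, not the full requirements file syntax.

```python
def parse_requirements(text):
    """Map normalised package name -> pinned version (None when unpinned)."""
    reqs = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        reqs[name.strip().lower().replace("-", "_")] = version.strip() if sep else None
    return reqs


def compare(generated, manual):
    """Report packages unique to each file and pins that disagree."""
    gen, man = parse_requirements(generated), parse_requirements(manual)
    return {
        "only_generated": sorted(gen.keys() - man.keys()),
        "only_manual": sorted(man.keys() - gen.keys()),
        "version_differs": sorted(
            name for name in gen.keys() & man.keys()
            if man[name] is not None and gen[name] != man[name]
        ),
    }


pipreqs_output = """cdmcfparser==2.3.2
Click==7.0
matplotlib==3.1.1
astor==0.8.1
graphviz==0.16
docstring_parser==0.7.3
networkx==2.5.1"""

manual_file = """cdmcfparser==2.3.2
docstring_parser==0.7
astor
graphviz
click"""

print(compare(pipreqs_output, manual_file))
```

Names are lower-cased before comparison so that Click and click count as the same package.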

dgarijo commented 3 years ago

Very cool. This one: https://github.com/jupyterhub/repo2docker is the one used to create Dockerfiles, so I guess it may use pipreqs underneath.

rosafilgueira commented 3 years ago

pipreqsnb: a simple, fully compatible pipreqs wrapper that supports Python files and Jupyter notebooks.

Blog comparing dependency libraries tools

rosafilgueira commented 3 years ago

I ran pigar on code_inspector (after removing code_visualization.py), and this is the resulting requirements.txt. (It is also very good!)

pigar -P code_inspector requirements_pigar.txt

Notes about Pigar:

Pigar is also compatible with Notebooks. Another interesting blog using pigar is this one

Pigar parses the AST rather than using regular expressions, so it can also extract dependencies from the arguments of exec/eval calls and from doctests in docstrings.
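To illustrate why AST parsing helps, here is a small sketch using only the standard-library ast module. It is not Pigar's implementation: it collects top-level imported names, follows string literals passed to exec/eval, and (unlike a naive regular expression) ignores import-looking text inside ordinary strings. The function name is illustrative, and doctest handling is left out.

```python
import ast


def imported_top_level_names(source):
    """Collect top-level module names imported by the given source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # level > 0 means a relative import, i.e. local code.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
        elif (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"exec", "eval"}
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            try:
                # Recurse into the code string passed to exec/eval.
                names |= imported_top_level_names(node.args[0].value)
            except SyntaxError:
                pass  # not valid Python source; nothing to extract
    return names


sample = (
    "import os.path\n"
    "from bs4 import BeautifulSoup\n"
    "from . import local\n"
    'exec("import yaml")\n'
    'text = "import fake"\n'
)
print(sorted(imported_top_level_names(sample)))
```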

In addition, Pigar handles differences between Python versions well. For example, concurrent.futures is part of the standard library from Python 3.2 onwards; in earlier versions, the third-party library futures needs to be installed to use it. Pigar can tell these cases apart. (PS: pipreqs also supports this; for details, see this pull request: https://github.com/bndr/pipreqs/pull/80 )
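The version-dependent case can be sketched as a small lookup table. The table below holds only the concurrent.futures example from this comment; the BACKPORTS name, the needs_backport function and the table layout are assumptions for illustration, not how Pigar or pipreqs store this information.

```python
# Top-level import name -> (PyPI backport, first Python version that ships it in stdlib).
# Only the example mentioned above is listed.
BACKPORTS = {
    "concurrent": ("futures", (3, 2)),
}


def needs_backport(import_name, target_version):
    """Return the backport package to install, or None if none is needed."""
    top = import_name.split(".")[0]
    if top not in BACKPORTS:
        return None
    package, stdlib_since = BACKPORTS[top]
    return package if tuple(target_version) < stdlib_since else None


for version in [(2, 7), (3, 8)]:
    print(version, needs_backport("concurrent.futures", version))
```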

-- Quick Test: Using Pigar and Pyreqs for Somef ---

pigar -P /Users/rosafilgueira/HW-Work/Research/somef somef_requirements_piga.txt

pyreqs /Users/rosafilgueira/HW-Work/Research/somef somef_requirements_pyreqs.txt

dgarijo commented 3 years ago

I think for this we should probably have a corpus of tricky ones and give it a try. I am sure they will all work with code inspector, but once you start digging we will find some issues.

rosafilgueira commented 3 years ago

Ok! I have integrated Pigar in code_inspector. To do that, I have created a method called find_requirements in which we call Pigar (it will be very easy to call another tool if we want to in the future).

The requirements are integrated in the final JSON file :) ! I delete the intermediate txt file created by Pigar, but in the future we might want to keep it, since we might need it to install the requirements and later try to execute a repository automatically.

Pigar has an option to search PyPI for the missing modules and filter out some unnecessary modules, which it asks about with a y/N prompt. I have configured it to answer 'y', but we might want to change this in the future (answering 'N'). It would also be easy to expose this as an additional parameter.

Furthermore, I have added an argument (-r) to indicate whether we want to find the requirements (by default it does not).

For example: python code_inspector.py -i /Users/rosafilgueira/HW-Work/Research/somef -r
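The flow described above can be sketched roughly as follows. This is not the actual find_requirements code: the build_command callback, the answer parameter and the stand-in command are assumptions made so the example runs without Pigar installed. In real use, build_command would return the Pigar command line for the given repository and output path.

```python
import os
import subprocess
import sys
import tempfile


def parse_requirements(text):
    """Turn 'name==version' lines into a dict; unpinned names map to None."""
    reqs = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        reqs[name.strip()] = version.strip() if sep else None
    return reqs


def find_requirements(repo_path, build_command, answer="y\n"):
    """Run an external tool, read its requirements file, then delete the file.

    build_command(repo_path, out_path) must return the argv list to execute
    (hypothetical hook, so a different tool can be swapped in).
    answer is written to the tool's stdin to reply to an interactive prompt.
    """
    fd, out_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        subprocess.run(
            build_command(repo_path, out_path),
            input=answer, text=True, capture_output=True, check=True,
        )
        with open(out_path, encoding="utf-8") as handle:
            return parse_requirements(handle.read())
    finally:
        os.remove(out_path)  # the intermediate txt file is not kept


def fake_tool_command(repo_path, out_path):
    """Stand-in for the real tool: just writes two requirement lines."""
    script = "import sys; open(sys.argv[1], 'w').write('click==7.0\\nastor\\n')"
    return [sys.executable, "-c", script, out_path]


print(find_requirements(".", fake_tool_command))
```

The returned dict can then be merged into the JSON output; changing the default answer to "N\n" would cover the alternative mentioned above.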

dgarijo commented 3 years ago

Already supported by integrating pigar