Closed camillol closed 5 months ago
@zsol thanks for taking a look! For context, my use case is a single repository containing many interdependent packages. Because of the interdependencies, it needs to be analyzed as a whole (e.g. I need information on the entire class hierarchy, which spans multiple packages).
AFAICT, the main user of calculate_module_and_package
(and the one I'm interested in) is FullyQualifiedNameProvider
. It works with FullRepoManager
, and the gen_cache
method receives only the repo root path and a list of file paths to process.
So, it's not feasible for me to have a separate repo root per package, and I don't see a way to feed the package information into FQNP.gen_cache given the current signature.
AFAICT, gen_cache
is meant to do I/O, and does I/O in other providers (e.g. TypeInferenceProvider). I assume the requirement for calculate_module_and_package
not to do I/O is for testing purposes. I could make a variant of this function and have FQNP call it, although it could cause problems down the road if something else decides to call calculate_module_and_package
and gets inconsistent results with FQNP.
So I think the best approach would be to have a global switch. Or perhaps an option on the FullRepoManager
, but in that case I'd still have to change the provider interface a bit to be able to wire it through.
Regarding the default behavior, I'd argue that the existing behavior of inferring the package from the file path relative to the repo root is also a heuristic, and for a repo that contains pyproject.toml files it is less likely to give the intended results than what I am proposing. But I'm fine with pyproject.toml parsing not being the default.
AFAICT, gen_cache is meant to do I/O, and does I/O in other providers (e.g. TypeInferenceProvider).
That's right. I propose to do the pyproject.toml
finding in FQNP's gen_cache
, and then pass in a root_path
to calculate_module_and_package
based on that.
My main concern is that calculate_module_and_package
is not just an internal implementation detail, it's exposed in libcst.helpers
and other code (outside of libcst) might be using it.
Regarding the default behavior, I'd argue that the existing behavior of inferring the package from the file path relative to the repo root is also a heuristic, and for a repo that contains pyproject.toml files it is less likely to give the intended results than what I am proposing. But I'm fine with pyproject.toml parsing not being the default.
Agreed that the existing behavior is also just a heuristic, but I'm not convinced that the pyproject.toml
-based heuristic is strictly better (or worse, it's just different), and so I don't think it warrants a backwards incompatible change. Let's make it configurable.
Sounds good. I don't see an existing API for this kind of global configuration. Is it ok if I keep it simple and just make it a global variable?
I don't see an existing API for this kind of global configuration.
Ugh, that's annoying :( Ideally this would be stored in the libcst config file (that's read here) and then passed into FullRepoManager
here to be passed into gen_cache
.
Maybe we should change the API for gen_cache
to accept a configuration object instead of just root_path
and timeout
. Would you be up for that?
Oh, I'm not using the CLI at all. I'm using LibCST as a library from another program, so that config file doesn't apply.
gen_cache
already has an optional argument that most implementations ignore (timeout
). I could generalize that as kwargs
, and add a keyword argument use_pyproject
, provided by FullRepoManager
.
Sounds good, then!
Attention: Patch coverage is 93.33333%
with 3 lines
in your changes missing coverage. Please review.
Project coverage is 91.15%. Comparing base (
9f54920
) to head (452868c
). Report is 17 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
libcst/helpers/module.py | 92.30% | 1 Missing :warning: |
libcst/metadata/base_provider.py | 75.00% | 1 Missing :warning: |
libcst/metadata/tests/test_metadata_wrapper.py | 80.00% | 1 Missing :warning: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@zsol I think it's done, PTAL!
When determining the module name and package for a source file, LibCST currently takes the file name relative to the repository root. With this change, it will instead look for the closest directory containing a
pyproject.toml
file in the file's ancestors, and consider the file name relative to that directory.This gives correct module and package names when processing a repository containing multiple packages. For files that are not covered by a
pyproject.toml
, the search ends with the given repository root, thus retaining the old behavior.