Closed hunshcn closed 3 months ago
I am not sure if the complexity is worth it as the package imports may overlap at the top level. It could be possible to improve it, but I am not sure how realistic it is.
Let's imagine a case where you write code
import foo.bar
and the foo.bar
import is only provided by foo-bar-plugin
python package, but what you have in the gazelle_python.yaml
is data that tells gazelle that foo
maps to foo
python package, so gazelle will be happy and will add the wrong dependency instead of erroring here.
I think this is related to #1877, but might not be.
If somebody was to implement it, I think it should:
gazelle
plugin that can work with the smaller gazelle_python.yaml
should still work with the previous gazelle_python.yaml
.One possibility is that we only simplify the generated yaml. During the generation, we find the top bifurcation and then cut the branches. And this is well compatible.
for example
foo: foo
foo.bar : for-bar
foo.bar.abc: for-bar
foo.bar.xyz: for-bar
foo.bar.xyz.sub: for-bar
to
foo: foo
foo.bar : for-bar
One extra idea how we could simplify the gazelle_python.yaml
file by a lot is to only include the dependencies that the project is referencing directly. For example, if my project has a pyproject.toml
file and I only specify airflow
as the dependency in the project, gazelle_python.yaml
will be enormous because of all of the transitive airflow
dependencies that rules_python
knows about. However, the only things that we should actually include in the gazelle_python.yaml
are the things that the requirements.in
or pyproject.toml
mentions.
Such thing could be achieved in two ways:
pip
extension to return a list of whls that are the direct references of the project. This would mean that the pip.parse
should accept the requirements.in
or the pyproject.toml
and use that to generate a list of whl
targets that are the direct references. That way the modules_mapping
file will only create gazelle_python.yaml
entries that are the direct references.pyproject.toml
and/or requirements.in
file in order to filter out the unnecessary wheels.I personally think that 1. might be the better solution, but a little bit more involved, because we could also set different visibility
values in the hub
repo for transitive and non-transitive dependencies of the project to enforce that all of the direct dependencies of the project are in the requirements.in
and/or pyproject.toml
. However, this is something that may be a matter of taste and I am not sure if this is a good thing to enforce on everyone.
The second approach would be more doable as golang could be definitely taught how to parse pyproject.toml
and that would be something that the user can opt into.
That way we may be able to cut the file size by a lot without doing any more advanced techniques like what was suggested in the comment above.
One extra idea how we could simplify the
gazelle_python.yaml
file by a lot is to only include the dependencies that the project is referencing directly. For example, if my project has apyproject.toml
file and I only specifyairflow
as the dependency in the project,gazelle_python.yaml
will be enormous because of all of the transitiveairflow
dependencies thatrules_python
knows about. However, the only things that we should actually include in thegazelle_python.yaml
are the things that therequirements.in
orpyproject.toml
mentions.我们如何大幅简化gazelle_python.yaml
文件的一个额外想法是仅包含项目直接引用的依赖项。例如,如果我的项目有一个pyproject.toml
文件,并且我只指定airflow
作为项目中的依赖项,则gazelle_python.yaml
将非常庞大,因为所有传递性airflow
rules_python
知道的依赖项。但是,我们实际上应该在gazelle_python.yaml
中包含的唯一内容是requirements.in
或pyproject.toml
提到的内容。Such thing could be achieved in two ways:这样的事情可以通过两种方式实现:
- Modify the
pip
extension to return a list of whls that are the direct references of the project. This would mean that thepip.parse
should accept therequirements.in
or thepyproject.toml
and use that to generate a list ofwhl
targets that are the direct references. That way themodules_mapping
file will only creategazelle_python.yaml
entries that are the direct references.修改pip
扩展以返回作为项目直接引用的whls列表。这意味着pip.parse
应该接受requirements.in
或pyproject.toml
并使用它来生成whl
目标列表,这些目标是直接参考。这样,modules_mapping
文件将仅创建作为直接引用的gazelle_python.yaml
条目。- Modify the gazelle tooling to accept a
pyproject.toml
and/orrequirements.in
file in order to filter out the unnecessary wheels.修改瞪羚工具以接受pyproject.toml
和/或requirements.in
文件,以过滤掉不必要的轮子。I personally think that 1. might be the better solution, but a little bit more involved, because we could also set different
visibility
values in thehub
repo for transitive and non-transitive dependencies of the project to enforce that all of the direct dependencies of the project are in therequirements.in
and/orpyproject.toml
. However, this is something that may be a matter of taste and I am not sure if this is a good thing to enforce on everyone.我个人认为 1. 可能是更好的解决方案,但涉及更多一点,因为我们还可以在hub
存储库中为传递和非传递依赖项设置不同的visibility
值项目的所有直接依赖项都位于requirements.in
和/或pyproject.toml
中。然而,这可能是一个品味问题,我不确定这对每个人来说是否是一件好事。The second approach would be more doable as golang could be definitely taught how to parse
pyproject.toml
and that would be something that the user can opt into.第二种方法更可行,因为 golang 绝对可以被教导如何解析pyproject.toml
并且用户可以选择这样做。That way we may be able to cut the file size by a lot without doing any more advanced techniques like what was suggested in the comment above.这样我们就可以大幅削减文件大小,而无需使用任何更高级的技术,例如上面评论中建议的技术。
Maybe I missed something.
Whether it is 1 or 2, users need to ensure that all direct dependencies need to live in requirements.in, otherwise we will not be able to resolve this dependency. This requirement is reasonable, but it is difficult to achieve (most projects should not be able to do it because python does not have the same visibility restrictions as bzlmod).
WDYT @aignas
I think my previous suggestion about requirements.in
/pyproject.toml
may have been indeed not worth the effort.
I personally do not mind large files since they are machine generated and consumed and you can add the gazelle_python.yaml
to .gitattributes
for GH (or other git UI providers) to consider that file machine-generated and hide it from PRs.
If you really want to try out the option of reducing the size of it, please keep in mind #1971 as having more assumptions/business logic in this file could hinder the solution of that issue.
I have thought a little more about this and it may be the proposed would work well in cases to reduce merge conflicts. The smaller and simpler the file, the better.
@hunshcn, let me know if you have still time and willingness to add a new manifest format that people could choose. And if you decide to create a PR, feel free to add me as a reviewer.
This is good news. I will do this work later.
🚀 feature request
Relevant Rules
gazelle_python.yaml
Description
A clear and concise description of the problem or missing capability... `gazelle_python.yaml` will list all libs on packages ``` matplotlib.backend_tools: matplotlib matplotlib.backends: matplotlib matplotlib.backends.backend_agg: matplotlib matplotlib.backends.backend_cairo: matplotlib matplotlib.backends.backend_gtk3: matplotlib matplotlib.backends.backend_gtk3agg: matplotlib matplotlib.backends.backend_gtk3cairo: matplotlib matplotlib.backends.backend_gtk4: matplotlib matplotlib.backends.backend_gtk4agg: matplotlib matplotlib.backends.backend_gtk4cairo: matplotlib matplotlib.backends.backend_macosx: matplotlib matplotlib.backends.backend_mixed: matplotlib matplotlib.backends.backend_nbagg: matplotlib matplotlib.backends.backend_pdf: matplotlib matplotlib.backends.backend_pgf: matplotlib ... ``` I have a three lines only requirements.in ``` numpy requests seaborn ``` 20 packages requirements.txt but received a 110KB gazelle_python.yaml which has 2517 keys at `.manifest.modules_mapping` It is too big. ### Describe the solution you'd like If you have a solution in mind, please describe it. Can we simplify it and make `gazelle_python.yaml` have only top level package name? In my shallow cognition, such a long imports seems meaningless. I'd like to support it if I can. ### Describe alternatives you've considered Have you considered any alternative solutions or workarounds? no