google-research / text-to-text-transfer-transformer

Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
https://arxiv.org/abs/1910.10683
Apache License 2.0
6.18k stars, 757 forks

Dependencies in `setup.py` have module conflicts. #1106

Open unsatisfying opened 1 year ago


Background

Two dependencies declared in `setup.py` install conflicting copies of the same module.

Description

The `setup.py` file pulls in the following dependencies (`->` marks an indirect dependency):

tfds-nightly
mesh-tensorflow[transformer] -> tensorflow-datasets

As far as I can tell, these two packages conflict because both install the same top-level module, `tensorflow_datasets`.

During installation, pip installs both packages into the same `site-packages` directory without isolating them. Whichever package is installed later therefore overwrites the identically named files installed by the earlier one, including the modules listed below. Because the two copies differ, this silent overwriting can cause functional errors.
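One way to see which files are affected is to compare the file lists that each distribution records at install time. The sketch below assumes Python 3.9+ and uses only the standard-library `importlib.metadata`; the helper names `installed_files` and `overlapping_files` are illustrative, not part of either package.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, distribution


def installed_files(dist_name: str) -> set[str]:
    """Return the file paths recorded for an installed distribution (empty if absent)."""
    try:
        files = distribution(dist_name).files
    except PackageNotFoundError:
        return set()
    # `files` is None when the distribution has no RECORD metadata.
    return {str(f) for f in files or []}


def overlapping_files(dist_a: str, dist_b: str) -> set[str]:
    """Paths claimed by both distributions; on disk, the later install wins."""
    return installed_files(dist_a) & installed_files(dist_b)


if __name__ == "__main__":
    shared = overlapping_files("tensorflow-datasets", "tfds-nightly")
    # Only show Python sources to keep the output short.
    for path in sorted(p for p in shared if p.endswith(".py")):
        print(path)
```

Each distribution keeps its own `*.dist-info` directory, so the overlap contains only the shared module files, not the metadata.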

# Note that these modules have different content in the two packages
'tensorflow_datasets/core/as_dataframe.py'
'tensorflow_datasets/scripts/cli/build.py'
'tensorflow_datasets/core/dataset_builder_test.py'
'tensorflow_datasets/core/splits.py'
'tensorflow_datasets/core/file_adapters_test.py'
'tensorflow_datasets/core/file_adapters.py'
'tensorflow_datasets/core/dataset_builder.py'
'tensorflow_datasets/core/dataset_builders/huggingface_dataset_builder.py'
'tensorflow_datasets/version.py'
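To check that shared modules like these really differ, a rough approach is to compare the content hashes that pip writes to each distribution's `RECORD` metadata. This is a sketch assuming both distributions are installed side by side; `recorded_hashes` and `differing_modules` are hypothetical helpers. Because the files on disk have already been overwritten, it compares what each package shipped rather than what is currently on disk.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, distribution


def recorded_hashes(dist_name: str) -> dict[str, str]:
    """Map each file in a distribution's RECORD to its recorded hash ("mode=value")."""
    try:
        files = distribution(dist_name).files or []
    except PackageNotFoundError:
        return {}
    # f.hash is None for entries recorded without a hash (e.g. RECORD itself).
    return {str(f): f"{f.hash.mode}={f.hash.value}" for f in files if f.hash is not None}


def differing_modules(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """Paths present in both mappings whose recorded hashes differ."""
    return sorted(path for path in a.keys() & b.keys() if a[path] != b[path])
```

For example, `differing_modules(recorded_hashes("tensorflow-datasets"), recorded_hashes("tfds-nightly"))` would be expected to return paths like the ones listed above, although the exact result depends on the installed versions.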

Steps to Reproduce

pip install xxx

Desired Change

Silently overwritten modules are undesirable even when the affected modules are never used, or when the surviving copy happens to be the one that gets called. The overwriting introduces uncertainty and can cause problems later, especially if either package changes these modules in a future release. To keep the environment stable and predictable, it is generally better to avoid such conflicts and declare only necessary, mutually compatible dependencies.

We understand that this project can only control its direct dependencies, while indirect dependencies are a black box. Even so, it could document the conflict instead of declaring both conflicting packages as requirements. Alternatively, the maintainers could review the dependencies and remove the redundant one.

Documenting the potential conflict, and the need to choose only one of the two packages, would help developers understand the issue and make informed decisions. A clear note or warning in the project's documentation can guide users to the package that fits their needs.
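If both dependencies were kept, a runtime warning is one possible complement to documentation. The sketch below is hypothetical: `installed_versions`, `conflict_message`, and `warn_if_conflicting` are not existing functions in this repository, and the package names are taken from the dependency list above.

```python
from __future__ import annotations

import warnings
from importlib.metadata import PackageNotFoundError, version

# Assumption: the two distributions that both ship `tensorflow_datasets`.
CONFLICTING = ("tensorflow-datasets", "tfds-nightly")


def installed_versions(names: tuple[str, ...]) -> dict[str, str]:
    """Return {distribution name: version} for each name that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass  # not installed, so it cannot conflict
    return found


def conflict_message(found: dict[str, str]) -> str | None:
    """Build a warning text if more than one conflicting distribution is installed."""
    if len(found) < 2:
        return None
    details = ", ".join(f"{name}=={ver}" for name, ver in sorted(found.items()))
    return (
        f"Conflicting distributions are installed ({details}) and share the "
        "tensorflow_datasets module; files from whichever was installed last take precedence."
    )


def warn_if_conflicting(names: tuple[str, ...] = CONFLICTING) -> None:
    """Emit a RuntimeWarning when the conflicting distributions coexist."""
    message = conflict_message(installed_versions(names))
    if message is not None:
        warnings.warn(message, RuntimeWarning, stacklevel=2)
```

Keeping the message construction separate from the installed-package lookup makes the warning text easy to test without installing either package.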