Hello,
I am currently working on a project where both DataLab and datasets are subdependencies.
I noticed that I cannot import both libraries in the same process: each one registers its FileSystems in fsspec and expects that they have not been registered before.
Versions
datalabs==0.4.15
datasets==2.12.0
Replication
import datasets
import datalabs
Error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Bened\anaconda3\envs\ner-eval-dashboard2\lib\site-packages\datalabs\__init__.py", line 28, in <module>
    from datalabs.arrow_dataset import concatenate_datasets, Dataset
  File "C:\Users\Bened\anaconda3\envs\ner-eval-dashboard2\lib\site-packages\datalabs\arrow_dataset.py", line 60, in <module>
    from datalabs.arrow_writer import ArrowWriter, OptimizedTypedSequence
  File "C:\Users\Bened\anaconda3\envs\ner-eval-dashboard2\lib\site-packages\datalabs\arrow_writer.py", line 28, in <module>
    from datalabs.features import (
  File "C:\Users\Bened\anaconda3\envs\ner-eval-dashboard2\lib\site-packages\datalabs\features\__init__.py", line 2, in <module>
    from datalabs.features.audio import Audio
  File "C:\Users\Bened\anaconda3\envs\ner-eval-dashboard2\lib\site-packages\datalabs\features\audio.py", line 21, in <module>
    from datalabs.utils.streaming_download_manager import xopen
  File "C:\Users\Bened\anaconda3\envs\ner-eval-dashboard2\lib\site-packages\datalabs\utils\streaming_download_manager.py", line 16, in <module>
    from datalabs.filesystems import COMPRESSION_FILESYSTEMS
  File "C:\Users\Bened\anaconda3\envs\ner-eval-dashboard2\lib\site-packages\datalabs\filesystems\__init__.py", line 37, in <module>
    fsspec.register_implementation(fs_class.protocol, fs_class)
  File "C:\Users\Bened\anaconda3\envs\ner-eval-dashboard2\lib\site-packages\fsspec\registry.py", line 51, in register_implementation
    raise ValueError(
ValueError: Name (bz2) already in the registry and clobber is False
Possible Solution
I think a simple solution would be to set clobber=True in the register_implementation call at https://github.com/ExpressAI/DataLab/blob/main/datalabs/filesystems/__init__.py#L37. This allows the registry to overwrite a previous registration for the same protocol instead of raising.
This should work, as the datalabs FileSystems are copies of the datasets FileSystems. However, I don't know if it is guaranteed to be compatible with other libraries that might use the same protocols.
I am linking the symmetric issue on datasets, as ideally the issue is solved the same way in both libraries. Otherwise, the behavior could differ depending on which library gets imported first.
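To illustrate the import-order concern, here is a minimal sketch of how a registry with a clobber flag behaves. It is a toy stand-in written for this issue, not fsspec's actual code: the `_registry` dict, the two placeholder classes, and the `simulate_imports` helper are all hypothetical, and only the `register_implementation(name, cls, clobber=...)` call shape and the error message are taken from the traceback above.

```python
# Toy model of a protocol registry with a clobber flag.
# This is NOT fsspec's implementation, only an illustration of the semantics.
_registry = {}


def register_implementation(name, cls, clobber=False):
    """Register cls under name; refuse to overwrite unless clobber is True."""
    if name in _registry and not clobber:
        raise ValueError(
            f"Name ({name}) already in the registry and clobber is False"
        )
    _registry[name] = cls


# Hypothetical placeholders for the two libraries' bz2 filesystems.
class DatasetsBz2FileSystem:
    protocol = "bz2"


class DatalabsBz2FileSystem:
    protocol = "bz2"


def simulate_imports(order, clobber):
    """Register the classes in the given order, as the imports would,
    and return whichever class ends up registered for 'bz2'."""
    _registry.clear()
    for cls in order:
        register_implementation(cls.protocol, cls, clobber=clobber)
    return _registry["bz2"]


if __name__ == "__main__":
    both = [DatasetsBz2FileSystem, DatalabsBz2FileSystem]
    # With clobber=True the last registration wins, so the result
    # depends on the import order.
    print(simulate_imports(both, clobber=True).__name__)
    print(simulate_imports(both[::-1], clobber=True).__name__)
    # With clobber=False the second registration raises, as in the traceback.
    try:
        simulate_imports(both, clobber=False)
    except ValueError as err:
        print(err)
```

In this model, setting clobber=True in both libraries removes the error, but whichever library is imported last silently owns the protocol. That is harmless as long as the two implementations really are equivalent copies.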