allenai / tango

Organize your experiments into discrete steps that can be cached and reused throughout the lifetime of your research project.
https://ai2-tango.readthedocs.io/
Apache License 2.0

Error when using pytorch dataloader with h5py when `num_workers` > 0 #542

Open Boltzmachine opened 1 year ago

Boltzmachine commented 1 year ago

πŸ› Describe the bug

Here is a minimal example to reproduce:

└── test
    ├── __init__.py
    ├── main.py
    └── test.jsonnet

In main.py

import torch
from tango import Step
import h5py

class Dataset(torch.utils.data.Dataset):
    def __init__(self) -> None:
        super().__init__()
        self.file = h5py.File("data.h5", "w")

    def __getitem__(self, idx):
        return None

@Step.register("main")
class Main(Step):
    def run(self):
        dataset = Dataset()
        loader = torch.utils.data.DataLoader(dataset, num_workers=2)
        iter(loader)

In test.jsonnet,

{
    steps: {
        type: "main"
    }
}

__init__.py is empty.

When I run `tango run test/test.jsonnet -i test`, I get this error:

[03/26/23 20:46:38] ERROR    Uncaught exception
Traceback (most recent call last):
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/tango/step.py", line 468, in _run_with_work_dir
    result = self.run(**kwargs)
  File "/gpfs/gibbs/project/ying_rex/wq44/Yale-Brain-Graph/test/main.py", line 19, in run
    iter(loader)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 368, in __iter__
    return self._get_iterator()
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 314, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 927, in __init__
    w.start()
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/gpfs/gibbs/project/ying_rex/wq44/conda_envs/neuro/lib/python3.8/site-packages/h5py/_hl/base.py", line 368, in __getnewargs__
    raise TypeError("h5py objects cannot be pickled")
TypeError: h5py objects cannot be pickled

Versions

Python 3.8.13
ai2-tango==1.2.0
aiohttp==3.8.4
aiosignal==1.3.1
antlr4-python3-runtime==4.9.3
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
astor==0.8.1
asttokens==2.0.8
async-timeout==4.0.2
attrs==22.1.0
Babel==2.11.0
backcall==0.2.0
base58==2.1.1
beautifulsoup4==4.11.1
bids-validator==1.9.9
bleach==5.0.1
boto3==1.26.95
botocore==1.29.95
braceexpand==0.1.7
brotlipy @ file:///home/conda/feedstock_root/build_artifacts/brotlipy_1666764652625/work
cached-path==1.3.3
cachetools==5.3.0
certifi==2022.9.24
cffi @ file:///home/conda/feedstock_root/build_artifacts/cffi_1666754696558/work
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1661170624537/work
click==8.1.3
click-help-colors==0.9.1
contourpy==1.0.6
cryptography @ file:///home/conda/feedstock_root/build_artifacts/cryptography_1667422951827/work
cycler==0.11.0
datasets==2.10.1
debugpy==1.6.3
decorator==5.1.1
defusedxml==0.7.1
-e git+https://github.com/cvignac/DiGress.git@57df5f8c061d1839c8d101f9378484301061a49d#egg=DiGress
dill==0.3.6
docker-pycreds==0.4.0
docopt==0.6.2
einops==0.6.0
entrypoints==0.4
executing==1.1.0
fastjsonschema==2.16.2
fcd-torch==1.0.7
filelock==3.8.0
fonttools==4.38.0
formulaic==0.3.4
frozenlist==1.3.3
fsspec==2023.1.0
gitdb==4.0.9
GitPython==3.1.29
glob2==0.7
google-api-core==2.11.0
google-auth==2.16.2
google-cloud-core==2.3.2
google-cloud-storage==2.7.0
google-crc32c==1.5.0
google-resumable-media==2.4.1
googleapis-common-protos==1.58.0
googledrivedownloader==0.4
graphviz==0.20.1
h5py==3.7.0
huggingface-hub==0.10.1
Hydra==2.5
hydra-core==1.2.0
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work
importlib-metadata==5.0.0
importlib-resources==5.10.0
interface-meta==1.3.0
ipdb==0.13.9
ipykernel==6.17.1
ipython==8.5.0
ipython-genutils==0.2.0
ipywidgets==8.0.2
isodate==0.6.1
jedi==0.18.1
Jinja2==3.1.2
jmespath==1.0.1
joblib==1.2.0
json5==0.9.10
jsonschema==4.17.0
jupyter==1.0.0
jupyter-console==6.4.4
jupyter-server==1.23.1
jupyter_client==7.4.5
jupyter_core==5.0.0
jupyterlab==3.5.0
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.3
jupyterlab_server==2.16.3
kiwisolver==1.4.4
lightgbm==3.3.3
lightning-utilities==0.6.0.post0
load-confounds==0.12.0
lxml==4.9.2
markdown-it-py==2.2.0
MarkupSafe==2.1.1
matplotlib==3.6.3
matplotlib-inline==0.1.6
mdurl==0.1.2
mini-moses @ git+https://github.com/igor-krawczuk/mini-moses@1eda77fcc13045a93024835d403ff61634ac4b69
mistune==2.0.4
mkl-fft==1.3.1
mkl-random==1.2.2
mkl-service==2.4.0
more-itertools==9.1.0
multidict==6.0.4
multiprocess==0.70.14
nbclassic==0.4.8
nbclient==0.7.0
nbconvert==7.2.4
nbformat==5.7.0
nest-asyncio==1.5.6
networkx==2.8.8
nibabel==4.0.2
nilearn==0.9.0
notebook==6.5.2
notebook_shim==0.2.2
num2words==0.5.12
numpy @ file:///croot/numpy_and_numpy_base_1667233465264/work
omegaconf==2.2.3
overrides==7.3.1
packaging==21.3
pandas==1.5.1
pandocfilters==1.5.0
parso==0.8.3
pathtools==0.1.2
petname==2.6
pexpect==4.8.0
pickleshare==0.7.5
Pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1666920566244/work
pkgutil_resolve_name==1.3.10
platformdirs==2.5.4
plotly==5.11.0
prometheus-client==0.15.0
promise==2.3
prompt-toolkit==3.0.31
protobuf==4.21.9
psutil==5.9.4
ptyprocess==0.7.0
pudb==2022.1.3
pure-eval==0.2.2
py==1.11.0
pyarrow==11.0.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pybids==0.15.5
pycparser @ file:///home/conda/feedstock_root/build_artifacts/pycparser_1636257122734/work
Pygments==2.13.0
pyOpenSSL @ file:///home/conda/feedstock_root/build_artifacts/pyopenssl_1665350324128/work
pyparsing==3.0.9
pyrsistent==0.19.2
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
python-dateutil==2.8.2
pytorch-lightning==1.9.2
pytz==2022.6
PyYAML==6.0
pyzmq==24.0.1
qtconsole==5.4.0
QtPy==2.3.0
rdflib==6.2.0
rdkit==2022.9.4
regex==2022.10.31
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1661872987712/work
responses==0.18.0
retry==0.9.2
rich==13.3.2
rjsonnet==0.5.2
rsa==4.9
s3transfer==0.6.0
sacremoses==0.0.53
scikit-learn==1.1.3
scipy==1.9.3
seaborn==0.12.1
Send2Trash==1.8.0
sentry-sdk==1.10.1
setproctitle==1.3.2
shortuuid==1.0.10
six @ file:///home/conda/feedstock_root/build_artifacts/six_1620240208055/work
smmap==5.0.0
sniffio==1.3.0
soupsieve==2.3.2.post1
SQLAlchemy==1.3.24
sqlitedict==2.1.0
stack-data==0.5.1
templateflow==0.8.1
tenacity==8.1.0
terminado==0.17.0
threadpoolctl==3.1.0
tinycss2==1.2.1
tokenizers==0.13.2
toml==0.10.2
tomli==2.0.1
torch==1.11.0
torch-geometric==2.1.0.post1
torch-scatter==2.0.9
torch-sparse==0.6.15
torchaudio==0.11.0
torchmetrics==0.11.0
torchvision==0.12.0
torchviz==0.0.2
tornado==6.2
tqdm==4.64.1
traitlets==5.4.0
transformers==4.16.2
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1665144421445/work
urllib3 @ file:///home/conda/feedstock_root/build_artifacts/urllib3_1658789158161/work
urwid==2.1.2
urwid-readline==0.13
wandb==0.13.5
wcwidth==0.2.5
webdataset==0.1.103
webencodings==0.5.1
websocket-client==1.4.2
widgetsnbextension==4.0.3
wrapt==1.14.1
xxhash==3.2.0
yacs==0.1.8
yarl==1.8.2
zipp==3.10.0

Boltzmachine commented 1 year ago

If I run the code without tango, i.e. with the python command directly, I don't get any error.
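
One possible reading of the traceback: it passes through popen_spawn_posix.py and reduction.dump, which suggests the worker processes are started with the spawn method under `tango run`. With spawn, the dataset object is pickled before being sent to each worker, and the open h5py handle refuses to be pickled. A plain python run on Linux normally uses fork, which does not pickle the dataset, and that would explain why no error shows up there. The sketch below illustrates a common workaround under that assumption: store only the file path and open the file lazily on first access, so that nothing unpicklable lives on the dataset when it is sent to a worker. `FakeH5File`, `EagerDataset`, `LazyDataset` and `can_pickle` are hypothetical names used for illustration only; `FakeH5File` is a stand-in for `h5py.File` so the example runs with the standard library alone.

```python
import pickle


class FakeH5File:
    """Hypothetical stand-in for h5py.File: it refuses to be pickled."""

    def __init__(self, path):
        self.path = path

    def __reduce__(self):
        # Mimics the error raised by h5py in the traceback above.
        raise TypeError("h5py objects cannot be pickled")

    def read(self, idx):
        # Placeholder for reading a record from the file.
        return idx


class EagerDataset:
    """Opens the file in __init__, like the Dataset in main.py."""

    def __init__(self, path):
        self.file = FakeH5File(path)


class LazyDataset:
    """Stores only the path and opens the file on first access."""

    def __init__(self, path):
        self.path = path
        self._file = None

    def _handle(self):
        # Opened once per process, the first time an item is requested.
        if self._file is None:
            self._file = FakeH5File(self.path)
        return self._file

    def __getitem__(self, idx):
        return self._handle().read(idx)

    def __getstate__(self):
        # Drop any open handle so the dataset itself stays picklable.
        state = self.__dict__.copy()
        state["_file"] = None
        return state


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False on TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False
```

In the real setup, `_handle` would call `h5py.File(self.path, "r")` instead of constructing the stand-in, and each worker would end up with its own handle. Whether this is the right fix here, as opposed to changing the start method, is an open question.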