Open jiny419 opened 3 years ago
Hi @jiny419, thanks for using mmf.
Do you mind sharing the command you used to run? Tagging @KMarino to help with KRISP-related issues.
Yes, I didn't include the PyTorch Geometric dependencies because they are system- and CUDA-version dependent. See the PyTorch Geometric installation instructions for how to do this on your system.
https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
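For reference, on the torch 1.8.1 + CUDA 10.2 setup shown in the log below, the install would look roughly like this (wheel index URL as documented on the PyTorch Geometric installation page; adjust the torch and CUDA versions to your own environment):

```shell
# Install the compiled extensions from the prebuilt wheel index that matches
# your torch and CUDA versions (here: torch 1.8.1 + CUDA 10.2).
pip install torch-scatter torch-sparse -f https://data.pyg.org/whl/torch-1.8.1+cu102.html
# Then install torch-geometric itself.
pip install torch-geometric
```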
@ytsheng I ran "mmf_run config=./projects/krisp/configs/krisp/okvqa/train_val.yaml run_type=train_val dataset=okvqa model=krisp" with my proper project path. @KMarino I installed dependencies such as torch-sparse and torch-geometric matching my CUDA version, but I ran into the error above, specifically "torch.jit.frontend.NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults" in mmf's distributed.py.
I think the warn function in distributed.py conflicted with the get_layout function of torch-sparse, and I have now solved it. Thank you!
Could you elaborate on the solution for the above conflict? Thank you!
The warn function in mmf's distributed.py conflicted with the get_layout function of torch-sparse; just comment out the warn function.
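For context on why commenting it out works: mmf's distributed.py defines a rank-aware warn override with a *args/**kwargs signature, and TorchScript refuses to compile any function with variadic parameters. When torch-sparse scripts SparseStorage, compilation reaches that override and raises the NotSupportedError shown in the traceback. A minimal stdlib sketch of the signature rule (function names here are illustrative stand-ins, not mmf's actual code):

```python
import inspect

# Illustrative stand-in for the override in mmf/utils/distributed.py:
# a wrapper around warnings.warn with a variadic signature.
def warn(*args, **kwargs):
    pass

# A fixed-arity signature like the real warnings.warn is scriptable.
def plain_warn(message, category=UserWarning, stacklevel=1):
    pass

def is_scriptable_signature(fn):
    """Mirror TorchScript's rule: reject *args / **kwargs parameters
    (the check that raises NotSupportedError in torch/jit/frontend.py)."""
    return not any(
        p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
        for p in inspect.signature(fn).parameters.values()
    )

assert not is_scriptable_signature(warn)    # variadic signature: TorchScript rejects it
assert is_scriptable_signature(plain_warn)  # fixed-arity signature: compiles fine
```

Commenting out (or restoring) the variadic override leaves torch-sparse calling a fixed-arity warning function, which TorchScript can compile.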
❓ Questions and Help
Hi!
I am running the KRISP project code in mmf, but I ran into some errors:
1) The torch-sparse module is missing from the KRISP project's requirements.txt.
2) When I installed a torch-sparse build suitable for CUDA version 10.2, I got the error below:
2021-06-22T16:32:10 | mmf.utils.configuration: Overriding option config to ./projects/krisp/configs/krisp/okvqa/train_val.yaml
2021-06-22T16:32:10 | mmf.utils.configuration: Overriding option run_type to train_val
2021-06-22T16:32:10 | mmf.utils.configuration: Overriding option datasets to okvqa
2021-06-22T16:32:10 | mmf.utils.configuration: Overriding option model to krisp
2021-06-22T16:32:14 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:14 | mmf.utils.distributed: Distributed Init (Rank 3): tcp://localhost:12572
2021-06-22T16:32:14 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:14 | mmf.utils.distributed: Distributed Init (Rank 4): tcp://localhost:12572
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 1): tcp://localhost:12572
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 0): tcp://localhost:12572
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 2): tcp://localhost:12572
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 2
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 5): tcp://localhost:12572
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 5
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 7): tcp://localhost:12572
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 7
2021-06-22T16:32:15 | mmf.utils.distributed: XLA Mode:False
2021-06-22T16:32:15 | mmf.utils.distributed: Distributed Init (Rank 6): tcp://localhost:12572
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 6
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 3
2021-06-22T16:32:15 | root: Added key: store_based_barrier_key:1 to store for rank: 4
2021-06-22T16:32:16 | root: Added key: store_based_barrier_key:1 to store for rank: 1
2021-06-22T16:32:16 | root: Added key: store_based_barrier_key:1 to store for rank: 0
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 0
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 2
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 5
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 3
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 6
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 7
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 4
2021-06-22T16:32:16 | mmf.utils.distributed: Initialized Host 4eb3a36d858c as Rank 1
2021-06-22T16:32:21 | mmf: Logging to: ./save/train.log
2021-06-22T16:32:21 | mmf_cli.run: Namespace(config_override=None, local_rank=None, opts=['config=./projects/krisp/configs/krisp/okvqa/train_val.yaml', 'run_type=train_val', 'dataset=okvqa', 'model=krisp'])
2021-06-22T16:32:21 | mmf_cli.run: Torch version: 1.8.1+cu102
2021-06-22T16:32:21 | mmf.utils.general: CUDA Device 0 is: GeForce RTX 2080 Ti
2021-06-22T16:32:21 | mmf_cli.run: Using seed 21664516
2021-06-22T16:32:21 | mmf.trainers.mmf_trainer: Loading datasets okvqa/defaults/annotations/annotations/graph_vocab/graph_vocab.pth.tar /home/aimaster/.cache/torch/mmf/data
2021-06-22T16:32:27 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2021-06-22T16:32:27 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2021-06-22T16:32:27 | mmf.datasets.multi_datamodule: Multitasking disabled by default for single dataset training
2021-06-22T16:32:27 | mmf.trainers.mmf_trainer: Loading model
Import error with KRISP dependencies. Fix dependencies if you want to use KRISP
Traceback (most recent call last):
File "/home/aimaster/anaconda3/envs/mmf/bin/mmf_run", line 33, in <module>
sys.exit(load_entry_point('mmf', 'console_scripts', 'mmf_run')())
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf_cli/run.py", line 129, in run
nprocs=config.distributed.world_size,
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
while not context.join():
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:
-- Process 6 terminated with the following error:
Traceback (most recent call last):
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf_cli/run.py", line 66, in distributed_main
main(configuration, init_distributed=True, predict=predict)
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf_cli/run.py", line 52, in main
trainer.load()
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/trainers/mmf_trainer.py", line 42, in load
super().load()
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/trainers/base_trainer.py", line 33, in load
self.load_model()
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/trainers/mmf_trainer.py", line 96, in load_model
self.model = build_model(attributes)
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/utils/build.py", line 87, in build_model
model = model_class(config)
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/models/krisp.py", line 39, in __init__
self.build()
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/models/krisp.py", line 75, in build
from projects.krisp.graphnetwork_module import GraphNetworkModule
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/projects/krisp/graphnetwork_module.py", line 21, in <module>
from torch_geometric.nn import BatchNorm, GCNConv, RGCNConv, SAGEConv
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch_geometric/__init__.py", line 5, in <module>
import torch_geometric.data
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
from .data import Data
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch_geometric/data/data.py", line 8, in <module>
from torch_sparse import coalesce, SparseTensor
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch_sparse/__init__.py", line 36, in <module>
from .storage import SparseStorage # noqa
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch_sparse/storage.py", line 21, in <module>
class SparseStorage(object):
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/jit/_script.py", line 974, in script
_compile_and_register_class(obj, _rcb, qualified_name)
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/jit/_script.py", line 67, in _compile_and_register_class
torch._C._jit_script_class_compile(qualified_name, ast, defaults, rcb)
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/jit/_recursive.py", line 757, in try_compile_fn
return torch.jit.script(fn, _rcb=rcb)
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/jit/_script.py", line 990, in script
qualified_name, ast, _rcb, get_default_args(obj)
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/jit/_recursive.py", line 757, in try_compile_fn
return torch.jit.script(fn, _rcb=rcb)
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/jit/_script.py", line 986, in script
ast = get_jit_def(obj, obj.name)
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/jit/frontend.py", line 271, in get_jit_def
return build_def(ctx, fn_def, type_line, def_name, self_name=self_name)
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/jit/frontend.py", line 293, in build_def
param_list = build_param_list(ctx, py_def.args, self_name)
File "/home/aimaster/anaconda3/envs/mmf/lib/python3.7/site-packages/torch/jit/frontend.py", line 316, in build_param_list
raise NotSupportedError(ctx_range, _vararg_kwarg_err)
torch.jit.frontend.NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults:
File "/home/aimaster/lab_storage/jinyeong/VQA/mmf/mmf/utils/distributed.py", line 340
def warn(*args, **kwargs):