I was trying to do zero-shot video captioning with mPLUG. I first downloaded the datasets from https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/mPLUG/data.tar. The VATEX data in it appears to match the data on the official website. I then ran sh scripts/videocap_vatex_mplug_large.sh, but I ran into a few issues:
(1) pip install git+git://github.com/j-min/language-evaluation@master fails. Presumably this is because the GitHub link isn't correct, but when I installed from https://github.com/j-min/language-evaluation instead, I still got the following traceback:
File "", line 1, in
File "C:\Users\shrey\anaconda3\lib\site-packages\language_evaluation\__init__.py", line 15, in
from language_evaluation.coco_caption_py3.pycocoevalcap.eval import COCOEvalCap
File "C:\Users\shrey\anaconda3\lib\site-packages\language_evaluation\coco_caption_py3\pycocoevalcap\eval.py", line 11, in
"METEOR": (Meteor(), "METEOR"),
File "C:\Users\shrey\anaconda3\lib\site-packages\language_evaluation\coco_caption_py3\pycocoevalcap\meteor\meteor.py", line 20, in __init__
self.meteor_p = subprocess.Popen(self.meteor_cmd, \
File "C:\Users\shrey\anaconda3\lib\subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\shrey\anaconda3\lib\subprocess.py", line 1420, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
Exception ignored in: <function Meteor.__del__ at 0x0000028948E36700>
Traceback (most recent call last):
File "C:\Users\shrey\anaconda3\lib\site-packages\language_evaluation\coco_caption_py3\pycocoevalcap\meteor\meteor.py", line 78, in __del__
self.lock.acquire()
AttributeError: 'Meteor' object has no attribute 'lock'
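The FileNotFoundError above is raised inside subprocess.Popen(self.meteor_cmd, ...), which usually means the executable named in that command could not be found on PATH. The traceback does not show the contents of meteor_cmd, so the sketch below assumes it launches java; that is an assumption, and the helper name find_executable is only illustrative.

```python
import shutil


def find_executable(name):
    """Return the full path of `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


# Assumption: the METEOR wrapper starts a `java` process.
# Swap in whatever executable meteor_cmd actually names.
java_path = find_executable("java")
if java_path is None:
    print("java was not found on PATH; Popen would fail with FileNotFoundError")
else:
    print("java found at:", java_path)
```

If the executable is reported as missing, installing it (or adding its folder to PATH) and opening a new terminal would be the first thing to try before reinstalling the package.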
(2) It seems that videocap_mplugx.py doesn't exist. videocap_mplug.py does, and I am guessing this is what was intended (the args match up nicely), but when I run it I get the following traceback (a module issue and a kubernetes issue). I am not sure whether this is because I was not able to install language_evaluation correctly and have not downloaded COCO, or because I am running the wrong file. The full traceback is below for reference. The issue persists even after I run pip install ruamel.yaml. Thank you so much for all your help!
Traceback (most recent call last):
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:3223 (system error: 10049 - The requested address is not valid in its context.).
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:3223 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
File "C:\Users\shrey\OneDrive\AliceMind\mPLUG\videocap_mplug.py", line 4, in
import ruamel_yaml as yaml
ModuleNotFoundError: No module named 'ruamel_yaml'
Traceback (most recent call last):
File "C:\Users\shrey\OneDrive\AliceMind\mPLUG\videocap_mplug.py", line 4, in
import ruamel_yaml as yaml
ModuleNotFoundError: No module named 'ruamel_yaml'
Traceback (most recent call last):
File "C:\Users\shrey\OneDrive\AliceMind\mPLUG\videocap_mplug.py", line 4, in
import ruamel_yaml as yaml
ModuleNotFoundError: No module named 'ruamel_yaml'
Traceback (most recent call last):
File "C:\Users\shrey\OneDrive\AliceMind\mPLUG\videocap_mplug.py", line 4, in
import ruamel_yaml as yaml
ModuleNotFoundError: No module named 'ruamel_yaml'
Traceback (most recent call last):
File "C:\Users\shrey\OneDrive\AliceMind\mPLUG\videocap_mplug.py", line 4, in
import ruamel_yaml as yaml
ModuleNotFoundError: No module named 'ruamel_yaml'
Traceback (most recent call last):
File "C:\Users\shrey\OneDrive\AliceMind\mPLUG\videocap_mplug.py", line 4, in
Traceback (most recent call last):
File "C:\Users\shrey\OneDrive\AliceMind\mPLUG\videocap_mplug.py", line 4, in
import ruamel_yaml as yaml
ModuleNotFoundError: No module named 'ruamel_yaml'
import ruamel_yaml as yaml
ModuleNotFoundError: No module named 'ruamel_yaml'
Traceback (most recent call last):
File "C:\Users\shrey\OneDrive\AliceMind\mPLUG\videocap_mplug.py", line 4, in
import ruamel_yaml as yaml
ModuleNotFoundError: No module named 'ruamel_yaml'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5716) of binary: C:\Users\shrey\AppData\Local\Programs\Python\Python39\python.exe
Traceback (most recent call last):
File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launch.py", line 196, in
main()
File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launch.py", line 192, in main
launch(args)
File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launch.py", line 177, in launch
run(args)
File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\run.py", line 785, in run
elastic_launch(
File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
videocap_mplug.py FAILED
Failures:
[1]:
time : 2023-05-18_00:22:07
host : university email address
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 20760)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time : 2023-05-18_00:22:07
host : university email address
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 12436)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time : 2023-05-18_00:22:07
host : university email address
rank : 3 (local_rank: 3)
exitcode : 1 (pid: 16444)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time : 2023-05-18_00:22:07
host : university email address
rank : 4 (local_rank: 4)
exitcode : 1 (pid: 25304)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time : 2023-05-18_00:22:07
host : university email address
rank : 5 (local_rank: 5)
exitcode : 1 (pid: 20196)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time : 2023-05-18_00:22:07
host : university email address
rank : 6 (local_rank: 6)
exitcode : 1 (pid: 9708)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
time : 2023-05-18_00:22:07
host : university email address
rank : 7 (local_rank: 7)
exitcode : 1 (pid: 21216)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
Root Cause (first observed failure):
[0]:
time : 2023-05-18_00:22:07
host : university email address
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 5716)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
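Two details in the tracebacks may be relevant here. The first traceback shows paths under anaconda3, while the launcher traceback shows Python39 under AppData, so pip may have installed into a different interpreter than the one running the script. Also, pip install ruamel.yaml provides the module ruamel.yaml, not ruamel_yaml (the name used by the conda package, as far as I know). The sketch below is one possible workaround, not the repository's own fix: it prints the active interpreter and tries a list of module names in order. The helper name load_first_available is hypothetical.

```python
import importlib
import sys


def load_first_available(candidates):
    """Import and return the first module in `candidates` that is installed."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed under this name, try the next one
    raise ImportError("none of these modules could be imported: %r" % (candidates,))


# Shows which interpreter is running, to compare with the one pip installed into.
print("interpreter:", sys.executable)

# Possible replacement for `import ruamel_yaml as yaml` in videocap_mplug.py:
# yaml = load_first_available(("ruamel_yaml", "ruamel.yaml"))
```

Running the install as python -m pip install ruamel.yaml with the same python that launches the script is another way to make sure both refer to one environment.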