alibaba / AliceMind

ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
Apache License 2.0

Zero-Shot Video Captioning script issues #85

Open shreyaskar123 opened 1 year ago

shreyaskar123 commented 1 year ago

I was trying to do zero-shot video captioning with mPLUG. I first downloaded the datasets from https://alice-open.oss-cn-zhangjiakou.aliyuncs.com/mPLUG/data.tar. The VATEX data there seemed to be the same as the data on the official website. Then I ran sh scripts/videocap_vatex_mplug_large.sh, but I ran into a few issues:

(1) pip install git+git://github.com/j-min/language-evaluation@master fails, presumably because the GitHub link isn't correct. However, when I ran it with https://github.com/j-min/language-evaluation instead, I still got the following traceback:

  File "", line 1, in <module>
  File "C:\Users\shrey\anaconda3\lib\site-packages\language_evaluation\__init__.py", line 15, in <module>
    from language_evaluation.coco_caption_py3.pycocoevalcap.eval import COCOEvalCap
  File "C:\Users\shrey\anaconda3\lib\site-packages\language_evaluation\coco_caption_py3\pycocoevalcap\eval.py", line 11, in <module>
    "METEOR": (Meteor(), "METEOR"),
  File "C:\Users\shrey\anaconda3\lib\site-packages\language_evaluation\coco_caption_py3\pycocoevalcap\meteor\meteor.py", line 20, in __init__
    self.meteor_p = subprocess.Popen(self.meteor_cmd, \
  File "C:\Users\shrey\anaconda3\lib\subprocess.py", line 951, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\shrey\anaconda3\lib\subprocess.py", line 1420, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified

Exception ignored in: <function Meteor.__del__ at 0x0000028948E36700>
Traceback (most recent call last):
  File "C:\Users\shrey\anaconda3\lib\site-packages\language_evaluation\coco_caption_py3\pycocoevalcap\meteor\meteor.py", line 78, in __del__
    self.lock.acquire()
AttributeError: 'Meteor' object has no attribute 'lock'
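The [WinError 2] is raised by subprocess.Popen inside the METEOR wrapper, which means Windows could not find the executable it tried to launch. In the upstream pycocoevalcap code that wrapper starts java -jar ... meteor-1.5.jar, so, assuming the copy bundled in language_evaluation does the same, a Java runtime missing from PATH would produce exactly this error. The later 'Meteor' object has no attribute 'lock' then looks like a follow-on failure: __init__ aborted at the Popen call before self.lock was assigned, and __del__ still ran. A minimal pre-flight check (the helper name command_works is hypothetical) could look like this:

```python
import shutil
import subprocess


def command_works(cmd: str, *args: str) -> bool:
    """Return True if `cmd` resolves (on PATH or as a path) and exits with status 0."""
    exe = shutil.which(cmd)
    if exe is None:
        # Not resolvable: this is the case where subprocess.Popen
        # raises FileNotFoundError ([WinError 2] on Windows).
        return False
    try:
        result = subprocess.run([exe, *args], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Assumption: the METEOR scorer shells out to `java`, as upstream pycocoevalcap does.
    # `java -version` prints its banner to stderr and exits 0 on a working install.
    if command_works("java", "-version"):
        print("java found; the METEOR scorer should be able to start")
    else:
        print("java not found on PATH; install a JRE or add it to PATH first")
```

Running this with the same interpreter that imports language_evaluation shows whether java is visible to that process, which is what Popen actually depends on.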

(2) It seems that videocap_mplugx.py doesn't exist. videocap_mplug.py does, and I'm guessing this is the intended file (the arguments match up nicely), but when I run it I get the traceback below (a module issue and a Kubernetes issue). I'm not sure whether this is because I couldn't install language_evaluation correctly and didn't download COCO, or because I'm running the wrong file. The full traceback is below for reference. The issue persists even after pip install ruamel.yaml. Thank you so much for all your help!

[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:3223 (system error: 10049 - The requested address is not valid in its context.).
[W C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:601] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:3223 (system error: 10049 - The requested address is not valid in its context.).
Traceback (most recent call last):
  File "C:\Users\shrey\OneDrive\AliceMind\mPLUG\videocap_mplug.py", line 4, in <module>
    import ruamel_yaml as yaml
ModuleNotFoundError: No module named 'ruamel_yaml'
[the same ModuleNotFoundError traceback is printed by each of the 8 ranks, partly interleaved]
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 5716) of binary: C:\Users\shrey\AppData\Local\Programs\Python\Python39\python.exe
Traceback (most recent call last):
  File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launch.py", line 196, in <module>
    main()
  File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launch.py", line 192, in main
    launch(args)
  File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launch.py", line 177, in launch
    run(args)
  File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\run.py", line 785, in run
    elastic_launch(
  File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launcher\api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "C:\Users\shrey\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\distributed\launcher\api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

videocap_mplug.py FAILED

Failures:
[1]: time : 2023-05-18_00:22:07 host : university email address rank : 1 (local_rank: 1) exitcode : 1 (pid: 20760) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]: time : 2023-05-18_00:22:07 host : university email address rank : 2 (local_rank: 2) exitcode : 1 (pid: 12436) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]: time : 2023-05-18_00:22:07 host : university email address rank : 3 (local_rank: 3) exitcode : 1 (pid: 16444) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]: time : 2023-05-18_00:22:07 host : university email address rank : 4 (local_rank: 4) exitcode : 1 (pid: 25304) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]: time : 2023-05-18_00:22:07 host : university email address rank : 5 (local_rank: 5) exitcode : 1 (pid: 20196) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]: time : 2023-05-18_00:22:07 host : university email address rank : 6 (local_rank: 6) exitcode : 1 (pid: 9708) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]: time : 2023-05-18_00:22:07 host : university email address rank : 7 (local_rank: 7) exitcode : 1 (pid: 21216) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]: time : 2023-05-18_00:22:07 host : university email address rank : 0 (local_rank: 0) exitcode : 1 (pid: 5716) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
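Two details in this log may explain why pip install ruamel.yaml didn't help, though both are guesses. First, pip installs that package as the module ruamel.yaml, while line 4 of videocap_mplug.py imports ruamel_yaml, which, as far as I know, is the module name shipped by the conda package ruamel_yaml. Second, the METEOR traceback in (1) comes from C:\Users\shrey\anaconda3, but the launcher runs C:\Users\shrey\AppData\Local\Programs\Python\Python39\python.exe, so packages installed into Anaconda would not be visible to it. The sketch below (hypothetical helpers spec_found and import_first) reports which interpreter is running and which of the two module names it can import:

```python
import importlib
import importlib.util
import sys
from types import ModuleType


def spec_found(name: str) -> bool:
    """True if `name` can be located by *this* interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec("ruamel.yaml") imports the parent package `ruamel`
        # first and raises if that parent is missing.
        return False


def import_first(*names: str) -> ModuleType:
    """Import and return the first module in `names` this interpreter can find."""
    for name in names:
        if spec_found(name):
            return importlib.import_module(name)
    raise ModuleNotFoundError(f"none of {names!r} is importable by {sys.executable}")


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name in ("ruamel_yaml", "ruamel.yaml"):
        print(f"{name}: {'found' if spec_found(name) else 'missing'}")
```

If the two names disagree, installing into the interpreter the launcher uses (for example C:\Users\shrey\AppData\Local\Programs\Python\Python39\python.exe -m pip install ruamel.yaml) avoids the environment mismatch, and yaml = import_first("ruamel_yaml", "ruamel.yaml") could stand in for line 4, assuming the rest of the script only uses API the two packages share.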

kenhuang1964 commented 1 year ago

Hey @shreyaskar123! Did you end up figuring out how to resolve the issue?