JeffMony closed this issue 1 year ago.
In the end the corresponding model was still generated, but the error above appeared.
This doesn't affect anything; you can just ignore it.
OK, I'll ignore it.
Thanks!
How did you solve this?
Training stops after the third epoch.
Mine also stops after the third epoch.
Hi, I checked my config file and the maximum number of epochs there is set to 3, which is why training stops after the third epoch.
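The maximum in my config file isn't 3. For me it was probably a Python version issue; after switching to a different version it worked.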
2023-02-23,06:08:33 | INFO | Rank 0 | Validation Result (epoch 3 @ 99 steps) | Valid Loss: 0.000000 | Image2Text Acc: 100.00 | Text2Image Acc: 100.00 | logit_scale: 4.605 | Valid Batch Size: 1
2023-02-23,06:08:40 | INFO | Rank 0 | Saved checkpoint ../clip_set/experiments/muge_finetune_vit-b-16_roberta-base_bs128_8gpu_poizon/checkpoints/epoch3.pt (epoch 3 @ 99 steps) (writing took 7.470757007598877 seconds)
2023-02-23,06:08:48 | INFO | Rank 0 | Saved checkpoint ../clip_set/experiments/muge_finetune_vit-b-16_roberta-base_bs128_8gpu_poizon/checkpoints/epoch_latest.pt (epoch 3 @ 99 steps) (writing took 7.439142227172852 seconds)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/logging/handlers.py", line 1482, in _monitor
    record = self.dequeue(True)
  File "/usr/lib/python3.8/logging/handlers.py", line 1431, in dequeue
    return self.queue.get(block)
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 97, in get
    res = self._recv_bytes()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError
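On the EOFError at the end of that log: the traceback shows the background thread of a `logging.handlers.QueueListener` (its `_monitor` method) failing inside `multiprocessing` `queue.get()`, after both checkpoints had already been written. One plausible reading (not confirmed in this thread) is that the listener thread was still blocked on the queue when the queue's pipe was closed as training shut down, which fits the earlier reply that the error is harmless. The sketch below is a minimal illustration under that assumption, not the project's actual logging code; the logger name `train_demo` and the `ListHandler` class are hypothetical. It stops the listener (which enqueues a sentinel and joins the monitor thread) before closing the queue, so the thread exits cleanly instead of reading from a closed pipe.

```python
import logging
import logging.handlers
import multiprocessing


class ListHandler(logging.Handler):
    """Hypothetical sink that collects messages so the example can check them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def run_with_clean_shutdown():
    # Assumption: log records travel through a multiprocessing.Queue into a
    # QueueListener, as the traceback (multiprocessing/queues.py called from
    # logging/handlers.py) suggests.
    log_queue = multiprocessing.Queue()
    sink = ListHandler()
    listener = logging.handlers.QueueListener(log_queue, sink)
    listener.start()

    logger = logging.getLogger("train_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo from printing via root handlers
    queue_handler = logging.handlers.QueueHandler(log_queue)
    logger.addHandler(queue_handler)
    try:
        logger.info("Saved checkpoint epoch3.pt")
        logger.info("Saved checkpoint epoch_latest.pt")
    finally:
        logger.removeHandler(queue_handler)
        # Stop the listener first: stop() puts a sentinel on the queue and
        # joins the monitor thread, so it never waits on a closing pipe.
        listener.stop()
        # Only then close the queue and wait for its feeder thread to flush.
        log_queue.close()
        log_queue.join_thread()
    return sink.messages


if __name__ == "__main__":
    print(run_with_clean_shutdown())
```

The ordering is the point of the sketch: if the queue were torn down while the listener thread was still inside `get()`, the read could hit end-of-file, which is the kind of failure the traceback shows.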