OFA-Sys / Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
MIT License
4.59k stars 474 forks

EOFError appears at the end of training #62

Closed JeffMony closed 1 year ago

JeffMony commented 1 year ago

2023-02-23,06:08:33 | INFO | Rank 0 | Validation Result (epoch 3 @ 99 steps) | Valid Loss: 0.000000 | Image2Text Acc: 100.00 | Text2Image Acc: 100.00 | logit_scale: 4.605 | Valid Batch Size: 1
2023-02-23,06:08:40 | INFO | Rank 0 | Saved checkpoint ../clip_set/experiments/muge_finetune_vit-b-16_roberta-base_bs128_8gpu_poizon/checkpoints/epoch3.pt (epoch 3 @ 99 steps) (writing took 7.470757007598877 seconds)
2023-02-23,06:08:48 | INFO | Rank 0 | Saved checkpoint ../clip_set/experiments/muge_finetune_vit-b-16_roberta-base_bs128_8gpu_poizon/checkpoints/epoch_latest.pt (epoch 3 @ 99 steps) (writing took 7.439142227172852 seconds)
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/logging/handlers.py", line 1482, in _monitor
    record = self.dequeue(True)
  File "/usr/lib/python3.8/logging/handlers.py", line 1431, in dequeue
    return self.queue.get(block)
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 97, in get
    res = self._recv_bytes()
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError

JeffMony commented 1 year ago

The corresponding model was still generated in the end, but the error above appeared.

yangapku commented 1 year ago

This does not affect anything; you can simply ignore it.
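For context, the frames in the traceback (`logging/handlers.py`, `_monitor` and `dequeue`) belong to the standard library's `QueueListener` thread, which was reading from a `multiprocessing` queue when the connection closed. One plausible reading, which is an assumption and not confirmed in this thread, is that the queue is torn down at exit while that thread is still waiting. The sketch below is not Chinese-CLIP's code; it only illustrates the general pattern of stopping a `QueueListener` before the queue goes away. The names `ListHandler`, `log_through_queue`, and `queue_logging_demo` are hypothetical.

```python
import logging
import logging.handlers
import multiprocessing as mp


class ListHandler(logging.Handler):
    """Hypothetical sink that keeps messages in memory for inspection."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def log_through_queue(lines):
    """Send each line through a multiprocessing queue to a QueueListener."""
    log_queue = mp.Queue()
    sink = ListHandler()
    listener = logging.handlers.QueueListener(log_queue, sink)
    listener.start()  # starts the background monitor thread

    logger = logging.getLogger("queue_logging_demo")  # hypothetical name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.handlers.QueueHandler(log_queue)
    logger.addHandler(handler)
    try:
        for line in lines:
            logger.info(line)
    finally:
        logger.removeHandler(handler)
        # stop() enqueues a sentinel and joins the monitor thread, so the
        # thread has exited before the queue is closed.
        listener.stop()
        log_queue.close()
        log_queue.join_thread()
    return sink.messages


if __name__ == "__main__":
    print(log_through_queue(["epoch 1 done", "epoch 2 done"]))
```

Because `stop()` joins the monitor thread before the queue is closed, the thread is no longer blocked in `queue.get()` when the connection goes away. In the log above the checkpoints were already saved before the exception, which is consistent with the advice that the error can be ignored.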

JeffMony commented 1 year ago

"Ignore it"

Thanks!

bowenzc commented 8 months ago

May I ask how this was resolved?

bowenzc commented 8 months ago

Mine stops once it reaches the third epoch.

meisa233 commented 6 months ago

Mine also stops once it reaches the third epoch.

meisa233 commented 6 months ago

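"Mine stops once it reaches the third epoch."

Hello, I checked and the maximum epoch set in my config file is 3, which is why training stops after the third epoch.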

bowenzc commented 6 months ago

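"Mine stops once it reaches the third epoch."

"Hello, I checked and the maximum epoch set in my config file is 3, which is why training stops after the third epoch."

The maximum epoch in my config file is not 3. In my case it was probably a Python version problem; after I switched to a different version it worked.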