Vision-CAIR / MiniGPT-4

Open-sourced codes for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
BSD 3-Clause "New" or "Revised" License

In pre-training (stage 1), "Loaded 0 records for train split from the dataset." #485

Open waleyW opened 9 months ago

waleyW commented 9 months ago

The log output is below:

2024-02-05 13:09:57,036 [INFO] dataset_ratios not specified, datasets will be concatenated (map-style datasets) or chained (webdataset.DataPipeline).
2024-02-05 13:09:57,036 [INFO] Loaded 0 records for train split from the dataset.
2024-02-05 13:09:58,237 [INFO] number of trainable parameters: 3149824
2024-02-05 13:09:58,238 [INFO] Start training epoch 0, 5000 iters per inner epoch.

I confirmed that I downloaded all the datasets successfully, that the dataset paths are correct, and that the relevant configurations are set according to README.md. I don't know why I get the message "Loaded 0 records for train split from the dataset".

Hurwitzzz commented 7 months ago

I ran into this issue before too. It turned out my dataset paths were wrong, and fixing them resolved it. From my experience, please check the following:

  1. If you have, for example, a "train" directory at .../dataset/gqa/train, the storage path in your yaml should be .../dataset/gqa, without the "train".
  2. Check the image paths in the corresponding json file. If the json file contains entries such as train2014/00001234.jpg, your dataset directory must use the same directory and file names. Good luck!
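
The two checks above can be automated with a small script. This is only a minimal sketch, not part of the MiniGPT-4 codebase: the function name find_missing_images is hypothetical, and it assumes the annotation json is a list of records that each store a relative image path under an "image" key. Check your own json file and adjust image_key if it is laid out differently.

```python
import json
from pathlib import Path


def find_missing_images(storage_dir, annotation_file, image_key="image"):
    """Return the image paths listed in the annotation file that are
    not present as files under storage_dir.

    Assumption: the annotation file is a JSON list of dicts, and each
    dict holds a path relative to storage_dir under `image_key`.
    """
    root = Path(storage_dir)
    with open(annotation_file, encoding="utf-8") as f:
        records = json.load(f)

    missing = []
    for record in records:
        rel_path = record[image_key]
        # The path in the json must resolve under the storage path set in the yaml.
        if not (root / rel_path).is_file():
            missing.append(rel_path)
    return missing
```

Call it with the storage path from your yaml and the matching json file, then print the result. If every entry comes back as missing, the storage path is probably one directory too deep or too shallow, as in point 1.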