arampacha / CLIP-rsicd

Apache License 2.0

Getting error while using this script to fine-tune (Urgent) #33

Closed minakshimathpal closed 2 years ago

minakshimathpal commented 2 years ago

@arampacha I am getting this error while executing ImageTextDataset. [image] Please help.

arampacha commented 2 years ago

Hi! This error is most likely caused by an empty filepaths list. The root folder should contain jsonl files named following a certain pattern, for example "train.jsonl". Example files (with augmented captions) can be found in the data folder. Thanks for the issue. I'll look at providing more informative error messages for this scenario.

minakshimathpal commented 2 years ago

@arampacha thanks for the solution. I have the COCO captions dataset in json format. Is there any way to convert it into jsonl? I tried a script, but it couldn't fetch the captions: it writes only the images, no captions. Please help.

arampacha commented 2 years ago

@minakshimathpal have a look at the RSICD_data notebook. There is a convert_dataset_json function, which converts json to jsonl. You might need to adapt it to your case if some of the keys mismatch.
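
For illustration only, here is a minimal sketch of such a conversion for COCO-style captions. It is not the notebook's convert_dataset_json. The input keys ("images", "annotations", "file_name", "image_id", "caption") follow the standard COCO captions layout; the output keys "filename" and "captions" and the helper names are assumptions, so check them against the example files in the data folder before relying on them.

```python
import json
from collections import defaultdict


def coco_to_records(coco):
    """Group COCO caption annotations by image: one record per image."""
    captions = defaultdict(list)
    for ann in coco["annotations"]:
        captions[ann["image_id"]].append(ann["caption"].strip())

    records = []
    for img in coco["images"]:
        caps = captions.get(img["id"], [])
        if caps:  # skip images that have no captions at all
            # Output keys are an assumption; adapt them to the expected schema.
            records.append({"filename": img["file_name"], "captions": caps})
    return records


def write_jsonl(records, path):
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")


# Tiny in-memory example in the COCO captions layout.
sample = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
    ],
    "annotations": [
        {"image_id": 1, "caption": "a cat "},
        {"image_id": 1, "caption": "a small cat"},
    ],
}
records = coco_to_records(sample)
```

A script that writes images but no captions is often iterating over "images" without joining "annotations" on image_id, which is the step the grouping loop above performs.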

INF800 commented 2 years ago

> Hi! This error is most likely caused by an empty filepaths list. The root folder should contain jsonl files named following a certain pattern, for example "train.jsonl". Example files (with augmented captions) can be found in the data folder. Thanks for the issue. I'll look at providing more informative error messages for this scenario.

Unfortunately, it occurs even when the data folder is not empty in my case (I tried both relative and absolute paths). I can't get around it. I reproduced the issue here in Colab: https://colab.research.google.com/github/INF800/CLIP-rsicd/blob/master/nbs/Finetuning_CLIP_with_HF_and_jax.ipynb

INF800 commented 2 years ago

To reproduce, just run all the cells of the notebook above. It will terminate with the same error. Note the cell !ls {data_args.data_dir}, which verifies that the folder is not empty.

rsicd_images          textaug_train_rsicd.jsonl
textaug_test_rsicd.jsonl  textaug_valid_rsicd.jsonl

arampacha commented 2 years ago

Hi @INF800! Thanks for the reproducer. I've just pushed a fix. Could you sync your fork and check if it works now?

INF800 commented 2 years ago

Sure. Thanks! What was the issue, BTW?

arampacha commented 2 years ago

I actually broke the dataset creation in a stupid way when adding the check for an empty filepaths list in haste. Exhausted the generator at the check, lol. I should've postponed pushing till the weekend and tested it, obviously. My bad. Thanks again for drawing attention to this.
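
For anyone curious, this class of bug can be sketched roughly as follows. This is not the repository's actual code: the function names and the "*.jsonl" pattern are hypothetical, and the only library behavior relied on is that pathlib's Path.glob returns a one-shot generator.

```python
import tempfile
from pathlib import Path


def find_jsonl_broken(root):
    """Buggy version: the emptiness check consumes the generator."""
    paths = Path(root).glob("*.jsonl")  # a generator, usable only once
    if len(list(paths)) == 0:  # list() drains the generator here
        raise ValueError(f"no jsonl files found in {root}")
    return list(paths)  # generator is already exhausted, so this is empty


def find_jsonl_fixed(root):
    """Fixed version: materialize the generator once, then check the list."""
    paths = sorted(Path(root).glob("*.jsonl"))
    if not paths:
        raise ValueError(f"no jsonl files found in {root}")
    return paths


# Demonstration in a temporary folder containing a single jsonl file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train.jsonl").write_text("{}\n", encoding="utf-8")
    broken = find_jsonl_broken(tmp)
    fixed = find_jsonl_fixed(tmp)
```

The broken version passes its own check and still hands back an empty list, which is why the failure showed up later even though the data folder was not empty.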

arampacha commented 2 years ago

Looks like everything now works as expected, so I'll close the issue. Feel free to ping me if anything comes up.