apple / ml-cvnets

CVNets: A library for training computer vision networks
https://apple.github.io/ml-cvnets

About Dataset collate fn #58

Open YHYeooooong opened 1 year ago

YHYeooooong commented 1 year ago

Hello! Thanks for sharing this great work!

I have some questions about the dataset.py code. I rewrote imagenet.py in dataset/classification to create my own dataset.

I changed only the function names and the names passed to 'register_collate_fn' and 'register_dataset'. I then noticed that the name of my collate function and the name passed to 'register_collate_fn' are not the same, and that the collate name in dataset.py lines 45~47 also differs from the 'register_collate_fn' name. Does this mean that a different collate_fn (not the one in my code) is called during training and validation? If so, is the collate_fn that gets called exactly the same as the one in imagenet.py, or does it at least behave the same way? Could using the default collate_fn instead of imagenet_collate_fn make any difference in performance?

Here is my dataset.py: CBIS-DDSM_4class_sampled.zip

My YAML file is here: 0707_mobilevits_real_defualt_lr0.0001_cosine_advanced_multiscale.zip

farzadab commented 1 year ago

Hi,

I'm really confused about what the question is here. Please say: 1) what you did (e.g. a code excerpt showing what is registered), 2) what you expected, and 3) what went wrong.

For example, I think you mentioned wanting to use your own collate_fn, but in your YAML I don't see collate_fn_name_train (or the equivalent for val/test) defined. Also note that imagenet.py forcibly sets these values.
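To illustrate why the registered string matters more than the Python function name, here is a minimal sketch of a name-based registry with a fallback. It is not the CVNets source: the registry dict, `build_collate_fn`, the silent fallback, and the name `my_dataset_collate_fn` are assumptions made up for this example, and `register_collate_fn` here is only a stand-in for the real decorator.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry for illustration only (not the library's code).
COLLATE_FN_REGISTRY: Dict[str, Callable] = {}


def register_collate_fn(name: str) -> Callable:
    """Stand-in decorator: stores a function in the registry under `name`."""

    def wrapper(fn: Callable) -> Callable:
        if name in COLLATE_FN_REGISTRY:
            raise ValueError(f"Duplicate collate function name: {name}")
        COLLATE_FN_REGISTRY[name] = fn
        return fn

    return wrapper


def default_collate_fn(batch: List[dict]) -> List[dict]:
    """Stand-in for a framework default: returns the batch unchanged."""
    return batch


@register_collate_fn(name="my_dataset_collate_fn")
def any_python_name(batch: List[dict]) -> List[dict]:
    """The function name is irrelevant; only the registered string is looked up."""
    return [sample for sample in batch if sample["label"] != -1]


def build_collate_fn(configured_name: Optional[str]) -> Callable:
    """Look up by string. Falling back to the default is an ASSUMED behaviour."""
    if configured_name is not None and configured_name in COLLATE_FN_REGISTRY:
        return COLLATE_FN_REGISTRY[configured_name]
    return default_collate_fn


# The configured string matches the registered string: the custom function is used.
matched = build_collate_fn("my_dataset_collate_fn")
# The configured string was never registered in this sketch: the default is used.
mismatched = build_collate_fn("imagenet_collate_fn")
print(matched.__name__, mismatched.__name__)
```

Under these assumptions, a dataset that sets one name while the decorator registers another would quietly end up with the default function, so it is worth checking which strings the dataset class actually writes into the options.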

> May it make any performance difference by using default collate_fn instead of imagenet_collate_fn?

imagenet_collate_fn is designed so that it can remove corrupted images if they exist. As far as I know, it was not chosen for efficiency.
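As a rough illustration of what "removing corrupted images" can look like in a collate function, here is a pure-Python sketch. It assumes that a sample which failed to load is flagged with the label -1 and that samples are dicts with "image" and "label" keys; both conventions are assumptions for this example. It uses plain lists so that it runs without PyTorch, whereas a real collate function would stack tensors.

```python
from typing import Any, Dict, List

INVALID_LABEL = -1  # assumed marker for a sample whose image could not be loaded


def naive_collate_fn(batch: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Collates every sample, including flagged ones."""
    return {
        "image": [sample["image"] for sample in batch],
        "label": [sample["label"] for sample in batch],
    }


def filtering_collate_fn(batch: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Drops samples flagged as invalid, then collates the rest."""
    valid = [sample for sample in batch if sample["label"] != INVALID_LABEL]
    return naive_collate_fn(valid)


clean_batch = [
    {"image": [0.1, 0.2], "label": 0},
    {"image": [0.3, 0.4], "label": 2},
]
dirty_batch = clean_batch + [{"image": [0.0, 0.0], "label": INVALID_LABEL}]

# On a clean batch the two functions agree; they differ only when a sample is flagged.
same_on_clean = filtering_collate_fn(clean_batch) == naive_collate_fn(clean_batch)
kept_labels = filtering_collate_fn(dirty_batch)["label"]
print(same_on_clean, kept_labels)
```

In this sketch the two functions only diverge when a flagged sample is present, so on a dataset with no unreadable files the choice would not change what the model sees.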

YHYeooooong commented 1 year ago

Thanks for the fast reply!

  1. What you did: I changed the imagenet.py code to make my dataset.py, which is attached above.

[Image: git1]

I changed these parts of imagenet.py (only the names, not the working code).

[Images: git2, git3, git4]

In the second picture, I actually changed this part: https://github.com/apple/ml-cvnets/blob/84d992f413e52c0468f86d23196efd9dad885e6f/data/datasets/classification/imagenet.py#L53

The collate_fn name in the changed part is not the same as the register_collate_fn name in the third picture.

  2. What you expected: I think the difference between the collate_fn names in the second and third pictures may mean that my code cannot find my collate_fn function from the third picture. I think default_collate_fn may be used during the training and validation phases instead.

  3. What went wrong: If default_collate_fn or some other collate_fn is used, does the code work the same as the collate_fn in imagenet.py? If not, can this change make a performance difference (a model using imagenet_collate_fn vs. a model using default_collate_fn)?
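One way to rule out a silent name mismatch like the one described above is a strict lookup that fails loudly and lists the names that are actually registered. The sketch below is generic Python; `get_registered_fn`, the `registry` dict, and the names in it are hypothetical and are not part of CVNets.

```python
from typing import Callable, Dict, List


def get_registered_fn(registry: Dict[str, Callable], name: str) -> Callable:
    """Returns the function registered under `name`, or raises a descriptive error."""
    if name not in registry:
        available = ", ".join(sorted(registry)) or "<none>"
        raise ValueError(
            f"Collate function '{name}' is not registered. Available: {available}"
        )
    return registry[name]


def my_collate(batch: List[dict]) -> List[dict]:
    """Placeholder collate function used only to populate the example registry."""
    return batch


# Hypothetical registry contents for the example.
registry = {"my_dataset_collate_fn": my_collate}

found = get_registered_fn(registry, "my_dataset_collate_fn")

try:
    get_registered_fn(registry, "imagenet_collate_fn")
    error_message = ""
except ValueError as err:
    # The message names both the missing string and the registered alternatives.
    error_message = str(err)

print(error_message)
```

A check like this, run once before training starts, would show immediately whether the string set by the dataset class and the string passed to the decorator agree.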