Closed Z-yolo closed 3 weeks ago
Your work is very inspiring, but I'm having some problems reproducing it, such as the weights of the ViT model "ViT-B-4" in Flamingo.py: there doesn't seem to be a weight configuration for this version at the moment!
Yes. You have to modify this part of the open-flamingo library by adding a few lines of code. Please refer to the source code of `create_model_and_transforms` in the open-flamingo library.
Thank you for your reply. Do you mean that I need to find the relevant `create_model_and_transforms` code in the open-flamingo library and modify it so that "ViT-B-4" can use the MAE pretrained weights? I ask because the configuration step in your readme.md only mentions modifying the MAE pretrained weight path.
Yes. Simply add a configuration (layer_num, width, num_heads, etc.) that matches the MAE weights.
Thank you very much for every reply. I added the relevant configuration along the call chain `create_model_and_transforms` ---> `open_clip.create_model_and_transforms`, e.g.:

{
  "embed_dim": 768,
  "vision_cfg": {
    "image_size": [32, 128],
    "layers": 12,
    "width": 768,
    "patch_size": 4
  },
  "text_cfg": {
    "context_length": 77,
    "vocab_size": 49408,
    "width": 768,
    "heads": 12,
    "layers": 12
  }
}

But after many attempts, every time, before the configuration is even loaded for model creation, I get a RuntimeError saying that the ViT-B-4 model configuration could not be found, so I can't actually add it!
Could you please provide the details of the modification when you have some free time?
You should add the configuration in open_clip/src/open_clip/model_configs (https://github.com/mlfoundations/open_clip/tree/main/src/open_clip/model_configs). Also refer to this file (https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/factory.py).
I have left the company and cannot provide the original file that I modified, but feel free to ask if you run into further problems.
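To make this concrete: open_clip's factory discovers models by scanning the JSON files in its `model_configs` directory, so registering "ViT-B-4" amounts to dropping a `ViT-B-4.json` there with the settings discussed above. Below is a minimal sketch that writes such a file; the `model_configs_dir` path is an assumption and must be adjusted to point at your actual open_clip checkout or site-packages install.

```python
import json
from pathlib import Path

# Assumed path to your open_clip source tree; adjust to your install
# (e.g. the model_configs folder inside site-packages/open_clip).
model_configs_dir = Path("open_clip/src/open_clip/model_configs")
model_configs_dir.mkdir(parents=True, exist_ok=True)

# Custom "ViT-B-4" config: ViT-Base tower with patch size 4 and a
# 32x128 input, matching the MAE-pretrained weights discussed above.
vit_b_4 = {
    "embed_dim": 768,
    "vision_cfg": {
        "image_size": [32, 128],
        "layers": 12,
        "width": 768,
        "patch_size": 4,
    },
    "text_cfg": {
        "context_length": 77,
        "vocab_size": 49408,
        "width": 768,
        "heads": 12,
        "layers": 12,
    },
}

# The filename (minus .json) becomes the model name the factory resolves.
config_path = model_configs_dir / "ViT-B-4.json"
config_path.write_text(json.dumps(vit_b_4, indent=2))
print(f"Wrote {config_path}")
```

After the file is in place, `open_clip.create_model_and_transforms("ViT-B-4", ...)` should resolve the name instead of raising the "model config not found" RuntimeError, since the factory rescans that directory at import time.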
Hi author, I've bothered you before; thank you very much for your previous replies that helped me run the training part of the code. However, in the evaluation section I need to provide the "JSON FILE FOR CHARACTER-WISE POSITION INFORMATION" and the "In-Context Pool". You mentioned in the readme that I can set the "JSON FILE FOR CHARACTER-WISE POSITION INFORMATION" to None, and build the "In-Context Pool" (i.e., a JSON file) by randomly sampling data from any target training set. Does that mean we can sample the in-context pool ourselves, following the JSON format you gave? Can this be used as an evaluation setup?
Yes. Please refer to our evaluation settings in Tables 1 & 2. For example, when testing on WordArt, you can simply sample 100 examples from the WordArt training set. You can also sample out-of-domain data, but the performance may be lower (see Table 5 in the supplementary material).
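The sampling step described above can be sketched as follows. This is not the authors' script; it is a minimal illustration that assumes the training set and the in-context pool are both JSON lists of records (the field names here are placeholders, so match them to the actual format given in the repository).

```python
import json
import random

def build_in_context_pool(train_json_path, pool_json_path, k=100, seed=0):
    """Randomly sample k records from a training-set JSON list and
    save them as an in-context pool JSON file."""
    with open(train_json_path) as f:
        train = json.load(f)
    random.seed(seed)  # fixed seed so the evaluation pool is reproducible
    pool = random.sample(train, k)
    with open(pool_json_path, "w") as f:
        json.dump(pool, f, indent=2)
    return pool
```

For instance, pointing `train_json_path` at the WordArt training annotations and keeping `k=100` mirrors the setting described above; fixing the seed keeps the pool identical across evaluation runs.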