Open saksham-s opened 1 year ago
Did you receive any reply?
@saksham-s Hi, have you trained the model yourself? I am trying to reproduce the text-box grounded generation on COCO2014, but even after training for 200k iters, the results still do not follow the bbox. I do not know where it goes wrong. Is it because the COCO2014 dataset is not big enough?
@cats-food My model successfully follows the bbox, but the FID score does not come down to the 5.8 reported in the paper. Are you still following this work?
@maluyazilation Thanks for the reply! What batch size did you use, and after how many iters on COCO2014 did your model start to follow the bbox?
I have not been following the work recently, since in my experiments the results do not even follow the bbox, and I still do not know what is going on.
@cats-food Your reply is really quick, thank you. I don't know if my settings are suitable. For the coco2014cd-ldm setting with bz64, bbox following appears at 30,000 to 40,000 iters. I also trained stable-diffusion on the Flickr dataset with bz32; it seems to take longer, about 160,000 to 180,000 iters.
@cats-food If you are interested in reproducing this paper in the future, I'd be glad to communicate more by email, WeChat, or something else, thanks~ For the COCO setting, I think there are still many details omitted in the paper, and I want to find them out. : ) email: xiaobo123@stu.xjtu.edu.cn
Thanks for sharing your settings. I think my problem is that my batch size is too small: I only used 4.
Anyway, for now I am not planning to reproduce the paper, but I am happy to discuss more.
@cats-food Ok, that's fine. Best wishes.
@cats-food Hello! Have you successfully reproduced it now? I am facing the same problem: my bz is 2, I have trained for 200,000 iters, and the results do not even follow the bbox. Is it because the bz is too small?
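To put the batch sizes and iteration counts reported in this thread on a common scale, here is a minimal sketch that compares how many training samples each run has seen (batch size × iters). It assumes that samples seen is a rough proxy for training progress, which nobody in this thread has confirmed; small batches also give noisier gradients, so matching the sample count may still not be enough. `samples_seen` and `iters_to_match` are hypothetical helper names, not part of the repository.

```python
import math


def samples_seen(batch_size: int, iters: int) -> int:
    """Total number of training samples processed (hypothetical helper)."""
    return batch_size * iters


def iters_to_match(target_samples: int, batch_size: int) -> int:
    """Iterations needed at `batch_size` to process `target_samples` samples."""
    return math.ceil(target_samples / batch_size)


# Settings reported in this thread (COCO2014 runs only).
working_low = samples_seen(64, 30_000)   # bz64, bbox following starts around here
working_high = samples_seen(64, 40_000)  # upper end of the reported range
failing_bz4 = samples_seen(4, 200_000)   # bz4, 200k iters, no bbox following
failing_bz2 = samples_seen(2, 200_000)   # bz2, 200k iters, no bbox following

print(f"bz64 @ 30k-40k iters: {working_low:,} to {working_high:,} samples")
print(f"bz4  @ 200k iters:    {failing_bz4:,} samples")
print(f"bz2  @ 200k iters:    {failing_bz2:,} samples")

# Iterations the small-batch runs would need just to match the sample count
# of the low end of the working bz64 run (an assumption-laden comparison).
print(f"bz4 needs {iters_to_match(working_low, 4):,} iters to match")
print(f"bz2 needs {iters_to_match(working_low, 2):,} iters to match")
```

Under this assumption, both small-batch runs had seen fewer than half the samples of the bz64 run at the point where it started following the bbox, so 200k iters at bz2 or bz4 is not directly comparable to 30k to 40k iters at bz64.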
Thanks for sharing your work, which is very helpful and interesting. I wanted to ask if you could share the COCO-trained weights, without any of the large-scale training on the bigger datasets. I see you discuss the COCO-trained results in the paper, but I could not find them in the GitHub repository.