LuoweiZhou / VLP

Vision-Language Pre-training for Image Captioning and Question Answering
Apache License 2.0

Unable to reproduce image features for COCO and CC #12

Closed darkmatter08 closed 4 years ago

darkmatter08 commented 4 years ago

Hi Luowei --

I'm unable to reproduce the image features that you've published here for COCO and CC. I've trained and evaluated the model on the VQA2 task (which uses COCO images), using your provided features as well as my own extracted features. There is still a gap in performance: you report 67.4, but I can only reach 64.3, a significant gap of about 3 points. Have others encountered a similar problem, and if so, how did they resolve it?

I extracted my own features using the script you shared with me privately (slightly modified to resolve dependency issues). Using the housebw/detectron image and your provided detectron checkpoint .pkl and config .yaml, I get features that differ from yours. Comparing image by image, the values in the tensors/matrices differ, and so do the per-image aggregate statistics (min, max, mean, variance). The same is true for CC. I've also confirmed that it is not a precision issue (float16 vs. float32).
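
For anyone who wants to run the same kind of check, here is a minimal sketch of an image-by-image comparison. It assumes each image's features have already been loaded as a NumPy array. The function names (`summarize`, `compare_features`) and the tolerances are illustrative only and do not come from the VLP code. The tolerances are set loosely enough that a float16 round trip still counts as a match, so a reported mismatch points to something other than precision.

```python
import numpy as np


def summarize(feats):
    """Return (min, max, mean, variance) of a feature array, computed in float64."""
    x = np.asarray(feats, dtype=np.float64)
    return float(x.min()), float(x.max()), float(x.mean()), float(x.var())


def compare_features(mine, reference, rtol=1e-3, atol=1e-3):
    """Compare two feature arrays for a single image.

    rtol/atol are hypothetical defaults, chosen to tolerate float16 rounding.
    """
    a = np.asarray(mine, dtype=np.float32)
    b = np.asarray(reference, dtype=np.float32)
    if a.shape != b.shape:
        # Different shapes mean the features cannot be compared elementwise.
        return {"same_shape": False}
    return {
        "same_shape": True,
        "max_abs_diff": float(np.max(np.abs(a - b))),
        "close": bool(np.allclose(a, b, rtol=rtol, atol=atol)),
        "stats_mine": summarize(a),
        "stats_reference": summarize(b),
    }
```

If `close` is false while the shapes match, the difference is larger than float16 rounding would explain, which suggests a different checkpoint, config, or preprocessing step.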

As it stands, I cannot replicate your results despite my best efforts to follow all your provided documentation, using the same environment, code, data dependencies, and source data.

I am attempting to use your SOTA model on a new dataset/task. Not being able to replicate your results is an impediment...

Thanks, Shawn

LuoweiZhou commented 4 years ago

Sorry for the delay. I have been traveling and will look into this issue early next week. Please stay tuned.

LuoweiZhou commented 4 years ago

@darkmatter08 If you have figured it out, could you share your experience in reproducing the feature files (dos and don'ts)? I will sanitize the feature extraction code soon.

LuoweiZhou commented 4 years ago

Closing as problem is resolved.

darkmatter08 commented 4 years ago

To anyone attempting to reproduce: please verify the md5sums of all files you download! @LuoweiZhou, I strongly encourage you to publish md5sums for every file you link for download, to improve reproducibility.

Here are mine:

$ md5sum e2e_faster_rcnn_X-101-64x4d-FPN_2x.*
535a2f0f7a73948c7400ce864d4b8efa  e2e_faster_rcnn_X-101-64x4d-FPN_2x.pkl
cc3d540cee79506e1d5a22bec3aef5bf  e2e_faster_rcnn_X-101-64x4d-FPN_2x.yaml
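
For those without the `md5sum` command available, the same check can be sketched in Python with the standard `hashlib` module. The helper names (`md5sum`, `verify`) and the `EXPECTED` table layout are illustrative assumptions; the two digests are the ones listed above.

```python
import hashlib
import os

# Digests copied from the md5sum output above.
EXPECTED = {
    "e2e_faster_rcnn_X-101-64x4d-FPN_2x.pkl": "535a2f0f7a73948c7400ce864d4b8efa",
    "e2e_faster_rcnn_X-101-64x4d-FPN_2x.yaml": "cc3d540cee79506e1d5a22bec3aef5bf",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected, directory="."):
    """Map each file name to True (match), False (mismatch), or None (missing)."""
    results = {}
    for name, want in expected.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            results[name] = None
        else:
            results[name] = md5sum(path) == want.lower()
    return results


if __name__ == "__main__":
    for name, ok in verify(EXPECTED).items():
        status = {True: "OK", False: "MISMATCH", None: "MISSING"}[ok]
        print(f"{name}: {status}")
```

Run it from the directory that holds the downloaded checkpoint and config; any line other than OK means the file is not the one these digests were computed from.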

LuoweiZhou commented 4 years ago

@darkmatter08 Thanks for sharing! Whoever has downloaded these two files, please verify the md5sums.

The checkpoints used in our other repo, GVD, come from a different training run (hence the files differ) but have the same file names.

darkmatter08 commented 4 years ago

@LuoweiZhou best practice would be for you to share the md5sums of each of your files here. Can you compute the md5sums of your known good files and verify they are the same as mine? I can confirm I'm able to reproduce the features now.

LuoweiZhou commented 4 years ago

The ones I have are the ones in the VLP repo.

LuoweiZhou commented 4 years ago

The feature extraction code is now available here: https://github.com/LuoweiZhou/detectron-vlp