dddzg / up-detr

[TPAMI 2022 & CVPR2021 Oral] UP-DETR: Unsupervised Pre-training for Object Detection with Transformers
Apache License 2.0
474 stars · 71 forks

Cannot reproduce the author's results with the pre-trained models #9

Closed wilderrodrigues closed 3 years ago

wilderrodrigues commented 3 years ago

Hi there,

I'm currently running some few-/one-/zero-shot experiments for object detection and classification, and your paper is being evaluated for one of the tasks.

Unfortunately, I haven't been able to reproduce your results with the pre-trained models you have made available. I also noticed that the inference code you made available does not work out of the box. To support these points, here are some details:

  1. At the moment it is not possible to use the latest PyTorch with the latest torchvision; the latter should be pinned to version 0.9.0.
  2. For the ImageNet pre-trained model:
    • The code samples use only 6 patches, but the model was trained with the default 100 queries and 10 patches. The README needs to be adjusted accordingly.
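One way to avoid this kind of mismatch is to read the query configuration straight out of the checkpoint instead of trusting the README. A minimal sketch, assuming a DETR-style checkpoint that stores its learned object queries under `query_embed.weight` (the fake state dict below is a stand-in for a real `torch.load(...)["model"]`):

```python
import torch

def query_slots(state_dict):
    # DETR-style models keep their learned object queries in an
    # nn.Embedding named "query_embed"; its first dimension is the
    # number of query slots the checkpoint was trained with.
    return state_dict["query_embed.weight"].shape[0]

# Stand-in for: torch.load("up_detr.pth", map_location="cpu")["model"]
fake_state = {"query_embed.weight": torch.zeros(100, 256)}
print(query_slots(fake_state))  # 100
```

If the number returned here disagrees with the `--num_queries` used to build the model, the checkpoint will not load cleanly.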

Results:

Patches image

Detections image

Patches image

Detections image

Hardware used

Yeah, I tried with both CPU and a CUDA compliant device.

Are you sure you uploaded the right checkpoint files?

Thanks in advance; looking forward to hearing from you.

dddzg commented 3 years ago

At the moment it is not possible to use the latest PyTorch with the latest torchvision; the latter should be pinned to version 0.9.0.

I will update the code to support the latest PyTorch version.

The result seems a little weird. Could you provide more details so we can reproduce it?

dddzg commented 3 years ago

Could you get the same result as our provided notebook?

wilderrodrigues commented 3 years ago

Hi @dddzg ,

Thanks for the reply, much appreciated.

It's working now. ;) It was a mistake on my side. The source code is not that different; I just refactored it a bit and added unit tests. The problem was that the checkpoint loader had its own separate unit test, and since test execution order is not guaranteed, the checkpoint was not loaded before inference ran.

To fix it, I moved the model building and checkpoint loading into a setUp function in my unit test. The results are below:
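For anyone hitting the same trap: `unittest` gives no ordering guarantee between test methods, but `setUp` runs before every one of them. A minimal sketch, where the dict is a hypothetical stand-in for the real model build plus checkpoint load:

```python
import unittest

class InferenceTest(unittest.TestCase):
    def setUp(self):
        # setUp runs before each test method, so every test sees a fully
        # initialised model regardless of execution order.
        self.model = {"checkpoint_loaded": True}  # stand-in for build + load

    def test_checkpoint_is_loaded(self):
        self.assertTrue(self.model["checkpoint_loaded"])

    def test_inference_runs(self):
        # The real test would call self.model(images) here; the checkpoint
        # is guaranteed to have been loaded by setUp.
        self.assertIn("checkpoint_loaded", self.model)
```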

Custom image with 10 hand-engineered patches image

Kittens image with 10 hand-engineered patches image

Kittens image with 10 random generated patches image

Custom image with 10 random generated patches image

As you can see, it works. However, some improvement is needed when it comes to random patches: the boxes are not good. The authors of a follow-up paper claim to have improved this using region proposals via selective search: https://arxiv.org/pdf/2106.04550.pdf. No pre-trained models yet, though.

Thanks again and congrats on the good work.

dddzg commented 3 years ago

Hi, @wilderrodrigues. The boxes are not good with randomly cropped patches because localization is only the pretext task, and we don't really care about the accuracy of the pretext task itself. During pre-training we freeze the CNN backbone (to preserve the CNN's discrimination ability), so it is reasonable that the boxes are not that good. From what we have observed, if you only care about the accuracy of the boxes, you can pre-train the CNN backbone together with the transformers. I'd guess that would improve it a lot.
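The freezing described above is plain PyTorch parameter freezing. A minimal sketch, where the single conv layer is a hypothetical stand-in for the real ResNet backbone:

```python
import torch.nn as nn

def freeze(module):
    # Setting requires_grad = False keeps the optimizer from ever updating
    # these weights, preserving the discrimination learned in pre-training.
    for p in module.parameters():
        p.requires_grad = False
    return module

backbone = freeze(nn.Conv2d(3, 64, kernel_size=7))  # stand-in for ResNet-50
print(all(not p.requires_grad for p in backbone.parameters()))  # True
```

Only the parameters left with `requires_grad = True` (here, the transformer) would then receive gradient updates during pre-training.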

wilderrodrigues commented 3 years ago

Hi @dddzg ,

Thanks for the extra info. We will probably try to fine-tune it and see how it behaves. Quick question: during fine-tuning, I can change the number of queries / patches, right? Right now they are upper-bounded at 100 and 10, respectively. For instance, when trying the fine-tuned COCO checkpoint you made available on my custom image, I got this:

image

I used 10 random patches, so the result is reasonably good. I would expect that with more patches we could find more objects.

Will keep you posted on my experiments / changes to the code.

Thanks again.

wilderrodrigues commented 3 years ago

Yeah, just checked; I will fine-tune and change the queries and patches. :)

    parser.add_argument('--num_queries', default=100, type=int, help="Number of query slots")
    parser.add_argument('--num_patches', default=10, type=int, help='number of query patches')

Need to convert my dataset to the COCO format first.
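For reference, the detection flavour of the COCO format is a single JSON file with three top-level lists. A minimal hand-written sketch (the file name and the "kitten" category are made up for illustration):

```python
import json

coco = {
    "images": [
        {"id": 1, "file_name": "img_0001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 120.0, 50.0, 80.0],  # [x, y, width, height] in pixels
         "area": 50.0 * 80.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "kitten"},
    ],
}

with open("custom_train.json", "w") as f:
    json.dump(coco, f)
```

Note that COCO boxes are `[x, y, width, height]`, while DETR-style models usually work with normalized `[cx, cy, w, h]` internally; the repo's dataset code handles that conversion.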