wilderrodrigues closed this issue 3 years ago
At the moment it is not possible to use the latest PyTorch with the latest TorchVision. The latter should be pinned to version 0.9.0.
I will update the code to support the latest PyTorch version.
The result seems a little weird. Could you provide more details to reimplement the result?
Hi @dddzg ,
Thanks for the reply, much appreciated.
It's working now. ;) It was a mistake on my side. The source code is not that different; I just refactored it a bit and added unit tests. The problem was that the checkpoint loader had its own separate unit test. Test execution order is not guaranteed, so the checkpoint was not loaded by the time inference ran.
To fix it, I moved the model building and checkpoint loading phase into a setUp function in my unit test. The results are below:
Custom image with 10 hand-engineered patches
Kittens image with 10 hand-engineered patches
Kittens image with 10 randomly generated patches
Custom image with 10 randomly generated patches
As you can see, it works. However, we need some improvements when it comes to random-patches. The boxes are not good. The authors of this paper claimed to have improved it using region proposal via selective search: https://arxiv.org/pdf/2106.04550.pdf. No pre-trained models yet, though.
Thanks again and congrats on the good work.
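The setUp fix described in this comment can be sketched roughly as follows, assuming the tests use Python's standard `unittest` module (the thread does not say which framework was used). `build_model` and `load_checkpoint` are hypothetical stand-ins, not functions from the repository. Note that `unittest` runs test methods in alphabetical order of their names rather than definition order, which is one way a separate "load checkpoint" test can end up running after the inference test.

```python
import unittest


def build_model():
    # Hypothetical stand-in for the real model constructor.
    return {"weights": None}


def load_checkpoint(model, path):
    # Hypothetical stand-in for the real checkpoint loader.
    model["weights"] = f"loaded from {path}"
    return model


class InferenceTest(unittest.TestCase):
    def setUp(self):
        # setUp runs before every test method, so each test sees a loaded
        # model regardless of the order in which the tests execute.
        self.model = load_checkpoint(build_model(), "checkpoint.pth")

    def test_checkpoint_is_loaded(self):
        self.assertIsNotNone(self.model["weights"])

    def test_inference_uses_loaded_weights(self):
        self.assertTrue(self.model["weights"].startswith("loaded"))
```

Saved to a file, this can be run with `python -m unittest`. If loading the checkpoint is expensive, `setUpClass` would load it once per test class instead of once per test.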
Hi @wilderrodrigues. The boxes are not good with randomly cropped patches because this is only the pretext task, and we don't really care about its accuracy. During pre-training we freeze the CNN backbone (to preserve the CNN's discrimination ability), so it is reasonable that the boxes are not that good. As far as we have observed, you can pre-train the CNN backbone together with the transformers if you only care about the accuracy of the boxes. I guess it will improve a lot.
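For readers unfamiliar with freezing a backbone in PyTorch, here is a minimal sketch of the idea. The tiny `backbone` and `head` modules and the helper `set_trainable` are hypothetical illustrations, not the repository's actual model or training code.

```python
import torch
from torch import nn

# Hypothetical tiny stand-ins for the CNN backbone and the transformer part.
backbone = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())
head = nn.Linear(8, 4)


def set_trainable(module, trainable):
    # Toggle gradient tracking for every parameter of the module.
    for param in module.parameters():
        param.requires_grad_(trainable)


# Freeze the backbone, as described for pre-training above.
set_trainable(backbone, False)

# Hand the optimizer only the parameters that still require gradients.
all_params = list(backbone.parameters()) + list(head.parameters())
trainable_params = [p for p in all_params if p.requires_grad]
optimizer = torch.optim.SGD(trainable_params, lr=0.01)
```

Calling `set_trainable(backbone, True)` before building the optimizer would instead train the backbone together with the rest of the model, which is the alternative suggested above for better boxes.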
Hi @dddzg ,
Thanks for the extra info. We will probably try to fine-tune it and see how it behaves. Quick question: during fine-tuning I can change the number of queries / patches, right? Right now they are capped at 100 and 10, respectively. For instance, when trying the fine-tuned COCO checkpoint you made available on my custom image, I got this:
I used 10 random patches. So, the result is reasonably good. I would expect that with more patches we could find more objects.
Will keep you posted on my experiments / changes to the code.
Thanks again.
Yeah, just checked: I will fine-tune and change the number of queries and patches. :)
parser.add_argument('--num_queries', default=100, type=int, help="Number of query slots")
parser.add_argument('--num_patches', default=10, type=int, help='number of query patches')
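As a small illustration of how these two options behave, the sketch below wraps the quoted lines in a standalone parser and overrides the defaults. The `build_parser` wrapper is hypothetical, and the thread does not confirm whether the released checkpoints load cleanly with values other than the defaults.

```python
import argparse


def build_parser():
    # Hypothetical wrapper around the two options quoted above.
    parser = argparse.ArgumentParser(description="Query and patch counts")
    parser.add_argument('--num_queries', default=100, type=int,
                        help="Number of query slots")
    parser.add_argument('--num_patches', default=10, type=int,
                        help="Number of query patches")
    return parser


# No flags: the defaults apply.
defaults = build_parser().parse_args([])
# Explicit flags: the defaults are overridden.
custom = build_parser().parse_args(['--num_queries', '200', '--num_patches', '20'])

print(defaults.num_queries, defaults.num_patches)  # prints: 100 10
print(custom.num_queries, custom.num_patches)      # prints: 200 20
```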
Need to convert my dataset to the COCO format first.
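A rough sketch of such a conversion is shown below. The input layout (`records` with corner-style boxes) is an assumption made for illustration, since the thread does not describe the original dataset. The output follows the COCO detection layout, where `bbox` is `[x, y, width, height]` measured from the top-left corner.

```python
import json


def to_coco(records, category_names):
    # records: hypothetical list of dicts with "file_name", "width", "height"
    # and "boxes" as (label, x_min, y_min, x_max, y_max) tuples.
    categories = [{"id": i + 1, "name": name}
                  for i, name in enumerate(category_names)]
    name_to_id = {c["name"]: c["id"] for c in categories}
    images, annotations = [], []
    for image_id, rec in enumerate(records, start=1):
        images.append({"id": image_id, "file_name": rec["file_name"],
                       "width": rec["width"], "height": rec["height"]})
        for label, x_min, y_min, x_max, y_max in rec["boxes"]:
            w, h = x_max - x_min, y_max - y_min
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": name_to_id[label],
                "bbox": [x_min, y_min, w, h],  # COCO uses x, y, width, height
                "area": w * h,
                "iscrowd": 0,
            })
    return {"images": images, "annotations": annotations,
            "categories": categories}


example = [{"file_name": "kittens.jpg", "width": 640, "height": 480,
            "boxes": [("cat", 10, 20, 110, 220)]}]
coco = to_coco(example, ["cat"])
coco_json = json.dumps(coco, indent=2)  # ready to write to an annotations file
```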
Hi there,
I'm currently experimenting with few/one/zero-shot object detection and classification. For one of the tasks, I have been experimenting with your paper.
Unfortunately, I haven't been able to reproduce your results with the pre-trained models you have made available. I also noticed that the inference code you made available does not work out of the box. To support my points, here are some details:
Results:
Patches
Detections
Patches
Detections
Hardware used
Yeah, I tried with both a CPU and a CUDA-compliant device.
Are you sure you have uploaded the right checkpoint files?
Thanks in advance; looking forward to hearing from you.