Closed vadimkantorov closed 1 year ago
@vadimkantorov Thanks for your advice! The dirname has been changed. The decoder follows the code of DiffusionDet, including variable and function names. Finally, the mask kernel filters are now generated from the bounding boxes. We have fixed the training and inference equations in the latest arXiv version; also see #1. We are still working on directly denoising the filters and will clean and revise the code in the future.
It's also a bit disappointing that according to your results in README, mask AP barely improves by going from 1step to 4steps :(
> It's also a bit disappointing that according to your results in README, mask AP barely improves by going from 1step to 4steps :(
Yes, it is. We are therefore trying different denoising strategies and mask representations in further research.
Also, it would be nice to add to the README pointers to the model component sources (especially the Decoder), since they are not discussed much in the paper.
E.g. could you please comment on the inference path `ddim_sample` and the

```python
preds, outputs_class, outputs_coord, outputs_kernel, mask_feat = self.model_predictions(backbone_feats, images_whwh, img, time_cond, self_cond, clip_x_start=clip_denoised)
```

call, which seems to be the decoder call? Counterintuitively, it seems that the noisy boxes are stored in the `img` variable, right? And the dynamic mask kernels are produced in a deterministic way, right? Thanks!