cxliu0 / PET

[ICCV 2023] Point-Query Quadtree for Crowd Counting, Localization, and More
MIT License

differences in operation during training and inference #20

Closed by KonstantinLihota 5 months ago

KonstantinLihota commented 6 months ago

Hello!

I'm studying your code on GitHub and came across a question that I can't fully grasp. I'm curious why predictions on the same image during training and inference might differ. It would be great if you could provide more details about the differences in operation during training and inference, especially in the context of the functions decoder_forward_dynamic and decoder_forward in the decoder, as well as points_queris_embed and points_queris_embed_inference in BasePETCount.

I would appreciate your clarification!

Best regards, Konstantin

cxliu0 commented 6 months ago

During training, we generate the whole point-query quadtree, because we need to compute the loss that supervises it. During testing, we construct the point-query quadtree dynamically, i.e., we use sparse point queries in sparse regions and dense point queries in dense regions. This speeds up inference. Technically, one can also use the training function to do inference.

KonstantinLihota commented 6 months ago

In this case, why can the results on the same patch differ between inference and training?

cxliu0 commented 6 months ago

To be more specific, we use the split map (Figure 4 in the paper) to categorize sparse and dense regions, where sparse/dense point queries are responsible for object prediction in sparse/dense regions.

By "one can use the same function in training to do inference", I mean that one can run both the sparse and the dense point queries over the whole image and then use the split map to select the corresponding predictions in sparse and dense regions. This approach is relatively computationally expensive.
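As a rough illustration of the "run everything, then select" approach described above, here is a minimal pure-Python sketch. It is not the PET implementation: `make_queries`, `toy_predict`, `infer_full_then_select`, the strides, and the boolean `split_map` layout are all hypothetical stand-ins, and the toy treats each query independently, which the real decoder may not.

```python
# Toy illustration only: names and strides are hypothetical, not taken from the PET repo.
SPARSE_STRIDE = 2  # assumed spacing of sparse point queries
DENSE_STRIDE = 1   # assumed spacing of dense point queries


def make_queries(height, width, stride):
    """Lay out point queries on a regular grid with the given stride."""
    return [(y, x) for y in range(0, height, stride) for x in range(0, width, stride)]


def toy_predict(query, level):
    """Stand-in for the decoder: returns a labeled prediction for one query."""
    return {"point": query, "level": level}


def infer_full_then_select(split_map):
    """Run both query sets over the whole image, then filter with the split map.

    split_map[y][x] is True where the region is treated as dense.
    """
    height, width = len(split_map), len(split_map[0])

    # Every sparse and every dense query is evaluated, regardless of region.
    sparse_preds = [toy_predict(q, "sparse") for q in make_queries(height, width, SPARSE_STRIDE)]
    dense_preds = [toy_predict(q, "dense") for q in make_queries(height, width, DENSE_STRIDE)]
    n_evaluated = len(sparse_preds) + len(dense_preds)

    # Keep sparse predictions only in sparse regions, dense ones only in dense regions.
    kept = [p for p in sparse_preds if not split_map[p["point"][0]][p["point"][1]]]
    kept += [p for p in dense_preds if split_map[p["point"][0]][p["point"][1]]]
    return kept, n_evaluated


# Left half of a 4x4 image is sparse, right half is dense.
split_map = [[False, False, True, True] for _ in range(4)]
kept, n_evaluated = infer_full_then_select(split_map)
print(f"evaluated {n_evaluated} queries, kept {len(kept)} predictions")
```

In this toy setup, 20 queries are evaluated but only 10 predictions survive the split-map filter, which is where the extra cost comes from.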

A more convenient way, which is what this repo implements, is to construct the point-query quadtree dynamically at inference time. This ensures that sparse/dense point queries only run in sparse/dense regions.
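The dynamic variant can be sketched in the same toy style: filter the queries with the split map first, then evaluate only the ones that remain. Again, this is only a sketch under assumptions. `make_queries`, `toy_predict`, `infer_dynamic`, the strides, and the `split_map` layout are hypothetical and do not mirror `decoder_forward_dynamic` or any other function in the repo.

```python
# Toy illustration only: names and strides are hypothetical, not taken from the PET repo.
SPARSE_STRIDE = 2  # assumed spacing of sparse point queries
DENSE_STRIDE = 1   # assumed spacing of dense point queries


def make_queries(height, width, stride):
    """Lay out point queries on a regular grid with the given stride."""
    return [(y, x) for y in range(0, height, stride) for x in range(0, width, stride)]


def toy_predict(query, level):
    """Stand-in for the decoder: returns a labeled prediction for one query."""
    return {"point": query, "level": level}


def infer_dynamic(split_map):
    """Select queries with the split map first, then evaluate only those.

    split_map[y][x] is True where the region is treated as dense.
    """
    height, width = len(split_map), len(split_map[0])

    # Sparse queries survive only in sparse regions, dense queries only in dense regions.
    sparse_queries = [q for q in make_queries(height, width, SPARSE_STRIDE)
                      if not split_map[q[0]][q[1]]]
    dense_queries = [q for q in make_queries(height, width, DENSE_STRIDE)
                     if split_map[q[0]][q[1]]]

    # Only the selected queries are passed to the (toy) decoder.
    preds = [toy_predict(q, "sparse") for q in sparse_queries]
    preds += [toy_predict(q, "dense") for q in dense_queries]
    return preds, len(sparse_queries) + len(dense_queries)


# Left half of a 4x4 image is sparse, right half is dense.
split_map = [[False, False, True, True] for _ in range(4)]
preds, n_evaluated = infer_dynamic(split_map)
print(f"evaluated {n_evaluated} queries, produced {len(preds)} predictions")
```

Here only 10 queries are evaluated, and every one of them yields a prediction that is kept, so no decoder work is spent on queries that would be discarded later.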