ChenhongyiYang / QueryDet-PyTorch

[CVPR 2022 Oral] QueryDet: Cascaded Sparse Query for Accelerating High-Resolution Small Object Detection

Could you explain the reason for using `self.context` to construct `sparse indices`? #35

Closed furh20 closed 2 years ago

furh20 commented 2 years ago
def _make_sparse_tensor(self, query_logits, last_ys, last_xs, anchors, feature_value):
        if last_ys is None:
            # First queried level: take every position whose query score
            # exceeds the threshold and recover its (y, x) coordinates.
            N, _, qh, qw = query_logits.size()
            assert N == 1
            prob  = torch.sigmoid_(query_logits).view(-1)
            pidxs = torch.where(prob > self.score_th)[0]# .float()
            y = torch.floor_divide(pidxs, qw).int()
            x = torch.remainder(pidxs, qw).int()
        else:
            # Later levels: keep only the key positions inherited from the
            # previous level whose query score still exceeds the threshold.
            prob  = torch.sigmoid_(query_logits).view(-1)
            pidxs = prob > self.score_th
            y = last_ys[pidxs]
            x = last_xs[pidxs]

        if y.size(0) == 0:
            return None, None, None, None, None, None 

        _, fc, fh, fw = feature_value.shape

        # Map every query location on the coarse map to its four nearest
        # neighbors (the 2x2 child cells) on the higher-resolution map.
        ys, xs = [], []
        for i in range(2):
            for j in range(2):
                ys.append(y * 2 + i)
                xs.append(x * 2 + j)

        ys = torch.cat(ys, dim=0)
        xs = torch.cat(xs, dim=0)
        inds = (ys * fw + xs).long()

        sparse_ys = []
        sparse_xs = []

        # Dilate every key position by a (2*context+1) x (2*context+1)
        # neighborhood; these extra positions only feed the sparse
        # convolution and are not used as queries for the next level.
        for i in range(-1*self.context, self.context+1):
            for j in range(-1*self.context, self.context+1):
                sparse_ys.append(ys+i)
                sparse_xs.append(xs+j)

        sparse_ys = torch.cat(sparse_ys, dim=0)
        sparse_xs = torch.cat(sparse_xs, dim=0)

        good_idx = (sparse_ys >= 0) & (sparse_ys < fh) & (sparse_xs >= 0)  & (sparse_xs < fw)
        sparse_ys = sparse_ys[good_idx]
        sparse_xs = sparse_xs[good_idx]

        sparse_yx = torch.stack((sparse_ys, sparse_xs), dim=0).t()
        sparse_yx = torch.unique(sparse_yx, sorted=False, dim=0)

        sparse_ys = sparse_yx[:, 0]
        sparse_xs = sparse_yx[:, 1]

        sparse_inds = (sparse_ys * fw + sparse_xs).long()

        # Gather the feature vectors at the selected positions and pack them,
        # together with their (batch, y, x) indices, into a sparse tensor
        # (the batch index is always 0 because N == 1 is asserted above).
        sparse_features = feature_value.view(fc, -1).transpose(0, 1)[sparse_inds].view(-1, fc)
        sparse_indices  = torch.stack((torch.zeros_like(sparse_ys), sparse_ys, sparse_xs), dim=0).t().contiguous()

        sparse_tensor = spconv.SparseConvTensor(sparse_features, sparse_indices, [fh, fw], 1)

        anchors = anchors.tensor.view(-1, self.anchor_num, 4)
        selected_anchors = anchors[inds].view(1, -1, 4)
        return sparse_tensor, ys, xs, inds, selected_anchors, sparse_indices.size(0)

Many thanks to the author for providing a good solution for small object detection, but I have a small question about `run_qinfer` in the source code. During inference, the authors choose the locations whose predicted scores are larger than a threshold σ as queries. Each query $q_{l}^{0}$ is then mapped to its four nearest neighbors on $P_{l-1}$ as key positions. This part of the implementation corresponds to the following operations in `_make_sparse_tensor()`:

ys, xs = [], []
for i in range(2):
    for j in range(2):
        ys.append(y * 2 + i)
        xs.append(x * 2 + j)

ys = torch.cat(ys, dim=0)
xs = torch.cat(xs, dim=0)
inds = (ys * fw + xs).long()
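
For concreteness, here is a tiny self-contained illustration (toy coordinates, not taken from the repository) of what this mapping produces for a single query location:

import torch

# A single query at (y, x) = (3, 5) on the coarse map P_l.
y = torch.tensor([3])
x = torch.tensor([5])

ys, xs = [], []
for i in range(2):
    for j in range(2):
        ys.append(y * 2 + i)
        xs.append(x * 2 + j)

keys = torch.stack((torch.cat(ys), torch.cat(xs)), dim=1)
print(keys)  # tensor([[6, 10], [6, 11], [7, 10], [7, 11]]) -- the 2x2 children on P_{l-1}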

But I don't understand why the following additional operations are required when constructing the sparse indices. Why not directly use `ys` and `xs` to construct the sparse indices, and what is the point of `self.context`?

for i in range(-1*self.context, self.context+1):
    for j in range(-1*self.context, self.context+1):
        sparse_ys.append(ys+i)
        sparse_xs.append(xs+j)

sparse_ys = torch.cat(sparse_ys, dim=0)
sparse_xs = torch.cat(sparse_xs, dim=0)

good_idx = (sparse_ys >= 0) & (sparse_ys < fh) & (sparse_xs >= 0)  & (sparse_xs < fw)
sparse_ys = sparse_ys[good_idx]
sparse_xs = sparse_xs[good_idx]

I noticed that the author sets cfg.MODEL.QUERY.CONTEXT = 2 in model/config.py, so according to the above code, 25 points (a 5x5 neighborhood) are expanded around each key position.
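
A minimal sketch (toy coordinates, not from the repository) of what this dilation does for the four key positions of a single query with self.context = 2: each key picks up a 5x5 neighborhood, and after the torch.unique deduplication the query is covered by a 6x6 patch of 36 positions.

import torch

context = 2
ys = torch.tensor([6, 6, 7, 7])      # key rows of one query on P_{l-1}
xs = torch.tensor([10, 11, 10, 11])  # key columns of the same query

sparse_ys, sparse_xs = [], []
for i in range(-context, context + 1):
    for j in range(-context, context + 1):
        sparse_ys.append(ys + i)
        sparse_xs.append(xs + j)

sparse_yx = torch.stack((torch.cat(sparse_ys), torch.cat(sparse_xs)), dim=1)
sparse_yx = torch.unique(sparse_yx, dim=0)
print(sparse_yx.size(0))  # 36 unique positions: rows 4..9 x columns 8..13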

Could you explain the reason for using self.context to construct the sparse indices?

ChenhongyiYang commented 2 years ago

Hi, it's a good question. As its name suggests, self.context is used to provide context information for detection, which is critical for a detector's performance. An important point is that the context pixels are only used for computing the sparse convolution and are not used as queries for the next stage. We could also obtain more context by lowering the query score threshold, but that would inevitably produce more unnecessary queries and hurt efficiency.

Hope my reply can answer your question.
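
As an illustrative back-of-the-envelope sketch (all numbers below are assumed toy values, not measurements from this repository), the trade-off can be put as follows: the context dilation enlarges the sparse-convolution footprint without adding queries, whereas lowering the score threshold to obtain a similar footprint would multiply the queries that propagate down the cascade.

n_queries = 50   # assumed number of positions passing the score threshold on P_l
context   = 2

# Option A: keep the threshold and dilate with self.context.
keys_next_level_A = n_queries * 4                     # only the 2x2 keys go forward
conv_positions_A  = n_queries * (2 + 2 * context)**2  # <= 36 positions per query

# Option B: lower the threshold until roughly a 3x3 block around each
# original query also fires; every extra query spawns keys of its own.
n_queries_B       = n_queries * 9
keys_next_level_B = n_queries_B * 4                   # all of these are queried again
conv_positions_B  = n_queries_B * 4

print(keys_next_level_A, conv_positions_A)  # 200 1800
print(keys_next_level_B, conv_positions_B)  # 1800 1800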

furh20 commented 2 years ago

Thank you for your reply; it solved my problem. CSQ is great work. I am trying to replicate your method with MMDetection and apply it to other detectors.

23jisuper commented 11 months ago

Hello! May I ask whether you have wrapped the modules proposed by the author using MMDetection?