facebookresearch / video-long-term-feature-banks

Long-Term Feature Banks for Detailed Video Understanding
Apache License 2.0

How to fetch values from BlobReference object in header_helper.py? #41

Closed taosean closed 4 years ago

taosean commented 4 years ago

Hi, thank you for your great work.

I would now like to modify the network structure. However, I'm having trouble working with BlobReference objects (which I assume represent feature maps) in Caffe2.

For my modification, I need to access the first column of the BlobReference gpu_0/proposals_test or gpu_0/proposals_train so that I know the batch index of each proposed bounding box.

I'm stuck at this step because Caffe2 uses a computation-graph design, so a blob cannot be indexed directly with the [] operator as in NumPy.

I know that workspace.FetchBlob doesn't work here, since workspace.RunNet has not been run yet and there is no data to fetch.

So, could you show me how to access a specific BlobReference object and use its values as a condition for subsequent operations?

Thanks!

chaoyuaw commented 4 years ago

Hi @taosean , If the data is not forwarded, there's no blob that we can fetch. To obtain the batch index of a box, you could feed the indices as additional inputs (i.e., modify the data loader). But this also requires forwarding the model first.

Could you describe a bit more about what you would like to implement? If you want, I can brainstorm with you to see if I can think of any way to implement it.

taosean commented 4 years ago

@chaoyuaw Hi, thank you for your reply.

Actually, what I want to do is concatenate the corresponding global clip feature to each roi feature.

Suppose I have N roi features (shape Nx2560x1x1x1, where N is the number of bounding boxes), B global clip features (shape Bx2048x1x1x1, where B is the batch size), and the rois information (shape Nx5, indicating which clip each bounding box is from).

How can I get the corresponding clip feature for each roi feature and concatenate them? (This would result in an Nx(2560+2048)x1x1x1 blob.)

To me, it seems I have to take the first column of the rois information (shape Nx5) and use each element of that column to select the corresponding row of the global clip features (shape Bx2048x1x1x1), and then concatenate the result with the corresponding roi features.

However, it seems that while the network is being built (with no actual data), the operations above, e.g. indexing with the [] operator, are impossible.

I hope I have made clear what I want to do. Please let me know if you have any good ideas.

Thank you!
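
For reference, the target computation described above can be written in plain NumPy, where real data is available and direct indexing works. This is only a sketch of the desired result, not Caffe2 graph code; the small feature sizes and the box coordinates in `rois` are made-up values for illustration (the thread uses 2560 and 2048 channels).

```python
import numpy as np

# Hypothetical small sizes for illustration (the thread uses 2560 and 2048).
N, B = 5, 2                # N boxes, B clips in the batch
ROI_DIM, CLIP_DIM = 4, 3

rng = np.random.default_rng(0)
roi_feats = rng.standard_normal((N, ROI_DIM, 1, 1, 1)).astype(np.float32)
clip_feats = rng.standard_normal((B, CLIP_DIM, 1, 1, 1)).astype(np.float32)

# rois has shape Nx5. Column 0 is the batch index of each box; the remaining
# four columns are assumed to be box coordinates (made-up values here).
rois = np.array([
    [0, 0.1, 0.1, 0.5, 0.5],
    [0, 0.2, 0.3, 0.6, 0.9],
    [1, 0.0, 0.0, 0.4, 0.4],
    [1, 0.5, 0.5, 1.0, 1.0],
    [1, 0.3, 0.2, 0.7, 0.8],
], dtype=np.float32)

# Use the first column to pick the clip feature that belongs to each box.
batch_idx = rois[:, 0].astype(np.int64)
per_roi_clip = clip_feats[batch_idx]          # shape (N, CLIP_DIM, 1, 1, 1)

# Concatenate along the channel axis.
combined = np.concatenate([roi_feats, per_roi_clip], axis=1)
print(combined.shape)
```

Any graph-based implementation should reproduce `combined` once the net has actually been run on data.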

chaoyuaw commented 4 years ago

Hi @taosean , Thanks for providing further information.

Maybe a way to implement this would be to construct an additional input M of shape NxB, where element (i, j) = 1 if box i belongs to example j, and 0 otherwise. Then we do a "matrix multiplication": M times the global clip features (viewed as a Bx2048 matrix) gives an Nx2048 matrix whose row i is the clip feature for box i. Finally, after reshaping this matrix to Nx2048x1x1x1, we can concatenate it with your roi features of shape Nx2560x1x1x1 along the channel axis to get an Nx(2560+2048)x1x1x1 blob.

Would this make sense to you?
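
The suggestion above can be checked numerically with a minimal NumPy sketch. It is an illustration only, not code from the repository: the helper `build_assignment_matrix`, the small feature sizes, and the example batch indices are all hypothetical.

```python
import numpy as np

def build_assignment_matrix(batch_idx, batch_size):
    """Return M of shape (N, B) with M[i, j] = 1 if box i belongs to example j."""
    n = len(batch_idx)
    m = np.zeros((n, batch_size), dtype=np.float32)
    m[np.arange(n), batch_idx] = 1.0
    return m

# Hypothetical small sizes for illustration (the thread uses 2560 and 2048).
N, B = 4, 2
ROI_DIM, CLIP_DIM = 3, 2
batch_idx = np.array([0, 0, 1, 1])  # which clip each box comes from

roi_feats = np.arange(N * ROI_DIM, dtype=np.float32).reshape(N, ROI_DIM, 1, 1, 1)
clip_feats = np.arange(B * CLIP_DIM, dtype=np.float32).reshape(B, CLIP_DIM, 1, 1, 1)

M = build_assignment_matrix(batch_idx, B)

# M (N x B) times clip features (B x CLIP_DIM) selects one clip row per box.
per_roi_clip = M @ clip_feats.reshape(B, CLIP_DIM)        # (N, CLIP_DIM)
per_roi_clip = per_roi_clip.reshape(N, CLIP_DIM, 1, 1, 1)

# Concatenate along the channel axis.
combined = np.concatenate([roi_feats, per_roi_clip], axis=1)
print(combined.shape)
```

Because each row of M is one-hot, the product is the same as indexing the clip features by batch index, but it uses only a matrix multiplication, a reshape, and a concatenation. In the setup discussed in this thread, M would be built in the data loader and fed to the network as an extra input.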

taosean commented 4 years ago

@chaoyuaw Hi, thank you so much for your answer, I think it's a great idea.

By the way, I'm confused about the code here: https://github.com/facebookresearch/video-long-term-feature-banks/blob/583c9a9dec89cdfdf867012fdf00bfcfb3bd56bb/lib/datasets/dataloader.py#L346-L360

It seems to iterate through every char in a list of strings and create a blob for each one. These operations seem odd and don't make sense to me. Could you explain why this is done?

chaoyuaw commented 4 years ago

Hi @taosean , Glad that it works for you.

The code you're looking at is iterating through a list of strings, instead of through every char of a string. You may trace back the code to see where these keys are defined.

taosean commented 4 years ago

Hi @chaoyuaw , thank you for your reply.

I debugged the code and it is indeed iterating through every char of a string. enqueue_blobs_names (L348) is a list of strings, and the following code accesses it with two nested for loops (L357-L360), so it iterates through every char of every string in the list.

Could you take a look?

Thank you!
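
The dataloader code itself is not reproduced in this thread, so which of the two readings applies cannot be settled here. As a general Python illustration, two nested for loops visit characters when the outer container is a flat list of strings, and visit whole strings when it is a list of lists. The blob names below are hypothetical.

```python
# Hypothetical blob names, not taken from the repository.
flat_names = ["data", "labels"]

# Two nested loops over a flat list of strings: the inner loop walks each
# string character by character.
visited_flat = []
for name in flat_names:
    for item in name:
        visited_flat.append(item)

# The same two loops over a list of lists: the inner loop yields whole strings.
nested_names = [["data", "labels"], ["rois"]]
visited_nested = []
for group in nested_names:
    for item in group:
        visited_nested.append(item)

print(visited_flat)
print(visited_nested)
```

Printing the type and contents of the outer container in a debugger is a quick way to tell which case a given loop is in.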