facebookresearch / video-long-term-feature-banks

Long-Term Feature Banks for Detailed Video Understanding
Apache License 2.0

How to fetch blobs in create_model #36

Closed: fanovo closed this issue 5 years ago

fanovo commented 5 years ago

Hi, thanks for releasing such great code. I am new to Caffe2 and I've run into a problem. I want to fetch the input blobs inside `create_model` in resnet_video.py. Here is what I did: I passed the workspace from model_builder_video.py as an argument into resnet_video.py/create_model, then added `workspace.FetchBlob('labels{}'.format(suffix))` there and ran the model. I then got this error:

`RuntimeError: [enforce fail at pybind_state.cc:207] ws->HasBlob(name). Can't find blob: labels_test`

I read through the whole code flow and I think the blob name I passed in should be right. One guess for why this happens is that I didn't add the GPU prefix. Do you have any idea how to write this correctly? Thanks!

chaoyuaw commented 5 years ago

Hi @fanovo, yes, you'll need to specify a GPU index. The following is one example of how to do this: https://github.com/facebookresearch/video-long-term-feature-banks/blob/master/lib/utils/metrics.py#L497
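
A minimal sketch of that pattern, assuming the standard `gpu_{index}/` name scope that Caffe2's data-parallel models use (`gpu_id` and `suffix` here are illustrative names):

```python
from caffe2.python import workspace

# Blobs created under data-parallel training live in a per-GPU name
# scope, so the fetch name must include the GPU index.
gpu_id = 0          # illustrative: which GPU's copy of the blob to read
suffix = '_test'    # illustrative: matches the suffix used by the net
labels = workspace.FetchBlob('gpu_{}/labels{}'.format(gpu_id, suffix))
```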

Thanks!

fanovo commented 5 years ago

But the code written in resnet_video.py gets replicated to every GPU, so the problem is how to specify the GPU index. For instance, if I write `workspace.FetchBlob('gpu_{}/labels'.format(ind))` in resnet_video.py/create_model(), every GPU will run this line. So if I set `ind` to 0, all 8 GPUs will fetch the blob on gpu_0, and 7 of them will not get the labels corresponding to their own inputs.

fanovo commented 5 years ago

Is there any way to get the GPU index in resnet_video.py? Thanks a lot~

fanovo commented 5 years ago

Hi! I've tried to do this by running `ret = model.GetParams()` in resnet_video.py, from which I can extract the GPU prefix. But when I run `test_fetch = workspace.FetchBlob(gpu_prefix + 'labels' + suffix)`, it returns a string that says: `gpu_0/batch_info_test, a C++ native class of type nullptr (uninitialized)`. I don't really understand what is happening here. Why does workspace.FetchBlob return a string? Could you please offer me some suggestions?

chaoyuaw commented 5 years ago

Hi @fanovo, `create_model` in resnet_video.py defines the network, but the blobs don't have contents yet. Only after you forward the model (by calling `workspace.RunNet`) do the blobs (e.g., inputs, labels, feature maps) become non-empty.

`workspace.FetchBlob` is used to copy a blob to the CPU as a NumPy array.

Since you are modifying resnet_video.py, I'd guess that what you're trying to do is modify the model so that it does something else. If that's the case, you should not use `workspace.FetchBlob`; instead, write your logic with the operators defined in https://caffe2.ai/docs/operators-catalogue.html

If what you're trying to do is inspect the labels/input data/feature maps of a minibatch, you can use FetchBlob at, e.g., https://github.com/facebookresearch/video-long-term-feature-banks/blob/master/tools/train_net.py#L154, that is, after `workspace.RunNet`.
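
As a minimal sketch of that pattern (the blob names here are illustrative; the real names depend on your net and suffix):

```python
from caffe2.python import workspace

# Run one forward pass first; only then are the blobs populated.
workspace.RunNet(model.net.Proto().name)

# Now FetchBlob returns numpy arrays instead of uninitialized handles.
labels = workspace.FetchBlob('gpu_0/labels_test')  # illustrative name
print(labels.shape)
```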

Please feel free to let me know if you have further questions. Thanks! :)

fanovo commented 5 years ago

Thanks for your reply! What I am trying to do is add some extra supportive proposals and aggregate them into the person proposal vectors via NL layers placed before the LFB-NL layers. To implement this, I add a list of boxes to every keyframe, with variable size Nt x 5. To feed this data into the forward model, I need to form an N x Nm x C matrix as input to the NL layers, where the N x C person vectors come from the person proposals and Nm is the maximum number of supportive proposals I allow. But the number of supportive proposals is variable: for instance, with a batch size of 16 on 8 GPUs, each GPU gets 2 clips, each with a keyframe. The Nt's are not equal, so the Nt x 5 box matrices also have unequal shapes. My implementation stores the boxes of the two keyframes in two blobs, 'support_boxes_0' and 'support_boxes_1', corresponding to the two clips on the current GPU. I also create two more blobs, 'batch_info' and 'batch_len': batch_len stores the length of the supportive box list of each frame (just an array of size 2), and batch_info stores the batch index of each predicted person proposal, so it has the same length as the proposals blob.

With the settings above, I pool the supportive boxes via RoIAlign to form one feature matrix per clip. So from box lists of size N1 x 5 (clip 1) and N2 x 5 (clip 2), I get feature matrices of N1 x C and N2 x C. But to form an N x Nm x C tensor, I have to concatenate zero vectors of shape (1, C) so that the N1 x C and N2 x C matrices each become Nm x C, and then stack them into N x Nm x C using the blob batch_info. That is what I am trying to do, but the biggest problem now is that I don't know how to get the blobs out of the workspace while building the forward model. Is the design I described feasible to implement? Thanks for your reply and help!
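
For illustration only, the zero-padding and stacking step described above could look like this minimal NumPy sketch, assuming it runs on the CPU side before the data are fed as blobs (`pad_and_stack`, `num_max`, and the shapes are illustrative):

```python
import numpy as np

def pad_and_stack(feats_per_clip, num_max):
    """Zero-pad each (N_t, C) matrix to (num_max, C), then stack to (N, num_max, C)."""
    channels = feats_per_clip[0].shape[1]
    padded = []
    for feats in feats_per_clip:  # feats: (N_t, C), with N_t <= num_max
        pad = np.zeros((num_max - feats.shape[0], channels), dtype=feats.dtype)
        padded.append(np.concatenate([feats, pad], axis=0))
    return np.stack(padded, axis=0)  # (N, num_max, C)

# Illustrative usage: two clips with N1=3 and N2=5 proposals, C=16, Nm=8.
clip1 = np.random.rand(3, 16).astype(np.float32)
clip2 = np.random.rand(5, 16).astype(np.float32)
stacked = pad_and_stack([clip1, clip2], num_max=8)
print(stacked.shape)  # (2, 8, 16)
```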

fanovo commented 5 years ago

Hi! I am still troubled by this problem. I don't know how to get the blob~

chaoyuaw commented 5 years ago

Hi @fanovo, thanks for explaining what you're trying to do. I'd like to propose an alternative: consider creating an additional "mask" input, which contains the value 1 for proposals that are non-empty (you'll have Nt of them) and 0 for the empty proposals (Nm - Nt of them). Then, after computing the RoIAligned features, use the mask to zero out the features corresponding to the empty proposals (by multiplying the mask and the pooled features with broadcasting). The non-local operator will likely learn to ignore the zero vectors, effectively attending only to the Nt real proposals.
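
A minimal sketch of that masking step with Caffe2 graph operators; the blob names 'roi_feats' and 'proposal_mask' are illustrative, and this assumes features of shape (N * Nm, C) with a float mask of shape (N * Nm,):

```python
# Illustrative sketch: zero out the features of empty (padded) proposals.
# 'roi_feats'     : (N * Nm, C) RoIAligned features, padded rows included.
# 'proposal_mask' : (N * Nm,) float blob, 1.0 for real proposals, 0.0 for padding.
masked_feats = model.net.Mul(
    ['roi_feats', 'proposal_mask'],
    'masked_roi_feats',
    broadcast=1,  # broadcast the mask across the channel dimension
    axis=0,       # align the mask with the first (proposal) dimension
)
```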

fanovo commented 5 years ago

It's really a wonderful idea! I will try implementing it. Thanks for your help!

chaoyuaw commented 5 years ago

Great! Please feel free to let me know if you have further questions :)