Dear @Tobias-Fischer, Thanks for paying attention to our work again!
The provided sample code assumes you've already preprocessed the event stream into subsequent event frames.
These frames are produced by accumulating events within a temporal window and then filtering by event count. The procedure is as follows: (1) count the events within a fixed time interval (5 ms); (2) if the number of events is too small (<5% of total pixels), skip the frame (the car is probably stopped); (3) if the number of events is too large (>10% of total pixels), shrink the temporal window so that it contains events covering 10% of the pixels; (4) construct an event frame from the adjusted window; (5) repeat until the event stream is exhausted.
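A rough sketch of that loop in Python (untested and only illustrative; the function and parameter names are placeholders, not the actual preprocessing code):

```python
import numpy as np

def build_event_frames(xs, ys, ts, width, height,
                       window_s=0.005, min_fraction=0.05, max_fraction=0.10):
    """xs, ys, ts: per-event pixel coordinates and timestamps, sorted by time."""
    frames, n_pixels, start = [], width * height, 0
    while start < len(ts):
        # (1) gather events inside the fixed temporal window
        end = int(np.searchsorted(ts, ts[start] + window_s))
        count = end - start
        if count < min_fraction * n_pixels:
            # (2) too few events: the car is probably stopped, skip this window
            start = end
            continue
        if count > max_fraction * n_pixels:
            # (3) too many events: shrink the window to cover ~10% of the pixels
            end = start + int(max_fraction * n_pixels)
        # (4) accumulate the selected events into an event frame
        frame = np.zeros((height, width), dtype=np.float32)
        np.add.at(frame, (ys[start:end], xs[start:end]), 1.0)
        frames.append(frame)
        start = end  # (5) continue until the event stream is exhausted
    return frames
```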
You need to feed 3 subsequent event frames into the event denoiser we provided, in the same way we create a denoised image in create_denoised_samples.py. The remaining procedure is identical to https://github.com/Nanne/pytorch-NetVlad: you can feed the denoised image into the Imagenet_vgg encoder with the pretrained weights we provided.
We could have provided the code for this preprocessing, but (a) the code is old and quite disorganized, and (b) it is not well suited to an event stream, because it precomputes the whole stream into images.
Thanks to your attention, I've realised that I left out some of the preprocessing details, so thanks again!
I hope this helped! Best, Alex
Hi there,
Many thanks for this!
I have written some basic code (super slow and non-optimised, just for testing) following your description (see attached) - could you please have a look at whether that looks okay? build_event_frames_event_vlad.py
If I feed three subsequent event frames into your create_denoised_samples.py, I get the following result - does that look how you would expect it to?
Do I understand correctly that I then feed the denoised image (middle column) into your NetVLAD layer? Is the mask used for anything?
Many thanks, Tobi
Hi Tobias,
Thanks for keeping up your attention to our results! I've checked your output, and I guess the network is not running properly. The masked denoised image on the right is the input to the NetVLAD layer, but it is apparently failing in this test case.
I assume the error comes from the different noise characteristics of each event sensor, especially because our model was trained only with the event simulator used in this work. In successful cases, the output should look like the example below.
Here is a list of things you may try in order to use our denoising module.
Thanks! Alex
Hi Alex,
Many thanks for your quick response! May I ask which settings you used for the Brisbane-Event-VPR dataset (346x260 resolution)? This would help a lot. Also, if you have the denoised images (or NetVLAD features) for the Brisbane-Event-VPR dataset somewhere and could share them, this would be highly appreciated! I am trying to replicate Fig 6 of your paper.
Thanks, Tobi
Hi Tobi,
I could not find the old files and weights used for the evaluation, but I did find the settings. The temporal window was 66 ms, with no upper limit on the number of events, and frames in which fewer than 1% of the pixels fired were skipped.
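In terms of the frame-building sketch earlier in this thread, those settings would correspond roughly to (the function and its parameters are illustrative placeholders; 346x260 is the Brisbane-Event-VPR sensor resolution):

```python
# 66 ms window, no upper cap on events, skip frames where fewer than 1% of pixels fire
frames = build_event_frames(xs, ys, ts, width=346, height=260,
                            window_s=0.066, min_fraction=0.01,
                            max_fraction=float('inf'))
```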
But fortunately, I still have the bag file of the Brisbane dataset. I will try regenerating the event image files, fine-tune the denoiser on the Brisbane dataset, and share the results asap.
Thanks! Alex
Many thanks for that, this would be highly appreciated!
Hi Tobi,
I've uploaded an additional weight file for the Brisbane dataset. It is based on the Carla weights but fine-tuned on the Morning sequence. You can download it with this link! The masking layer is only trainable in simulation, so it was not trained; I guess you can just skip the masking part!
With the settings above (66 ms / no upper limit / 1% reject), you will see results like the ones below.
Sample in Daytime
Sample at Midnight
Thanks! Alex
Hi Alex,
Many thanks for fine-tuning the weights and providing them! Unfortunately, I am still having trouble obtaining the same results that you do.
I am assuming it has something to do with the removal of hot pixels, or some other technical detail.
For these three subsequent images:
I am obtaining the following result:
Would it be possible to share how you extract the event frames from the bag files, or even the denoised images? I can provide you with storage space if that is an issue.
Many thanks, Tobi
Here is, for reference, the output for the same place that you used in your daytime sample. Both the noisy and the denoised image look very different from yours, but I can't figure out why :(.
Best, Tobi
Hi Tobi,
This is t = 114.730462 s of the daytime sequence. Maybe it's just the visualization, but the event images you've provided do not look like they contain many events. With the provided settings, the event image should look like the one above. I've therefore also uploaded the preprocessing code (a slightly hacky Matlab script).
I've also tested the event image from the sample above with the script create_denoised_samples.py.
Did this help? Just to be sure: all the reconstructed edges above were produced without applying masks.
Best, Alex
Thanks Alex - that has helped! I appreciate that.
I was able to run the create_denoised_samples.py script and save the edge images. I am now trying to run the NetVLAD feature extraction part. However, the edge images are single-channel (i.e. grayscale):
https://github.com/alexjunholee/EventVLAD/blob/a92cc827b7b0b98a8c43567c775919f795937933/create_denoised_samples.py#L140
while your EventVLAD.py script expects 3-channel colour images:
https://github.com/alexjunholee/EventVLAD/blob/a92cc827b7b0b98a8c43567c775919f795937933/networks/EventVLAD.py#L10
This naturally leads to:
RuntimeError: output with shape [1, 256, 256] doesn't match the broadcast shape [3, 256, 256]
Could you please let me know which transform I need to use to pass the edge images to your EventVLAD layer?
On another note, the edge images also seem fairly dark overall; is that correct? Example below:
I also noticed that imageSize is set to 224x224, while the denoising code outputs 256x256 images.
Hi Tobi,
In my case, the edge images were also dark overall, since the images are normalized as in the samples above. Performance did not seem to be affected by this darkness.
And sorry for missing that part; you can duplicate the single-channel image into three channels. This requirement comes from the original VGG structure, which receives three channels as input.
Also, the image should be resized to 224x224. I've used the following line for that:
img = cv2.resize(img, dsize=(224,224), interpolation = cv2.INTER_LINEAR)
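Put together, that conversion might look like this (a small sketch; the file name is illustrative, and the edge image is assumed to be the single-channel output saved by create_denoised_samples.py):

```python
import cv2

# read the saved edge image (single channel), resize, then duplicate into 3 channels
edge = cv2.imread("edge_image.png", cv2.IMREAD_GRAYSCALE)                  # (256, 256)
edge = cv2.resize(edge, dsize=(224, 224), interpolation=cv2.INTER_LINEAR)  # (224, 224)
img = cv2.cvtColor(edge, cv2.COLOR_GRAY2BGR)                               # (224, 224, 3)
```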
Thanks again for your efforts to reproduce our results. If you have any further questions, please feel free to ask.
Best, Alex
Thanks, Alex, for your response! Could you also please share how you normalize/transform the images? Do you use cv2 to read the image? Do you then subtract a mean (if so, which one) and divide by a standard deviation (if so, which one)?
It would be great if you could provide the full code from reading the images to the forward pass - getting some detail wrong could result in incorrect behaviour that would be hard to spot.
Hi Tobi, the images were not normalized in our experiment. I found that I had implemented a normalization layer with mean = 0.500 and std = 0.250, but had not applied it.
The data loader returns an image as follows:
import cv2
import torch

# __getitem__ of the dataset class used for queries (self.qImages holds the image paths)
def __getitem__(self, index):
    img = cv2.imread(self.qImages[index], 1)  # read as a 3-channel BGR image
    img = img.squeeze()
    img = cv2.resize(img, dsize=(224, 224), interpolation=cv2.INTER_LINEAR)
    img = torch.from_numpy(img.astype('float') / 255.)  # scale to [0, 1]
    img = img.permute(2, 0, 1)  # HWC -> CHW
    return img
Then the tensor is passed to the network, identically to the NetVLAD pipeline:
for iteration, data in enumerate(data_loader['test']):
    input = data.to(device).float()
    image_encoding = model.module.encoder(input)
    vlad_encoding = model.module.pool(image_encoding[:,:,np.newaxis,np.newaxis])
    qFeat[iteration,] = vlad_encoding.detach().squeeze().cpu().numpy()
The other parts (the faiss index, the prediction matrix, etc.) are as follows, and will already be familiar to you:
faiss_index = faiss.IndexFlatL2(pool_size)
faiss_index.add(dbFeat)
recall_n = [1,5,10]
faiss.cvar.distance_compute_blas_threshold = 100000
_, predictions = faiss_index.search(qFeat, max(recall_n))
distmat, fullpred = faiss_index.search(qFeat, len(fullset))
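For completeness, recall@N can then be computed from predictions in the same way as in pytorch-NetVlad. A sketch, where gtMatches is a hypothetical list giving, for each query, the indices of its true database matches:

```python
import numpy as np

correct_at_n = np.zeros(len(recall_n))
for qIx, pred in enumerate(predictions):
    for i, n in enumerate(recall_n):
        # a query counts as correct at N if any of its top-N retrievals is a true match
        if np.any(np.in1d(pred[:n], gtMatches[qIx])):
            correct_at_n[i:] += 1
            break
recall_at_n = correct_at_n / len(predictions)  # [R@1, R@5, R@10]
```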
Thanks again. In your EventVLAD class there are no encoder/pool modules added via add_module, so the code above unfortunately does not work. It would be great if you could clarify, or provide a full code snippet that can be run.
Best, Tobi
Hi Tobi,
Thanks for your kind explanation, and sorry I missed that part. You can load and initialize the network with the following code:
encoder = Imagenet_vgg(opt.pretrained)
model = nn.Module()
model.add_module('encoder', encoder)

net_vlad = NetVLAD(num_clusters=opt.num_clusters, dim=encoder_dim, alpha=1.0)
initcache = "your/path/to/centroids.hdf5"
with h5py.File(initcache, mode='r') as h5:
    clsts = h5.get("centroids")[...]
    traindescs = h5.get("descriptors")[...]
    net_vlad._init_params(clsts, traindescs)
    del clsts, traindescs
model.add_module('pool', net_vlad)

if torch.cuda.device_count() > 1:
    model.encoder = nn.DataParallel(model.encoder)
    model.pool = nn.DataParallel(model.pool)

checkpoint = torch.load("your/path/to/checkpoints.pth.tar", map_location=lambda storage, loc: storage)
model.load_state_dict(checkpoint['state_dict'], strict=False)
model = model.to(device)
Best, Alex
Thanks! Just to confirm, the Imagenet_vgg is from your EventVLAD class? What is opt.pretrained? And do I need the initcache part?
Hi Tobi,
Yes, the Imagenet_vgg is imported from EventVLAD.py, and since we load the weights via load_state_dict, you can just set opt.pretrained to None. The centroids are created by the get_clusters function as in here, since our implementation is based on that repo (pytorch-NetVlad).
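For reference, the cluster cache that get_clusters produces is essentially a set of k-means centroids over encoder descriptors sampled from the reference images, stored together with the sampled descriptors. A rough sketch of creating it (descs, encoder_dim and num_clusters are placeholders; the real script samples descriptors from random spatial locations of the database images):

```python
import faiss
import h5py

# descs: (nDescriptors, encoder_dim) float32 array of encoder outputs
# sampled from the reference (database) images
kmeans = faiss.Kmeans(encoder_dim, num_clusters, niter=100, verbose=False)
kmeans.train(descs)

with h5py.File("your/path/to/centroids.hdf5", mode='w') as h5:
    h5.create_dataset("centroids", data=kmeans.centroids)
    h5.create_dataset("descriptors", data=descs)
```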
Thanks!
Best, Alex
Thanks - could you please provide the centroids as well? They are not provided in the other repo.
Hi Tobi,
I have the Carla-trained centroids, but I guess they wouldn't help, as the distribution of the centroids depends on the dataset and the trained weights. So it would be best to create them from the corresponding training set with the new fine-tuned VGG16 weights.
Best, Alex
Ok - I guess for just inference I don't need the centroids though, do I?
For inference, you'll still need clusters computed from the database (reference) images; they are used to extract the residuals that form the query VLAD vectors.
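For context, the core of the NetVLAD pooling (following the pytorch-NetVlad implementation this work builds on) aggregates, per cluster, the soft-assignment-weighted residuals between local descriptors and the cluster centroids. A stripped-down sketch, where conv stands for the 1x1 convolution that produces the soft assignments:

```python
import torch.nn.functional as F

def vlad_pool(x, centroids, conv):
    """x: (N, D, H, W) local descriptors; centroids: (K, D) cluster centres."""
    N, D = x.shape[:2]
    K = centroids.shape[0]
    soft_assign = F.softmax(conv(x).view(N, K, -1), dim=1)       # (N, K, HW)
    x_flat = x.view(N, D, -1)                                    # (N, D, HW)
    residual = x_flat.unsqueeze(1) - centroids.view(1, K, D, 1)  # (N, K, D, HW)
    vlad = (residual * soft_assign.unsqueeze(2)).sum(dim=-1)     # (N, K, D)
    vlad = F.normalize(vlad, p=2, dim=2).view(N, -1)             # intra-normalisation
    return F.normalize(vlad, p=2, dim=1)                         # final L2 normalisation
```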
Hi Alex,
Unfortunately this still does not work. I get the following error:
File "/Users/fischert/mambaforge/envs/salient-event-vpr/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Module:
Unexpected key(s) in state_dict: "pool.lastfc.weight".
Indeed, there is no reference to lastfc in https://github.com/Nanne/pytorch-NetVlad. Could you please provide the full code that you used for the NetVLAD pooling layer?
Hi Alex,
I guess another issue is that the fully connected layers need to be removed from the backbone - something similar to what is done in: https://github.com/Nanne/pytorch-NetVlad/blob/8f7c37ba7a79a499dd0430ce3d3d5df40ea80581/main.py#L392-L393
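Something along these lines, I assume (a sketch based on the referenced pytorch-NetVlad code, using torchvision's VGG16; the Imagenet_vgg wrapper in EventVLAD.py may already handle this internally):

```python
import torch.nn as nn
import torchvision.models as models

vgg = models.vgg16(pretrained=False)
# drop the classifier (fully connected layers) and the final ReLU + MaxPool,
# as in pytorch-NetVlad, so the encoder ends at the conv5 feature maps
layers = list(vgg.features.children())[:-2]
encoder = nn.Sequential(*layers)
```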
I would really appreciate if you could provide a full working example - a very brief code snippet that loads a handful of images and extracts the features for these images. I don't mind if the code is messy etc. If you do not want to share it here publicly, please drop me an email to tobias.fischer@qut.edu.au
I also just realised that you probably made some changes in https://github.com/Nanne/pytorch-NetVlad/blob/8f7c37ba7a79a499dd0430ce3d3d5df40ea80581/netvlad.py#L8-L12, as their __init__ does not accept an alpha, but in your code snippet above you pass an alpha. It would be great if you could provide the modified NetVLAD class with a complete working example :)
Hi Tobi,
The alpha parameter is not used in the network, but I forgot to remove it from the initialization. And, as you've observed, the lastfc layer was added to the model. Sorry for missing these details and causing the interruption.
Thanks for your continued effort to reproduce the results; I will contact you by email about this issue from here on. Thanks!
Thanks Alex, I'm looking forward to your email!
Your email got blocked from my uni account - could you please resend to info@tobiasfischer.info ?
Sure!
Note that https://github.com/alexjunholee/EventVLAD/blob/main/networks/netvlad.py has now been updated, and that the other trick is to use vlad = model.pool(image_encoding[:,:,np.newaxis,np.newaxis]) as opposed to vlad = model.pool(image_encoding).
Dear @alexjunholee, many thanks for quickly resolving #1 and #2, that's awesome!
I am still confused about what your VPR pipeline looks like. How do I obtain the NetVLAD features given an event stream? It seems like https://github.com/alexjunholee/EventVLAD/blob/main/create_denoised_samples.py takes intensity images as input, but how do I obtain those given an event stream?
My aim is to have the typical VPR pipeline where I extract the features for the reference traverse, and then compare the query features of a query event stream with the reference features to find the closest match.
Many thanks again, Tobi