QVPR / VPRSNN

Code for our IEEE RAL + ICRA 2022 paper "Spiking Neural Networks for Visual Place Recognition via Weighted Neuronal Assignments" and our ICRA 2023 paper "Ensembles of Compact, Region-specific & Regularized Spiking Neural Networks for Scalable Place Recognition".
MIT License

Dataset release request #10

bobhagen closed this issue 3 months ago

bobhagen commented 3 months ago

Hello @Somayeh-h,

I hope this message finds you well. I am reaching out because I am having a hard time generating the exact same training and test sets you used in your papers. For instance, the Nordland data pre-processing procedure described in your 2023 ICRA paper does not yield 3300 places. Could you please share the data processing scripts? Alternatively, sharing the training and test data along with the place labels would be immensely helpful. Even a table of input (following the original dataset naming convention) and place label pairs would be beneficial.

Additionally, I noticed a discrepancy between the P@100%R value for the ORC data reported in Figure 4 of your 2022 RAL paper (weighted probability-based, over 40%) and the value listed in Table 1 of your 2023 ICRA paper (4% precision at 100% recall). Unless I am missing something, it appears that the increased number of places in the dataset used for your 2023 ICRA paper might play a significant role in this discrepancy. I would greatly appreciate any further insights you could provide on this matter.

Thank you in advance for your time and assistance. I look forward to your response.

Best, Robert

Somayeh-h commented 3 months ago

Hi @bobhagen,

Thank you for reaching out. We have documented the details of our pre-processing steps, which I summarize here, in modular_snn/README.md.

The exact configurations that we used for training, calibrating, testing, and evaluating the performance of our method are provided in modular_snn/modular_snn_processing.py and modular_snn/modular_snn_evaluation.py.

The entire data pre-processing pipeline of our approach is implemented in the function processImageDataset in the tools/data_utils.py script.

To generate the exact same training and test sets, please check the following items:

Dataset: The variant of the Nordland dataset we used in our work consists of four folders, spring, summer, fall, and winter, each containing 35768 images. We have provided this variant of the Nordland dataset at this link.

Dataset_imageNames: Please ensure you load the Nordland dataset imageNames file, which filters the dataset down to 27592 images by removing sections where the speed of the train is less than 15 km/h. The dataset_imageNames files are stored in the dataset_imagenames folder. You can verify the number of filtered images at this line when loading the dataset.

Sampling method: For the Nordland dataset, we sample the images of each traverse approximately every 100 m by setting the variable skip=8, as provided in the configurations mentioned above.

By using the same dataset, applying the same configurations, and following the same processes, the data variable here should contain 2700 images per traverse at test time, 600 images per traverse at calibration time, and 25 images per traverse at training time. The total number of images used for testing and calibration with the query dataset for Nordland is therefore 2700 + 600 = 3300. The labels of the images are also generated in the function processImageDataset, where the labels correspond to the image indices after the sampling is applied (see the sketch below).
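To make this concrete, here is a minimal sketch of the subsampling and labelling logic; the function and variable names below are illustrative rather than the actual code of processImageDataset:

```python
# Illustrative sketch only; the real implementation is processImageDataset
# in tools/data_utils.py. `filtered_names` stands for the 27592 image names
# loaded from the dataset_imagenames file.

def subsample_traverse(filtered_names, skip=8):
    """Keep every `skip`-th image (roughly every 100 m for Nordland) and
    label each kept image with its index after sampling."""
    sampled = filtered_names[::skip]
    labels = list(range(len(sampled)))  # labels = image indices after sampling
    return sampled, labels

# With 27592 filtered names and skip=8 this keeps 3449 candidates, which is
# consistent with drawing the 2700 test and 600 calibration images from them.
```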

As noted in modular_snn/README.md, we have released the trained weights of our Modular SNN on the Nordland dataset using reference traverses spring and fall here, which you can use for testing the Modular SNN with a query dataset (summer or winter traverse).

Regarding the second matter, your observation is correct. In our RAL 2022 paper, we introduced weighted neuronal assignments to address ambiguity in neuronal responses, focusing on small-scale environments with only 100 places. This achieved a P@100%R of approximately 40% for ORC.

Our ICRA 2023 paper demonstrated that the performance of our RAL 2022 network is limited on large datasets due to computational constraints. We expanded the network to match the number of places to learn and trained it for 26 epochs. The values in Table 1 are from standard assignments rather than weighted neuronal assignments (see the sketch below for the difference). In contrast, the modules of our Modular SNN can be trained in parallel, enabling scalability to larger numbers of places. In ICRA 2023, we highlight how modularity enhances learning capability on larger datasets.
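For intuition, here is a schematic sketch contrasting the two assignment schemes; the spike-count matrix and function names are hypothetical, and the exact weighting used in the RAL 2022 paper may differ in detail:

```python
import numpy as np

# Schematic only, not the code of this repository. `train_spikes` is a
# hypothetical (num_neurons, num_places) matrix of spike counts recorded
# while presenting the training images of each place.

def standard_assignments(train_spikes):
    # Hard assignment: each neuron belongs to the one place it fired most for.
    return np.argmax(train_spikes, axis=1)

def weighted_assignments(train_spikes):
    # Soft assignment: keep each neuron's normalized response per place,
    # so neurons with ambiguous responses contribute proportionally.
    totals = np.maximum(train_spikes.sum(axis=1, keepdims=True), 1)
    return train_spikes / totals

def predict_standard(test_spikes, assignments, num_places):
    # Sum the test spikes of the neurons assigned to each place.
    votes = np.zeros(num_places)
    for neuron, place in enumerate(assignments):
        votes[place] += test_spikes[neuron]
    return int(np.argmax(votes))

def predict_weighted(test_spikes, weights):
    # Every neuron votes for every place, scaled by its soft assignment.
    return int(np.argmax(test_spikes @ weights))
```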

I hope you find this information useful. Please reach out if the issue persists.

Thanks, Somayeh

bobhagen commented 2 months ago

Hello @Somayeh-h,

Thank you so much for your detailed response. Over the past three weeks, I have dedicated my time to going through your code and multiple other papers from your group. However, I still find myself in need of your assistance regarding the datasets mentioned in your 2023 arXiv paper.

  1. Synthia data: I downloaded the data shared by Mubariz from this link. There were four folders: ref, ref_new, query, and query_new. Unfortunately, none of these folders contains the same number of images as those listed here. Are the image file names you listed consistent with the ones in the zip file shared by Mubariz? If so, which pair of folders in Mubariz's data should I utilize? If not, could you kindly provide a link to the data you used?
  2. SPEDTest, St. Lucia datasets: Could you please confirm if you used the datasets shared by Mubariz?
  3. SFU-Mountain dataset: Could you please provide a link to the data you used?
  4. Oxford RobotCar dataset: The image file names you share here do not seem to align with the original Oxford RobotCar dataset's naming convention. Could you please share either the data you used or the list of images with their original Oxford RobotCar names? Furthermore, I must mention that the preprocessing descriptions in this paper, as well as in several others from your group, do not appear sufficient for replicating the exact data you used. It is unclear how one could ensure spatial alignment between different traverses given only the information that images were sampled approximately every 10 meters (see the sketch after this list).
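To illustrate what I mean, a reproducible recipe would need to specify something like the following; all names here are hypothetical, and the poses would have to come from, e.g., the RobotCar INS/GPS files:

```python
import numpy as np

# Hypothetical sketch, not from the VPRSNN repository. `poses` is assumed
# to be an (N, 2) array of per-image positions (e.g. UTM coordinates
# extracted from the RobotCar INS/GPS files).

def sample_every(poses, spacing_m=10.0):
    """Return indices of images spaced at least `spacing_m` metres apart."""
    keep = [0]
    last = poses[0]
    for i, p in enumerate(poses[1:], start=1):
        if np.linalg.norm(p - last) >= spacing_m:
            keep.append(i)
            last = p
    return keep

# Spatially aligning a second traverse would additionally require matching
# each kept reference pose to the nearest query image by position, which
# is exactly the detail the papers leave unspecified.
```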

Ideally, sharing all the data in the same way Mubariz did in his GitHub repo (the raw images together with the labels) would address my inquiry.

Exploring your research has been both enlightening and enjoyable, and your papers are attracting increasing attention each day. Your leadership is crucial for the community, especially in addressing the points mentioned above.

Cheers, Robert