facebookresearch / projectaria_tools

projectaria_tools is a C++/Python open-source toolkit to interact with Project Aria data
https://facebookresearch.github.io/projectaria_tools/docs/intro
Apache License 2.0

ASE : Insights About Long Processing Time for Semantic Segmentation of Scenes #51

Closed anassmu closed 10 months ago

anassmu commented 11 months ago

I am currently working with the ASE dataset for semantic segmentation tasks. Each scene in the dataset contains around 350-1700 depth and instance images. My current workflow (#48, #49, #1) involves undistorting these images, unprojecting them to 3D space, applying transformations, and creating a 3D scene with semantic information. Additionally, because the generated point clouds are large, downsampling is necessary, which further adds to the processing time. On average, this workflow takes about 1-5 hours per scene. Given the size of the dataset (around 100,000 scenes), this approach is proving impractical.

Current Workflow
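
Roughly, the per-frame step looks like the sketch below. This is a hedged illustration, not my exact script: the pinhole intrinsics, the camera-to-world transform, and the helper names are placeholders, and it assumes the depth/instance images have already been undistorted to a linear (pinhole) model.

```python
# Hedged sketch: fuse one depth map and one instance map into a labeled point
# cloud, then voxel-downsample. Intrinsics (fx, fy, cx, cy) and the 4x4
# camera-to-world transform are placeholders, not the exact ASE API.
import numpy as np

def frame_to_labeled_points(depth_m, instance_ids, fx, fy, cx, cy, T_world_camera):
    """Unproject an (H, W) metric depth map with per-pixel instance ids
    into an (N, 4) array of [x, y, z, instance_id] in world coordinates."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth_m > 0
    z = depth_m[valid]
    x = (u[valid] - cx) / fx * z
    y = (v[valid] - cy) / fy * z
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)   # (N, 4) homogeneous
    pts_world = (T_world_camera @ pts_cam.T).T[:, :3]        # camera -> world
    labels = instance_ids[valid]
    return np.concatenate([pts_world, labels[:, None]], axis=1)

def voxel_downsample(labeled_points, voxel_size=0.05):
    """Keep one labeled point per occupied voxel (first hit wins)."""
    keys = np.floor(labeled_points[:, :3] / voxel_size).astype(np.int64)
    _, keep = np.unique(keys, axis=0, return_index=True)
    return labeled_points[keep]
```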

Issues Encountered

Questions and Requests for Alternatives

Thank you so much!

suvampatra commented 10 months ago

Hi @anassmu, thanks for this thorough post; I really appreciate you going into such detail with each question. I will try to answer your questions one by one.

Are there any existing functions or tools within the Aria framework that can expedite this process, particularly for semantic segmentation tasks?

Currently there are no existing functions within projectaria_tools to expedite semantic segmentation tasks. Please note that some level of parallelism has already been incorporated; for example, the undistort function in projectaria_tools is already multithreaded.
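
For reference, driving that undistortion from Python looks roughly like the sketch below, following the pattern in the documentation. The VRS path, target resolution, and focal length are placeholder values; for ASE the source calibration would come from the dataset's provided camera model rather than a VRS recording.

```python
# Hedged sketch of image undistortion with projectaria_tools' calibration
# helpers. The VRS path, target resolution and focal length are placeholders.
from projectaria_tools.core import calibration, data_provider

provider = data_provider.create_vrs_data_provider("recording.vrs")  # placeholder path
label = "camera-rgb"
stream_id = provider.get_stream_id_from_label(label)

src_calib = provider.get_device_calibration().get_camera_calib(label)        # fisheye source
dst_calib = calibration.get_linear_camera_calibration(512, 512, 150, label)  # pinhole target

image_data, _ = provider.get_image_data_by_index(stream_id, 0)
undistorted = calibration.distort_by_calibration(
    image_data.to_numpy_array(), dst_calib, src_calib
)
```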

Having said that, there are two things to remember here:

  1. This is one of the largest indoor datasets provided (100K scenes), with the option to use only a fraction of it according to your needs, e.g. you can easily use 1K, 10K, or 20K scenes depending on your needs and available compute. If you want to use the whole 100K, you will need an equivalent amount of compute. You can speed up the semantic segmentation process with techniques like multiprocessing (see the sketch after this list) to make maximal use of the compute you have.
  2. The dataset was provided with indoor reconstruction in mind, and hence point clouds, poses, and images were provided with additional cues like depth/instances. On further requests we extended the dataset to associate instances with classes. So the onus currently lies on the user to generate derivatives of this, which includes semantic segmentation data. Shipping more data from us, such as segmented points, would also mean that each dataset chunk gets heavier, which becomes infeasible in terms of both storage and download size.
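
To illustrate the first point, here is a minimal sketch of per-scene parallelism with a process pool; `process_scene` stands in for your existing pipeline and the dataset root is a placeholder.

```python
# Hedged sketch: scenes are independent, so a process pool spreads the
# per-scene work across cores. process_scene() stands in for the existing
# undistort -> unproject -> downsample pipeline.
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def process_scene(scene_dir: Path) -> Path:
    # ... run the existing per-scene pipeline here and write its output ...
    return scene_dir / "semantic_points.npz"

def process_all(scene_dirs, max_workers=8):
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        for out_path in pool.map(process_scene, scene_dirs, chunksize=4):
            print(f"done: {out_path}")

if __name__ == "__main__":
    scene_dirs = sorted(p for p in Path("/path/to/ase").iterdir() if p.is_dir())
    process_all(scene_dirs)
```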

Is it possible to directly retrieve these semantically segmented point clouds without going through the entire workflow mentioned above?

You have to follow the workflow above; there is no direct way at the moment. You can speed it up through better utilisation of your compute (e.g. multiprocessing) and/or more compute.

Regarding the semi-dense point cloud and the bounding boxes provided in the dataset, is it feasible to use them for segmentation tasks? I noticed that not all classes have corresponding bounding boxes.

Currently the bounding boxes / language commands exist only for walls, doors, and windows. As stated in issue #21, we will provide more information about object poses / bounding boxes for other classes in a future version.

Any suggestions or guidance on optimizing this process or alternative approaches that can be adopted would be greatly appreciated.

It is not clear to me what you mean here; I have already provided some options above.

Will a C++ script run faster than Python?

You can already achieve faster computation using multiprocessing in Python, and you can definitely get another level of speed-up with C++, exposed through pybind11 bindings if you want to stick with a Python front end.

Hope this helps!

suvampatra commented 10 months ago

It seems there is no more activity here. Closing the issue for now. Please feel free to reopen it if you have more questions.