Hi @anassmu, thanks for this thorough post; I really appreciate you going into such detail with each question. I will try to answer your questions one by one.
Are there any existing functions or tools within the Aria framework that can expedite this process, particularly for semantic segmentation tasks?
Currently there are no existing functions within projectaria-tools to expedite semantic segmentation tasks. Please note that some level of parallelism has already been incorporated; e.g., the undistort function in projectaria-tools is already multithreaded.
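For reference, here is a minimal sketch of that undistortion step following the documented projectaria-tools pattern. The VRS path, output resolution, and focal length are placeholder example values, and if your scenes ship images and calibration as per-scene files rather than a VRS recording, you would swap in your own loading code.

```python
# Sketch: undistort a single image into a linear (pinhole) model using the
# documented projectaria_tools calibration utilities. The VRS path and the
# 512x512 / focal-length-150 target model are example values, not recommendations.
from projectaria_tools.core import data_provider, calibration

provider = data_provider.create_vrs_data_provider("path/to/recording.vrs")  # placeholder path
label = "camera-rgb"

src_calib = provider.get_device_calibration().get_camera_calib(label)
dst_calib = calibration.get_linear_camera_calibration(512, 512, 150, label)

stream_id = provider.get_stream_id_from_label(label)
image = provider.get_image_data_by_index(stream_id, 0)[0].to_numpy_array()

# distort_by_calibration is already multithreaded internally.
undistorted = calibration.distort_by_calibration(image, dst_calib, src_calib)
```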
Having said that, there are a couple of things to keep in mind, which I cover in the answers below.
Is it possible to directly retrieve these semantically segmented point clouds without going through the entire workflow mentioned above?
You have to follow the workflow above; there is no direct way at the moment. You can speed it up through better utilisation of compute (e.g. multiprocessing) and/or more compute.
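As a hedged illustration of that kind of parallelisation (not an official utility), the per-frame work can be spread over worker processes with the standard library. `process_frame` below is just a placeholder for whatever per-frame undistort/unproject/transform function you already have.

```python
# Sketch: parallelise per-frame processing across CPU cores with multiprocessing.
# process_frame is a stand-in for your existing per-frame pipeline
# (undistort -> unproject -> transform); replace it with your real function.
import multiprocessing as mp
import numpy as np

def process_frame(frame_index: int) -> np.ndarray:
    # Placeholder: load the depth/instance images for frame_index, undistort,
    # unproject to 3D, transform to the world frame, and return an (N, 4)
    # array of [x, y, z, instance_id] points.
    return np.empty((0, 4))

if __name__ == "__main__":
    frame_indices = range(350)  # e.g. number of frames in one scene
    with mp.Pool(processes=mp.cpu_count()) as pool:
        per_frame_points = pool.map(process_frame, frame_indices, chunksize=8)
    scene_points = np.concatenate(per_frame_points, axis=0)
```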
Regarding the semi-dense point cloud and the bounding boxes provided in the dataset, is it feasible to use them for segmentation tasks? I noticed that not all classes have corresponding bounding boxes.
Currently the bounding boxes / language commands exist only for walls, doors, and windows. As stated in issue #21, we will provide more information about object poses / bounding boxes for other classes in a future version.
Any suggestions or guidance on optimizing this process or alternative approaches that can be adopted would be greatly appreciated.
Not clear what you mean here. I have already provided some options above.
Will a C++ script run faster than Python?
You can already achieve faster computation using multiprocessing in Python, and you can get another level of speed-up by moving the heavy lifting to C++ and exposing it through pybind11 if you want to stick with a Python front end.
Hope this helps!
Seems there is no more activity here. Closing the task for now. Please feel free to open it back if you have more questions.
I am currently working with the Aria dataset for semantic segmentation tasks. Each scene in the dataset contains around 350-1700 depth and instance images. My current workflow (#48, #49, #1) involves undistorting these images, unprojecting them into 3D space, applying transformations, and creating a 3D scene with semantic information. Additionally, due to the large size of the generated point clouds, downsampling is necessary, which further adds to the processing time. On average, this workflow takes about 1-5 hours per scene. Given the size of the dataset (around 100,000 scenes), this approach is proving to be impractical.
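For concreteness, a simplified sketch of the unprojection, world-frame transform, and voxel downsampling for one frame might look like the following; the loaders, the pinhole intrinsics (fx, fy, cx, cy), and the world-from-camera pose are placeholders standing in for whatever the scenes actually provide.

```python
# Sketch: unproject one undistorted depth + instance frame to a labelled point
# cloud, move it into the world frame, and voxel-downsample it with numpy.
# depth, instance, the intrinsics, and T_world_camera are placeholders.
import numpy as np

def unproject_frame(depth, instance, fx, fy, cx, cy, T_world_camera, voxel=0.05):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    valid = depth > 0

    # Pinhole back-projection (assumes the frame was undistorted to a linear model).
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    points_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)

    # Camera frame -> world frame with a 4x4 pose matrix.
    points_world = (T_world_camera @ points_cam.T).T[:, :3]
    labels = instance[valid]

    # Simple voxel downsampling: keep one point per occupied voxel.
    voxel_idx = np.floor(points_world / voxel).astype(np.int64)
    _, keep = np.unique(voxel_idx, axis=0, return_index=True)
    return points_world[keep], labels[keep]

# Example call with dummy data (replace with real loaders / calibration / pose).
depth = np.random.rand(480, 640).astype(np.float32) * 5.0
instance = np.random.randint(0, 10, size=(480, 640))
pts, lbls = unproject_frame(depth, instance, 400.0, 400.0, 320.0, 240.0, np.eye(4))
```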
Current Workflow
Issues Encountered
Questions and Requests for Alternatives
Thank you so much!