argoverse / av2-api

Argoverse 2: Next generation datasets for self-driving perception and forecasting.
https://argoverse.github.io/user-guide/
MIT License

sceneflow data loader seems to be very slow #192

Closed SM1991CODES closed 1 year ago

SM1991CODES commented 1 year ago

Hi,

I noticed that the scene flow loader takes around 700-800 ms per iteration over frames. Have you seen this too? I understand it is loading two sweeps and computing the flow between them, but is this latency expected?

Best Regards Sambit

SM1991CODES commented 1 year ago
```python
from av2.torch.data_loaders.scene_flow import SceneFlowDataloader
from pyquaternion import Quaternion

data_loader = SceneFlowDataloader(path_dataset_root, dataset_name=part_name, split_name="train")

num_frames = len(data_loader)
for index in range(0, num_frames, 10):  # stepping by 10 is intentional
    past_sweep, present_sweep, pose_present_past, flow_present = data_loader[index]
    pcl_present = present_sweep.lidar.as_tensor().numpy()
    pcl_past = past_sweep.lidar.as_tensor().numpy()

    # Extract the yaw angle from the relative-pose quaternion.
    qw, qx, qy, qz = (
        pose_present_past.r.q.w.item(),
        pose_present_past.r.q.x.item(),
        pose_present_past.r.q.y.item(),
        pose_present_past.r.q.z.item(),
    )
    rot_quat = Quaternion(w=qw, x=qx, y=qy, z=qz)
    rot_z_rad = rot_quat.yaw_pitch_roll[0]
    t_present = pose_present_past.t.numpy()
    past_T_present = pose_present_past.matrix().numpy()[0]
```

This is the code I am using; the step of 10 is intentional.

nchodosh commented 1 year ago

Hi Sambit,

The scene flow calculation loops over every object in each point cloud and computes per-point masks on every iteration, so the timing you report matches what I see as well. If you want to speed up data loading for training, I suggest saving the flow outputs to disk instead of recomputing them each time.
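The save-to-disk suggestion can be sketched as a simple on-disk cache. This is a generic pattern, not part of the av2 API: the file naming and the `compute_fn` hook (standing in for the expensive flow computation) are illustrative assumptions.

```python
import os

import numpy as np


def cached_flow(cache_dir: str, index: int, compute_fn):
    """Return the flow for frame `index`, computing it only on a cache miss.

    `compute_fn` stands in for the expensive per-frame flow computation;
    results are stored as compressed .npz files keyed by frame index, so
    subsequent epochs only pay the cost of a disk read.
    """
    path = os.path.join(cache_dir, f"flow_{index:06d}.npz")
    if os.path.exists(path):
        return np.load(path)["flow"]
    flow = compute_fn(index)
    np.savez_compressed(path, flow=flow)
    return flow
```

A one-time precompute pass over the dataset fills the cache; training runs then read the cached arrays instead of re-deriving the per-object masks.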

SM1991CODES commented 1 year ago

Yes, I agree on this. Isn't there an option for using multiple workers, as is common with PyTorch data loaders?

Also, I would like to know: are the parts continuous? That is, is the first frame/sweep of part 2 the immediate temporal successor of the last frame/sweep of part 1? I am working on a tracking-based application and need this information to decide whether to reset my tracks at part boundaries.

Best Regards Sambit

nchodosh commented 1 year ago

The SceneFlowDataloader is a subclass of the standard PyTorch Dataset class, so you should be able to wrap it in a regular DataLoader that handles multiple workers.
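A minimal sketch of the multi-worker wrapping, using a toy Dataset in place of the real SceneFlowDataloader so it runs stand-alone. With the real loader you would likely need a custom collate function like the one below, since its items (Sweep, Se3, Flow) are not tensors and PyTorch's default collation would choke on them.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyFlowDataset(Dataset):
    """Stand-in for SceneFlowDataloader; any Dataset subclass works here."""

    def __len__(self):
        return 20

    def __getitem__(self, idx):
        # Fake "point cloud": 100 points with 3 coordinates each.
        return torch.full((100, 3), float(idx))


def identity_collate(batch):
    # Skip default tensor collation so non-tensor items pass through intact.
    return batch


loader = DataLoader(
    ToyFlowDataset(),
    batch_size=1,
    num_workers=2,   # samples are prefetched in parallel worker processes
    collate_fn=identity_collate,
)

# Without shuffling, samples arrive in index order even with multiple workers.
samples = [batch[0] for batch in loader]
```

The workers overlap the expensive per-sample flow computation with the training step, which is usually enough to hide most of the 700-800 ms loading cost.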

The sweeps should generally come in order, but there will be discontinuities at the end of each log. You can detect those using the sweep_uuid field of the lidar sweeps.
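A boundary check along these lines could work; it assumes sweep_uuid is a (log_id, timestamp_ns) pair, which is how the av2 torch Sweep structure identifies sweeps.

```python
def crosses_log_boundary(prev_uuid, curr_uuid):
    """Return True when two consecutive sweeps come from different logs.

    Each uuid is assumed to be a (log_id, timestamp_ns) pair; a change in
    log_id marks a discontinuity where a tracker should reset its tracks.
    """
    prev_log_id, _ = prev_uuid
    curr_log_id, _ = curr_uuid
    return prev_log_id != curr_log_id
```

In a tracking loop you would call this between consecutive iterations and reset your tracks whenever it returns True.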