datvuthanh / HybridNets

HybridNets: End-to-End Perception Network
MIT License
594 stars 121 forks source link

amp & channels_last #50

Closed xoiga123 closed 2 years ago

xoiga123 commented 2 years ago

channels_last:

While PyTorch operators expect all tensors to be in Channels First (NCHW) dimension format, PyTorch operators support 3 output memory formats. Contiguous: Tensor memory is in the same order as the tensor’s dimensions. ChannelsLast: Irrespective of the dimension order, the 2d (image) tensor is laid out as an HWC or NHWC (N: batch, H: height, W: width, C: channels) tensor in memory. The dimensions could be permuted in any order.

amp:

extra:

prefetch:

Going to train from scratch to see what's good, with a working log this time. UPDATE 12/07/2022: Seems like the bottleneck is in dataloading, which takes an unholy amount of time even though I cached everything in RAM. Currently profiling CPU & GPU and trying out this dataloader which allegedly actually does prefetch. UPDATE: It all makes sense now, Pytorch's Dataloader can only prefetch batches in the current running epoch. For the next epoch, there is apparently no prefetch whatsoever.