nkemnitz opened 4 months ago
It's unexpected to me that performance matters here. I thought interpolation would be mostly bandwidth-bound.
Just checked - downloading a 4k x 4k uint8 JPG patch takes 100-150 ms, which is similar to the current downsampling behavior.
Wow, that's a crazy fast download! But doesn't that also mean there's basically no inefficiency if we use pipelining? Then again, it may not matter either way: we could just swap in tinybrain in place of the default torch behavior. It's not a hard fix.
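A minimal sketch of the pipelining idea: while the main thread downsamples patch i, a thread pool is already fetching patch i+1, so compute hides behind I/O. `download_patch` and `downsample` here are hypothetical stand-ins for the real download and downsampling steps, not actual APIs from this codebase.

```python
from concurrent.futures import ThreadPoolExecutor

def download_patch(i):
    # Stand-in: pretend this fetches a 4k x 4k uint8 patch (100-150 ms).
    return [i] * 4  # tiny placeholder payload

def downsample(patch):
    # Stand-in: pretend this is the 2x downsampling step.
    return patch[::2]

def pipelined(indices, workers=4):
    # pool.map prefetches downloads in worker threads while the main
    # thread downsamples already-fetched patches, overlapping the two.
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for patch in pool.map(download_patch, indices):
            results.append(downsample(patch))
    return results

print(pipelined(range(3)))
```

If the per-patch compute is shorter than the per-patch download, the downloads dominate and the downsampling cost effectively disappears.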
All our tensors are passed to torch as NCXYZ and converted to float32. That's not just an extra copy; float32 also uses 4x the memory of uint8.
Another thing to consider: CloudVolume data is already in Fortran order, which is what tinybrain expects.
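A small sketch of why that matters: a Fortran-ordered array can go to tinybrain directly, whereas a C-ordered one would need a transpose/copy first. The `downsample_2x2` function below is a hypothetical numpy stand-in for tinybrain's averaging downsample (`downsample_with_averaging`), just to keep the example self-contained:

```python
import numpy as np

# CloudVolume returns column-major (Fortran-order) arrays; tinybrain
# expects the same layout, so no reordering copy is needed in between.
patch = np.asfortranarray(np.arange(16, dtype=np.uint8).reshape(4, 4))
assert patch.flags['F_CONTIGUOUS']

def downsample_2x2(img):
    # Stand-in for tinybrain-style averaging: mean over 2x2 blocks,
    # cast back to the input dtype.
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)).astype(img.dtype)

print(downsample_2x2(patch))
```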