@psobot sorry for being slow. I will have a look this weekend. I thought about making some major changes to better spot the differences... Maybe you have some ideas on the following points?
For now I test each lib on each framework. In practice, not all combinations really make sense. I wonder if it's better to just run everything on NumPy and only use torch and tf for the libs that natively output the respective tensor formats.
All tests are single-process, and some libs are known to not work well with multithreading or multiprocessing. That's why I set all workers in torch and tf to 0. For better real-world performance benchmarks, it's probably better to evaluate parallelization as well.
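For context, this is roughly what I mean by "workers set to 0" on the torch side. A minimal sketch, assuming a dataset whose `__getitem__` stands in for "load + decode one file" (the `AudioDataset` here is a placeholder, not the real benchmark code):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class AudioDataset(Dataset):
    """Placeholder dataset: each item stands in for one decoded audio file."""
    def __init__(self, n_files: int):
        self.n_files = n_files

    def __len__(self) -> int:
        return self.n_files

    def __getitem__(self, idx: int) -> torch.Tensor:
        # In the real benchmark this would load + decode a file with the lib
        # under test; here we return a dummy 1-second mono buffer at 16 kHz.
        return torch.zeros(16000)

# num_workers=0 keeps all loading in the main process, avoiding the
# multiprocessing issues some decoding libs have.
loader = DataLoader(AudioDataset(8), batch_size=4, num_workers=0)
batches = list(loader)
```

Bumping `num_workers` above 0 is exactly the parallelization axis that a real-world benchmark would want to sweep.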
All tests measure loading + decoding speed combined. In ML practice, we can decouple these with some libraries, so I wonder if we should test decoding speed separately where possible.
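As a sketch of the decoupling I have in mind, using only the stdlib `wave` module as a stand-in codec (a real benchmark would substitute the library under test, and "loading" would be a disk read rather than a memory copy):

```python
import io
import time
import wave

# Build a small WAV file in memory so the example is self-contained:
# 1 second of 16-bit mono silence at 16 kHz.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)
raw = buf.getvalue()

# Step 1: "loading" -- getting the encoded bytes into memory.
t0 = time.perf_counter()
data = bytes(raw)  # stand-in for open(path, "rb").read()
load_time = time.perf_counter() - t0

# Step 2: "decoding" -- turning encoded bytes into PCM samples,
# timed separately from the I/O above.
t0 = time.perf_counter()
with wave.open(io.BytesIO(data), "rb") as w:
    frames = w.readframes(w.getnframes())
decode_time = time.perf_counter() - t0
```

Timing the two steps separately would let us report pure decode throughput for the libs that accept in-memory buffers, alongside the combined load+decode numbers.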
(Note: I haven't tested this code locally yet.)