Quasars / orange-spectroscopy


Speed up readers #532

Open borondics opened 3 years ago

borondics commented 3 years ago

We use numpy.loadtxt in a lot of places and there are faster solutions. Pandas for example can be significantly faster. @markotoplak, @stuart-cls do you think we should switch over to Pandas when loading the data?

```
In [3]: %time np.loadtxt('Hermes_y_136.csv', delimiter=',')
CPU times: user 20.7 s, sys: 1.13 s, total: 21.8 s
Wall time: 22.3 s

In [4]: %time pd.read_csv('Hermes_y_136.csv', delimiter=',')
CPU times: user 4.49 s, sys: 343 ms, total: 4.83 s
Wall time: 4.91 s
```

The file in this case was ~190 MB, which is a normal FPA image.
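Switching readers does not have to change the rest of the code: `pd.read_csv` with `header=None` followed by `.to_numpy()` returns the same `float64` array that `np.loadtxt` would. A minimal sketch (the function name is illustrative, not existing Quasar API):

```python
import numpy as np
import pandas as pd

def load_csv_fast(path, delimiter=","):
    """Read a headerless numeric CSV into a NumPy array via pandas.

    pandas' C parser is typically much faster than np.loadtxt on large
    files; .to_numpy() yields the same float64 ndarray np.loadtxt would.
    """
    return pd.read_csv(path, delimiter=delimiter, header=None,
                       dtype=np.float64).to_numpy()
```

Note that for a single-row file `np.loadtxt` returns a 1-D array while this always returns 2-D, so callers relying on that squeeze would need a small adjustment.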

borondics commented 3 years ago

I discovered that np.loadtxt has an advantage at small file sizes and pd.read_csv is faster for large ones. In other words, the loading time for NumPy is linear with the file size while it is not with Pandas.

The crossover is around 1 MB, which raises an interesting question, since individual files are usually below that limit. If we want to speed up large files we should definitely switch, but this would set us back when loading series of small files with Multifile. I still need to test what this would mean for us.
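If the ~1 MB crossover holds up in testing, one way to get the best of both is to dispatch on file size. A sketch under that assumption (the threshold and function name are hypothetical, not existing code):

```python
import os
import numpy as np
import pandas as pd

# Hypothetical threshold based on the observed ~1 MB crossover:
# below it np.loadtxt wins, above it pandas' C parser wins.
SIZE_CROSSOVER_BYTES = 1_000_000

def load_numeric_csv(path, delimiter=","):
    """Pick the reader by file size; both branches return a float64 array."""
    if os.path.getsize(path) < SIZE_CROSSOVER_BYTES:
        return np.loadtxt(path, delimiter=delimiter)
    return pd.read_csv(path, delimiter=delimiter, header=None,
                       dtype=np.float64).to_numpy()
```

This keeps the small-file Multifile case fast while large FPA images get the pandas speedup, at the cost of depending on two parsers with slightly different edge-case behavior (e.g. 1-D output for single-row files from `np.loadtxt`).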

The figure below was produced in pure Python, not through the Quasar loaders. Those add some overhead, which would also be interesting to investigate and reduce.

[Figure: loading time of np.loadtxt vs pd.read_csv as a function of file size]