dhimmel closed 7 years ago
@nikota here is where I got to before pandas 0.20.0 was available. You can take over from here (feel free to use some of the changes here or not).
Pandas 0.20.0 will support compressed pickles, but not reading them directly from a URL: https://github.com/pandas-dev/pandas/pull/13317#discussion_r94099768. You may still be able to use requests to download the compressed bytes and pass them to read_pickle. Ideally, users can quickly download a smallish file (< 100 MB) and read it into pandas in under ~10 seconds.
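A rough sketch of that workaround, in case it helps as a starting point. Everything named here is hypothetical: the function names, the default `bz2` compression, and the timeout are assumptions, not something this PR defines. It decompresses and unpickles with the standard library rather than calling pandas directly, since whether read_pickle accepts an in-memory buffer depends on the pandas version.

```python
import bz2
import gzip
import lzma
import pickle

# Map a compression name to the matching stdlib decompress function.
_DECOMPRESSORS = {
    "bz2": bz2.decompress,
    "gzip": gzip.decompress,
    "xz": lzma.decompress,
}


def load_compressed_pickle(data, compression="bz2"):
    """Decompress raw bytes and unpickle the result.

    Hypothetical helper. Unpickling a DataFrame requires pandas to be
    installed, although it is not imported here.
    """
    raw = _DECOMPRESSORS[compression](data)
    return pickle.loads(raw)


def fetch_compressed_pickle(url, compression="bz2", timeout=60):
    """Download a compressed pickle over HTTP and load it (hypothetical helper).

    Only use this with URLs you trust: unpickling can execute arbitrary code.
    """
    import requests  # imported lazily so the decoding helper works without it

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()
    return load_compressed_pickle(response.content, compression)
```

Usage would look like `df = fetch_compressed_pickle(url)`, where `url` points at wherever the compressed pickles end up being hosted.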
@dhimmel can you post the compressed pickle files and share the URL so I can take a crack at this?
@dhimmel are you ok if I close this PR for now? (For housekeeping purposes)
> are you ok if I close this PR for now?

Yes! Thanks for helping with the maintenance.
This pull request adds a step to the data download process that creates pickled versions of the TSVs. These pickles are much faster to read into pandas than the TSVs, reducing file reading time from minutes to seconds.
Unfortunately, the pickle files are big (~1.2 GB) and pandas does not support pickle compression yet. Otherwise, I would have uploaded the compressed pickles using Git LFS. We still could, but I think it's easier to generate these pickles locally than to download 2 extra gigabytes.
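For anyone reading along, the local pickling step could look roughly like the sketch below. This is not the code in this PR: the function name `tsv_to_pickle` and the `.pkl` naming are assumptions made for illustration, and the TSV is read with pandas defaults apart from the tab separator.

```python
from pathlib import Path

import pandas as pd


def tsv_to_pickle(tsv_path, pickle_path=None):
    """Read a TSV into a DataFrame and write it back out as a pickle.

    Hypothetical helper. By default the pickle is written next to the TSV,
    with the final suffix replaced by ".pkl".
    """
    tsv_path = Path(tsv_path)
    if pickle_path is None:
        pickle_path = tsv_path.with_suffix(".pkl")
    pickle_path = Path(pickle_path)

    # Parsing the TSV is the slow part; it only happens once per file.
    df = pd.read_csv(str(tsv_path), sep="\t")
    df.to_pickle(str(pickle_path))
    return pickle_path
```

Later reads can then call `pd.read_pickle` on the returned path instead of parsing the TSV again.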