Lyken17 / Efficient-PyTorch

My best practice of training large dataset using PyTorch.

About the msgpack #7

Closed. Solacex closed this issue 5 years ago.

Solacex commented 5 years ago

I noticed that you are using pyarrow for serialization and msgpack for deserialization. Does msgpack deserialize faster? I ran into a problem when using msgpack like this: [screenshot]. The error is: [screenshot]. When I use pa.serialize, there are no errors. Have you run into this problem, or do you have any advice? I think it may be caused by the msgpack version, so could you tell me which version of msgpack you are using?

Lyken17 commented 5 years ago

I am using the latest version of msgpack; I'm not sure why you are running into this problem.

The settings are adapted from Tensorpack. In fact, LMDB does not prescribe any particular tool for serialization and deserialization; it only stores byte strings. You are free to choose whatever you want, as long as the serializer and deserializer are compatible.
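To illustrate the compatibility point, here is a minimal sketch. It assumes a plain Python dict as a stand-in for an LMDB database (LMDB maps byte-string keys to byte-string values, so it has no opinion about the format). The helpers `put_sample`, `get_sample`, `json_dumps` and `json_loads` are hypothetical, and the stdlib `pickle` and `json` modules stand in for pyarrow and msgpack. Data written with one serializer must be read back with a compatible deserializer; mixing the two fails.

```python
import json
import pickle

# Hypothetical stand-in for an LMDB database: keys and values are raw bytes,
# exactly what LMDB stores, so the serialization format is up to the caller.
store = {}


def put_sample(key, sample, dumps):
    """Serialize `sample` with `dumps` and store the resulting bytes."""
    store[key.encode("ascii")] = dumps(sample)


def get_sample(key, loads):
    """Fetch the raw bytes for `key` and deserialize them with `loads`."""
    return loads(store[key.encode("ascii")])


def json_dumps(obj):
    # json produces str; encode it so the store holds bytes, as LMDB would.
    return json.dumps(obj).encode("utf-8")


def json_loads(buf):
    return json.loads(buf.decode("utf-8"))


sample = {"label": 3, "path": "train/cat/001.jpg"}

# Matching writer/reader pairs round-trip correctly.
put_sample("a", sample, pickle.dumps)
assert get_sample("a", pickle.loads) == sample

put_sample("b", sample, json_dumps)
assert get_sample("b", json_loads) == sample

# A mismatched pair (pickle writer, json reader) raises an error,
# because pickle bytes are not valid UTF-8 JSON text.
try:
    get_sample("a", json_loads)
except ValueError as exc:  # UnicodeDecodeError and JSONDecodeError are ValueErrors
    print("incompatible serializers:", type(exc).__name__)
```

The same reasoning applies to pyarrow and msgpack: whichever library writes the values, the reader has to use a deserializer that understands that exact byte format.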

Solacex commented 5 years ago

What's your python version? 2 or 3?

Lyken17 commented 5 years ago

3.6

Solacex commented 5 years ago

Hello, based on a small example, msgpack does indeed deserialize much faster: [screenshot]
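The numbers from that small example are not reproduced in the thread. A minimal sketch of how such a comparison could be timed, assuming you pass in matching `(dumps, loads)` pairs: it serializes a payload once, checks that the pair round-trips, and then times only the deserialization step with `timeit`. The helper `time_deserialize` and the payload are hypothetical, and the stdlib `pickle` and `json` modules are used only as placeholders; to reproduce the comparison from this thread you would pass something like `msgpack.packb`/`msgpack.unpackb` instead. Results depend heavily on the payload, library versions and machine, so no numbers are claimed here.

```python
import json
import pickle
import timeit


def time_deserialize(dumps, loads, payload, number=1000):
    """Return the average seconds per `loads` call on a pre-serialized payload.

    Serialization happens once, outside the timed region, so only
    deserialization is measured.
    """
    buf = dumps(payload)
    # Sanity check: the pair must round-trip before the timing means anything.
    assert loads(buf) == payload, "dumps/loads pair is not compatible"
    total = timeit.timeit(lambda: loads(buf), number=number)
    return total / number


# Hypothetical payload loosely resembling one dataset sample.
# json cannot encode bytes, so a list of ints stands in for image data.
payload = {"image": list(range(1024)), "label": 7}

# Placeholder candidates; swap in other (dumps, loads) pairs to compare them.
candidates = {
    "pickle": (pickle.dumps, pickle.loads),
    "json": (json.dumps, json.loads),
}

for name, (dumps, loads) in candidates.items():
    per_call = time_deserialize(dumps, loads, payload)
    print(f"{name}: {per_call * 1e6:.1f} us per loads() call")
```

Timing the deserializer in isolation matters here because, in a training loop reading from LMDB, every sample is decoded on every epoch, while serialization happens only once when the database is built.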