Lyken17 / Efficient-PyTorch

My best practices for training on large datasets with PyTorch.

msgpack.exceptions.ExtraData: unpack(b) received extra data #3

Closed: Fangyh09 closed this issue 4 years ago

Fangyh09 commented 5 years ago
self.keys = msgpack.loads(txn.get(b'__keys__'))

msgpack==0.5.6
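For later readers: msgpack raises `ExtraData` when the buffer holds more bytes than one packed object, which typically means the value was written by a different serializer (here, `pa.serialize`) than the one reading it. A minimal sketch of the failure mode, assuming the `msgpack` package is installed:

```python
import msgpack

# A single packed object round-trips cleanly.
buf = msgpack.packb([1, 2, 3], use_bin_type=True)
assert msgpack.unpackb(buf, raw=False) == [1, 2, 3]

# Trailing bytes after the first object (e.g. from a mismatched
# serializer) make unpackb raise ExtraData.
buf2 = buf + msgpack.packb("extra", use_bin_type=True)
try:
    msgpack.unpackb(buf2, raw=False)
except msgpack.exceptions.ExtraData:
    print("ExtraData raised")
```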

Lyken17 commented 5 years ago

Sorry, can you explain in more detail what happened?

Fangyh09 commented 5 years ago

Please refer to this: https://github.com/Fangyh09/Image2LMDB. I solved the other small problems there.

--- old reply ---

I fixed it by using

import pyarrow as pa

def loads_pyarrow(buf):
    """
    Args:
        buf: the output of `dumps`.
    """
    return pa.deserialize(buf)

Thanks for the code! It's awesome.
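A note for later readers: pyarrow's `serialize`/`deserialize` APIs have since been deprecated upstream in favor of plain pickle, so a matched writer/reader pair can be sketched with only the standard library. The `dumps`/`loads` names below are illustrative, not the repo's actual helpers:

```python
import pickle

def dumps(obj):
    # Stand-in for pa.serialize(obj).to_buffer():
    # pickle with the highest protocol handles raw bytes efficiently.
    return pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)

def loads(buf):
    # The key point of this thread: the reader must match whatever
    # serializer produced `buf`; mixing serializers triggers errors
    # like msgpack's ExtraData.
    return pickle.loads(buf)

sample = {"label": 3, "raw": b"\x00\x01\x02"}
assert loads(dumps(sample)) == sample
```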

Lyken17 commented 5 years ago

Glad to see you resolved the issue yourself :)

Packing separate images into a single LMDB helps when disk I/O is the bottleneck. If one day you find that CPU utilization becomes the bottleneck, take a look at https://github.com/NVIDIA/DALI
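The packing idea: store every sample under a key in one on-disk key-value store, so an epoch reads from a single file instead of opening thousands of small image files. A dependency-free sketch using stdlib `dbm` as a stand-in for the `lmdb` package the repo actually uses:

```python
import dbm
import os
import pickle
import tempfile

# One key-value file holds all samples plus a __len__ record,
# mirroring the LMDB layout (dbm here only keeps the sketch stdlib).
path = os.path.join(tempfile.mkdtemp(), "train_db")

with dbm.open(path, "c") as db:
    for idx in range(3):
        key = "{:08d}".format(idx).encode("ascii")
        db[key] = pickle.dumps({"img": b"<jpeg bytes>", "label": idx})
    db[b"__len__"] = pickle.dumps(3)

with dbm.open(path, "r") as db:
    length = pickle.loads(db[b"__len__"])
    sample = pickle.loads(db[b"00000001"])

assert length == 3
assert sample["label"] == 1
```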

Fangyh09 commented 5 years ago

Thanks a lot 👍

syt2 commented 5 years ago

I fixed it by using

import pyarrow as pa

def loads_pyarrow(buf):
    """
    Args:
        buf: the output of `dumps`.
    """
    return pa.deserialize(buf)

Thanks for the code! It's awesome.

Hi, it seems I got the same problem too. Could you tell me how to solve it? I don't know where to use the function loads_pyarrow(). Thanks

Fangyh09 commented 5 years ago

@dreamcontinue Hi, you can try this https://github.com/Fangyh09/Image2LMDB.

syt2 commented 5 years ago

@dreamcontinue Hi, you can try this https://github.com/Fangyh09/Image2LMDB.

Thank you for your reply. I solved it, but found that the I/O speed is still quite slow TAT

Lyken17 commented 5 years ago

In the ideal case, LMDB should provide much faster I/O than the original JPEG files. Can you share more detailed information?

Fangyh09 commented 4 years ago

@dreamcontinue Yes, I also noticed that.

I have solved the other problems; please refer to https://github.com/Fangyh09/Image2LMDB instead of

import pyarrow as pa

def loads_pyarrow(buf):
    """
    Args:
        buf: the output of `dumps`.
    """
    return pa.deserialize(buf)