KaiyangZhou / deep-person-reid

Torchreid: Deep learning person re-identification in PyTorch.
https://kaiyangzhou.github.io/deep-person-reid/
MIT License

VideoDataManager: Multiple Sources #232

Closed: avn3r-dn closed this issue 5 years ago.

avn3r-dn commented 5 years ago

Hello

I'm seeing odd behavior with VideoDataManager. It works as expected on the DukeMTMC-VideoReID and MARS video ReID datasets, and I can train without problems using a batch size of 40.

However, when I use two sources, training takes more than 3x the GPU memory. This is strange, because VideoDataManager should have nothing to do with the GPU. I even removed the classification layer from the model so that the number of IDs doesn't increase the model size.

Can you help me find the source of the error?

Works (3.5 GB of GPU memory used):

from torchreid.data import VideoDataManager

datamanager = VideoDataManager(
    root='/videos/data/reid',
    sources='dukemtmcvidreid',
    batch_size=10, 
    num_instances=5,
    seq_len=3,
    workers=10,
    sample_method='random',
    train_sampler='RandomIdentitySampler',
)

Doesn't work (10.5 GB of GPU memory used), with either of the following configurations:

# Two different sources
datamanager = VideoDataManager(
    root='/videos/data/reid',
    sources=['dukemtmcvidreid', 'mars'],
    batch_size=10, 
    num_instances=5,
    seq_len=3,
    workers=10,
    sample_method='random',
    train_sampler='RandomIdentitySampler',
)

# The same source listed twice
datamanager = VideoDataManager(
    root='/videos/data/reid',
    sources=['dukemtmcvidreid', 'dukemtmcvidreid'],
    batch_size=10, 
    num_instances=5,
    seq_len=3,
    workers=10,
    sample_method='random',
    train_sampler='RandomIdentitySampler',
)
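For reference, here is a rough sketch of how these settings translate into the contents of one training batch. It assumes that RandomIdentitySampler draws batch_size // num_instances identities per batch and that each sampled tracklet contributes seq_len frames; the helper batch_composition is hypothetical and not part of Torchreid.

```python
def batch_composition(batch_size: int, num_instances: int, seq_len: int) -> dict:
    """Hypothetical helper: describe one training batch of video tracklets.

    Assumes an identity-based sampler that picks `num_instances` tracklets
    per identity and a loader that takes `seq_len` frames per tracklet.
    """
    if batch_size % num_instances != 0:
        raise ValueError("batch_size should be divisible by num_instances")
    return {
        # identities drawn per batch by the identity-based sampler
        "identities": batch_size // num_instances,
        # tracklets per batch
        "tracklets": batch_size,
        # frames the model actually processes per batch
        "frames": batch_size * seq_len,
    }


print(batch_composition(batch_size=10, num_instances=5, seq_len=3))
# {'identities': 2, 'tracklets': 10, 'frames': 30}
```

Under these assumptions, all three configurations should feed the model the same 30 frames per batch, so the number of sources by itself should not change the per-batch GPU footprint.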

What could be causing this? Adding another source forces me to reduce batch_size from 40 to 10.

KaiyangZhou commented 5 years ago

The error occurred because some video-dataset-specific arguments were not passed along in the __add__ function when two or more datasets were combined.

Fixed at: https://github.com/KaiyangZhou/deep-person-reid/commit/cf553d367ffc7a5b03f8ba62ea240f18a5d31986
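The failure mode described above can be illustrated with a simplified, hypothetical stand-in for a video dataset class (this is not Torchreid's actual code, and it omits the identity relabeling a real merge would need). If __add__ builds the combined dataset without forwarding seq_len and sample_method, the result silently falls back to the constructor defaults. The defaults used here (15 and 'evenly') are assumptions chosen for illustration.

```python
class ToyVideoDataset:
    """Hypothetical stand-in for a video ReID dataset (not Torchreid's class)."""

    def __init__(self, train, seq_len=15, sample_method="evenly"):
        self.train = list(train)  # list of (tracklet, pid, camid) tuples
        self.seq_len = seq_len
        self.sample_method = sample_method

    def add_buggy(self, other):
        # Bug pattern: video-specific args are not forwarded, so defaults are used.
        return ToyVideoDataset(self.train + other.train)

    def __add__(self, other):
        # Fixed pattern: carry the video-specific settings over to the combined dataset.
        return ToyVideoDataset(
            self.train + other.train,
            seq_len=self.seq_len,
            sample_method=self.sample_method,
        )


duke = ToyVideoDataset([("t1", 0, 0)], seq_len=3, sample_method="random")
mars = ToyVideoDataset([("t2", 0, 1)], seq_len=3, sample_method="random")
print(duke.add_buggy(mars).seq_len)  # 15
print((duke + mars).seq_len)  # 3
```

With batch_size=10, a silent fallback from 3 to 15 frames per tracklet would mean 150 frames per batch instead of 30, which is consistent in direction with the jump in GPU memory reported in the issue.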