domain transfer with single-node training strategy

To transfer domain a separate decoder is required per domain, as it is stated in the paper:

In a distributed setup discriminator and encoder are wrapped with DistributedDataParallel, where as decoder has only DataParallel: https://github.com/facebookresearch/music-translation/blob/fd51cbcbeb0af3de0930e79c25b539d050cc9e11/src/train.py#L168 Which results in across nodes gradient descent for discriminator and encoder, yet decoder is not being shared and effectively a separate domain is being trained per node. DataParallel is being used to accelerate batch processing with multiple GPUs per node.

A thing is that single node training does still rotate domain where from batches are sampled. https://github.com/facebookresearch/music-translation/blob/fd51cbcbeb0af3de0930e79c25b539d050cc9e11/src/train.py#L273 In such a case single node training does learn a hidden representation which is domain agnostic, yet synthesis will produce the very same input waveform.

What way to transfer the domain in such a case? There's no parameterized domain class inside latent representation and a single decoder is being trained.

Is there a hidden purpose for a single node training? Since it's not clear what way to make music translation with that.

Regarding pretrained weights, encoder parameters are different across nodes checkpoints. For some reason they were not synchronized during the training:

[nav] In [48]: a = torch.load('tmp/cache/pretrained/pretrained_musicnet/bestmodel_0.pth', map_location='cpu')                 

[nav] In [49]: a2 = torch.load('tmp/cache/pretrained/pretrained_musicnet/bestmodel_1.pth', map_location='cpu')                

[nav] In [50]: t1 = list(a2['encoder_state'].keys())[-1]       

[nav] In [51]: a['encoder_state'][t1]                          
Out[51]: 
tensor([ 0.0064, -0.0318, -0.0017, -0.0342,  0.0067, -0.0512, -0.0476, -0.0463,
         0.0446,  0.0633,  0.0201, -0.0987,  0.0165,  0.0286, -0.0317, -0.0012,
        -0.0301,  0.0235,  0.0066,  0.0113,  0.0395,  0.0243, -0.0185,  0.0024,
        -0.0730, -0.0382,  0.0009,  0.0688, -0.0008, -0.0129,  0.0274, -0.0230,
         0.0429,  0.0178,  0.0159, -0.0394, -0.0668, -0.0432, -0.0072, -0.0950,
         0.0024, -0.0570,  0.0007,  0.0400, -0.0158, -0.0215,  0.0286,  0.0193,
        -0.0271,  0.0465, -0.0064, -0.0040, -0.0050, -0.0652, -0.0030,  0.0325,
         0.0042,  0.0485,  0.0675,  0.0232, -0.0263,  0.0258,  0.0349,  0.0141])

[nav] In [52]: a2['encoder_state'][t1]                         
Out[52]: 
tensor([-0.0154,  0.0399, -0.0364, -0.0754,  0.0035,  0.0237, -0.0531, -0.0549,
        -0.0071,  0.0692, -0.0241, -0.2067,  0.0322,  0.0062, -0.1778, -0.0508,
        -0.0051, -0.0037,  0.0062,  0.0282,  0.0213,  0.0251,  0.0303,  0.0028,
        -0.0910, -0.0762,  0.0249,  0.1074, -0.0081, -0.0107, -0.0129,  0.0226,
         0.0014,  0.0112, -0.0069,  0.0070, -0.0495,  0.0344,  0.0101, -0.1099,
         0.0137, -0.0044,  0.0015,  0.0854,  0.0237,  0.0122,  0.0225, -0.0230,
        -0.0160, -0.0560,  0.0040, -0.0189,  0.0475, -0.0035,  0.0066,  0.0311,
         0.1182,  0.0635, -0.0191, -0.0032, -0.0132,  0.0267, -0.0393, -0.0044])

[ins] In [53]: t1                                              
Out[53]: 'conv_1x1.bias'

Outputs generated by loaded encoders from lastmodel_0.pth, lastmodel_1.pth, and lastmodel_2.pth for a common input numpy.random.randint(0, 256, (1, 800)) are different.

facebookresearch / music-translation

domain transfer with single-node training strategy #7