auspicious3000 / autovc

AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
https://arxiv.org/abs/1905.05879
MIT License

Compatible loss function #16

Closed bva1986 closed 5 years ago

bva1986 commented 5 years ago

Hello

To reproduce the AutoVC results, we used the following loss function implementation in PyTorch:

import torch.nn as nn
import torch.optim as optim

num_steps = 100000
G = Generator(32,256,512,32).to(device)
...
criterion_recon = nn.MSELoss().to(device)    # final reconstruction vs. input
criterion_recon0 = nn.MSELoss().to(device)   # initial reconstruction vs. input
criterion_content = nn.L1Loss().to(device)   # content code consistency
optimizer = optim.Adam(G.parameters(), lr=1e-4)
G.train()
...
for i in range(1, num_steps + 1):
    ...
    optimizer.zero_grad()
    # Self-reconstruction: source and target speaker embeddings are both S1
    X1rt, X1r, C1 = G(X1, S1, S1)
    #with torch.no_grad(): # Should we calculate C1r with grad?
    #    C1r = G(X1r[:, 0, :, :], S1, None)
    # Re-encode the reconstruction to obtain its content code
    C1r = G(X1r[:, 0, :, :], S1, None)
    L_recon = criterion_recon(X1r[:, 0, :, :], X1)
    L_recon0 = criterion_recon0(X1rt[:, 0, :, :], X1)
    L_content = criterion_content(C1r, C1)
    # Both weights (mu and lambda) are set to 1.0 here
    loss = L_recon + 1.0*L_recon0 + 1.0*L_content
    loss.backward()
    optimizer.step()

However, we did not achieve comparable voice quality, nor a loss value around 1e-3 as reported in the issue "whats your final loss and final learning rate?". One possible reason is that our loss function implementation is inappropriate. Is it compatible with the one you used?

BR

auspicious3000 commented 5 years ago

Yes, your loss function looks right.

bva1986 commented 5 years ago

Should we backpropagate through Encoder->Decoder->Encoder, or just through the Encoder, when minimizing the content loss between C1r and C1?

C1r = G(X1r[:, 0, :, :], S1, None)
L_content = criterion_content(C1r, C1)

or

C1r = G(X1r[:, 0, :, :].detach(), S1, None)
L_content = criterion_content(C1r, C1)

or even

C1r = G(X1r[:, 0, :, :].detach(), S1, None)
L_content = criterion_content(C1r, C1.detach())

auspicious3000 commented 5 years ago

The first one, without any detach().
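For readers comparing the three variants, here is a minimal sketch of how `detach()` changes where the content loss sends gradients. It does not use the AutoVC `Generator`: `enc` and `dec` are hypothetical single-layer stand-ins for the content encoder and decoder, and the function name `content_loss_grads` is made up for this illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def content_loss_grads(detach_recon, detach_target):
    """Return (decoder_got_grad, first_encoder_pass_got_grad) for one variant."""
    torch.manual_seed(0)
    enc = nn.Linear(8, 4)   # hypothetical stand-in for the content encoder
    dec = nn.Linear(4, 8)   # hypothetical stand-in for the decoder
    x = torch.randn(2, 8)   # plays the role of X1

    c = enc(x)              # C1: first encoder pass
    c.retain_grad()         # keep the gradient on this non-leaf tensor for inspection
    x_rec = dec(c)          # X1r: reconstruction

    # C1r: second encoder pass, optionally cut off from the decoder
    c_rec = enc(x_rec.detach() if detach_recon else x_rec)
    # Content loss, optionally treating C1 as a constant target
    loss = F.l1_loss(c_rec, c.detach() if detach_target else c)
    loss.backward()

    return dec.weight.grad is not None, c.grad is not None

print(content_loss_grads(False, False))  # variant 1: no detach
print(content_loss_grads(True, False))   # variant 2: X1r detached
print(content_loss_grads(True, True))    # variant 3: X1r and C1 detached
```

In this toy setup, only the first variant lets the content loss reach the decoder. The second variant still updates the encoder through both C1r and C1, while the third updates it only through the second encoder pass.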

bva1986 commented 5 years ago

According to the AutoVC presentation at ICML 2019, the loss function is L = Lrecon + lambda*Lcontent. But according to the paper (https://arxiv.org/abs/1905.05879), the loss function is L = Lrecon + mu*Lrecon0 + lambda*Lcontent. Which of them is preferable?

auspicious3000 commented 5 years ago

The second one.
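As a rough sketch, the two formulations can be written as one function in which setting `mu=0` recovers the form from the presentation. The function name `autovc_loss` and its argument names are assumptions made for this example. The mean-reduced `mse_loss` and `l1_loss` mirror the `nn.MSELoss` and `nn.L1Loss` criteria used earlier in this thread, and the default weights of 1.0 are the ones from that code.

```python
import torch
import torch.nn.functional as F

def autovc_loss(x, x_init, x_final, c, c_rec, mu=1.0, lam=1.0):
    """L = Lrecon + mu*Lrecon0 + lambda*Lcontent (hypothetical helper)."""
    l_recon = F.mse_loss(x_final, x)    # final reconstruction vs. input
    l_recon0 = F.mse_loss(x_init, x)    # initial reconstruction vs. input
    l_content = F.l1_loss(c_rec, c)     # re-encoded content code vs. original
    return l_recon + mu * l_recon0 + lam * l_content

# Toy tensors with easy-to-check values (not real spectrograms)
x = torch.zeros(2, 3)
x_init = torch.ones(2, 3)           # MSE against x is 1
x_final = 2 * torch.ones(2, 3)      # MSE against x is 4
c = torch.zeros(2, 4)
c_rec = 3 * torch.ones(2, 4)        # L1 against c is 3

full = autovc_loss(x, x_init, x_final, c, c_rec)            # 4 + 1 + 3
no_recon0 = autovc_loss(x, x_init, x_final, c, c_rec, mu=0.0)  # 4 + 3
print(full.item(), no_recon0.item())
```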

bva1986 commented 5 years ago

Ok, thanks