auspicious3000 / SpeechSplit

Unsupervised Speech Decomposition Via Triple Information Bottleneck
http://arxiv.org/abs/2004.11284
MIT License
645 stars 92 forks

F0 Converter for P - loss function values #36

Open rishabhjain16 opened 3 years ago

rishabhjain16 commented 3 years ago

I am trying to replicate your work. I am currently building the F0 converter model to generate the P checkpoint, and I am stuck at the loss calculation.

I see that when I use the F0_Converter model to generate P, I get a 257-dimensional one-hot encoded feature P.

From Demo.ipynb:

f0_pred = P(uttr_org_pad, f0_trg_onehot)[0]
f0_pred.shape
> torch.Size([192, 257])

I wanted to ask: when training the F0 converter model, what value are you using to calculate the loss?

I tried the following, but I am not sure it is the right way. This is how I generate f0_pred and calculate the loss:

f0_pred = self.P(x_real_org, f0_org_intrp)[0]
p_loss_id = F.mse_loss(f0_pred, f0_org_intrp, reduction='mean')

I just want to know if I am on the right track. Can you help me out here, @auspicious3000?

auspicious3000 commented 3 years ago

The output of the f0 predictor is a 257-dim logit, not a one-hot. So you need to use cross-entropy loss, as indicated in the paper.
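
For anyone landing here later, a minimal sketch of what that means in PyTorch (tensor names and sizes here are illustrative, not from the repo): F.cross_entropy takes raw logits and integer class indices, so no softmax or one-hot encoding is needed on the prediction side.

    import torch
    import torch.nn.functional as F

    B, T, C = 2, 192, 257                  # batch, frames, quantization bins
    logits = torch.randn(B, T, C)          # stand-in for the F0 converter's output
    target = torch.randint(0, C, (B, T))   # per-frame bin index of the ground-truth f0

    # cross_entropy wants (B, C, T) logits and (B, T) integer class indices
    loss = F.cross_entropy(logits.transpose(1, 2), target)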

rishabhjain16 commented 3 years ago

Thank you for your quick response. I understand what you are saying; I found that in the appendix of the paper. What I meant to ask about are the two values you use to calculate the loss. How do you get a 257-dim version of f0_org to feed into the loss function?

The loss function requires two values. One is f0_pred, the output of the F0_Converter model. What is the other value?

In other words, what goes into the cross-entropy loss as the target?

auspicious3000 commented 3 years ago

The target is the quantized ground truth f0, based on https://arxiv.org/abs/2004.07370.
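
In this repo that target comes from quantize_f0_torch / quantize_f0_numpy in utils.py. A rough sketch of the idea (my reading, not the repo's exact code): f0 already normalized to [0, 1] is rounded into 256 bins, with one extra bin (index 0) reserved for unvoiced frames, giving 257 classes.

    import numpy as np

    def quantize_f0_sketch(f0, num_bins=256):
        """Sketch: f0 assumed normalized to [0, 1]; 0 or negative = unvoiced.
        Returns a (T, num_bins + 1) one-hot array and the (T,) bin indices."""
        f0 = np.asarray(f0, dtype=np.float32).copy()
        uv = f0 <= 0                                                 # unvoiced frames
        f0[uv] = 0.0
        idx = (np.round(f0 * (num_bins - 1)) + 1).astype(np.int64)   # voiced -> bins 1..256
        idx[uv] = 0                                                  # unvoiced -> bin 0
        onehot = np.zeros((len(f0), num_bins + 1), dtype=np.float32)
        onehot[np.arange(len(f0)), idx] = 1.0
        return onehot, idx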

rishabhjain16 commented 3 years ago

Thanks for your help. The paper covered most of my doubts. Great read.

Merlin-721 commented 3 years ago

In the 'Train the generator' section of solver.py:

        self.G = self.G.train()
        self.P = self.P.train()

        # G Identity mapping loss
        x_f0 = torch.cat((x_real_org, f0_org), dim=-1)
        x_f0_intrp = self.Interp(x_f0, len_org) 

        f0_org_intrp = quantize_f0_torch(x_f0_intrp[:,:,-1])[0]
        x_f0_intrp_org = torch.cat((x_f0_intrp[:,:,:-1], f0_org_intrp), dim=-1)

        # G forward
        x_pred = self.G(x_f0_intrp_org, x_real_org, emb_org)
        g_loss_id = F.mse_loss(x_real_org, x_pred, reduction='mean') 

        # Preprocess f0_trg for P 
        x_f0_trg = torch.cat((x_real_trg, f0_trg), dim=-1)
        x_f0_intrp_trg = self.Interp(x_f0_trg, len_trg) 

        # Target for P
        f0_trg_intrp = quantize_f0_torch(x_f0_intrp_trg[:,:,-1])[0]

        # P forward
        f0_pred = self.P(x_real_org, f0_trg_intrp)    # (B, T, 257) logits
        f0_trg_intrp_indx = f0_trg_intrp.argmax(2)    # (B, T) bin indices
        p_loss_id = F.cross_entropy(f0_pred.transpose(1,2), f0_trg_intrp_indx, reduction='mean')

        # Backward and optimize.
        g_loss = g_loss_id
        p_loss = p_loss_id
        self.reset_grad()
        g_loss.backward()
        p_loss.backward()
        self.g_optimizer.step()
        self.p_optimizer.step()

        # Logging.
        loss = {}
        loss['G/loss_id'] = g_loss_id.item()
        loss['P/loss_id'] = p_loss_id.item()

This appears to be working for me (i.e. it seems to run, at least!).

3139725181 commented 3 years ago

> In the 'Train the generator' section of solver.py: [the full code block from the previous comment, quoted verbatim]

Hello, I want to know where x_real_trg comes from...

Merlin-721 commented 3 years ago

I've changed some of the code around since then, but hopefully this helps a bit. Both 'org' and 'trg' are just different instances; I had been adapting code from elsewhere in the repo for training, so I kept those naming conventions. You can see here that I now use the same instance to train both models:


            x_real_org, emb_org, f0_org, len_org = next(data_iter)
            # applies .to(self.device) to each:
            x_real_org, emb_org, len_org, f0_org = self.data_to_device([x_real_org, emb_org, len_org, f0_org])

            # combines spect and f0s
            x_f0 = torch.cat((x_real_org, f0_org), dim=-1)
            # Random resampling with linear interpolation
            x_f0_intrp = self.Interp(x_f0, len_org) 
            # strips f0 from trimmed to quantize it
            f0_org_intrp = quantize_f0_torch(x_f0_intrp[:,:,-1])[0]

            self.G = self.G.train()
            # combines quantized f0 back with spect
            x_f0_intrp_org = torch.cat((x_f0_intrp[:,:,:-1], f0_org_intrp), dim=-1)

            # G forward
            x_pred = self.G(x_f0_intrp_org, x_real_org, emb_org)
            g_loss_id = F.mse_loss(x_pred, x_real_org, reduction='mean') 

            # Backward and optimize.
            self.g_optimizer.zero_grad()
            g_loss_id.backward()
            self.g_optimizer.step()

            loss['G/loss_id'] = g_loss_id.item()

          # =================================================================================== #
          #                               3. F0_Converter Training                              #
          # =================================================================================== #

            self.P = self.P.train()
            # class-index targets from the quantized (one-hot) f0: (B, T)
            f0_trg_intrp_indx = f0_org_intrp.argmax(2)

            # P forward
            f0_pred = self.P(x_real_org, f0_org_intrp)    # (B, T, 257) logits
            # cross_entropy expects (B, C, T) logits and (B, T) integer targets
            p_loss_id = F.cross_entropy(f0_pred.transpose(1,2), f0_trg_intrp_indx, reduction='mean')

            self.p_optimizer.zero_grad()
            p_loss_id.backward()
            self.p_optimizer.step()
            loss['P/loss_id'] = p_loss_id.item()
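
As a follow-up for anyone replicating this: once P is trained this way, its 257-dim logits can be collapsed back into a one-hot contour at inference before feeding the generator, along the lines of the post-processing in Demo.ipynb (a sketch with dummy tensors standing in for P's output):

    import torch

    # dummy logits standing in for P's output on one utterance: (T, 257)
    T = 192
    f0_pred = torch.randn(T, 257)

    # pick the most likely bin per frame and re-encode it as a one-hot contour
    f0_con_onehot = torch.zeros(1, T, 257)
    f0_con_onehot[0, torch.arange(T), f0_pred.argmax(dim=-1)] = 1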