Rongjiehuang / Multi-Singer

PyTorch Implementation of Multi-Singer (ACM-MM'21)
MIT License

Missing Code? #3

Closed Coice closed 2 years ago

Coice commented 2 years ago

Hello,

Generator1's forward takes two parameters (x and c), but in the training step only the mel features are passed — no noise or other features.

Is this correct?

    def _train_step(self, batch):
        """Train model one step."""
        # parse batch
        x = []

        x.append(batch['feats'])
        embed = batch['embed'].to(self.device)

        y = batch['audios'].to(self.device)
        x = tuple([x_.to(self.device) for x_ in x])
        y_ = self.model["generator"](*x).to(self.device)

Thank you for your time.

SunMail-hub commented 2 years ago

Hi, yes. I forgot to add one line when cleaning up the code; it should be:

    def _train_step(self, batch):
        """Train model one step."""
        # parse batch
        x = []
        x.append(batch['noise'])
        x.append(batch['feats'])

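For clarity, here is a stand-alone sketch of what that fix does: appending the noise before the mel features makes the input tuple line up with Generator1's two-parameter forward(x, c). This is only an illustration — stand-in strings replace the actual tensors:

```python
# Hypothetical sketch (not the repo's code): stand-in strings replace
# the noise/mel tensors to show the input ordering only.
batch = {'noise': 'z', 'feats': 'mel', 'embed': 'spk', 'audios': 'wav'}

x = []
x.append(batch['noise'])   # the line that was missing in _train_step
x.append(batch['feats'])
x = tuple(x)

# generator(*x) now unpacks to forward(noise, mel), i.e. forward(x, c),
# matching Generator1's two-parameter signature.
noise, mel = x
```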
Coice commented 2 years ago

@SunMail-hub thanks for your response.

The eval code seems to have the correct logic, but turning on F0 features or chroma would cause an error:

        """Evaluate model one step."""
        # parse batch
        x = []

        if self.config['use_noise_input']:
            x.append(batch['noise'])
        if self.config['use_f0']:
            x.append(batch['f0_origins'])
        if self.config['use_chroma']:
            x.append(batch['chromas'])
        x.append(batch['feats'])
        y = batch['audios'].to(self.device)
        x = tuple([x_.to(self.device) for x_ in x])
        embed = batch['embed'].to(self.device)
        y_ = self.model["generator"](*x).to(self.device)

Were you concatenating the extra features (f0, chroma, etc.) with the mel features to form a single tensor for the c parameter, or was there some other modification to Generator1?

Again thanks for your time.

SunMail-hub commented 2 years ago

Hi @Coice, you can see the settings in the config file:

use_f0: false
use_chroma: false
use_noise_input: true

Therefore we use only c (the mel features) and noise as the model input, without the extra features.
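With those defaults, the eval-time parsing quoted above reduces to a two-element tuple. A minimal torch-free illustration of how the config flags gate the generator inputs (stand-in strings replace the actual tensors; the key names mirror the snippets in this thread):

```python
# Hypothetical illustration of the config-gated input assembly in _eval_step.
config = {'use_noise_input': True, 'use_f0': False, 'use_chroma': False}
batch = {'noise': 'z', 'f0_origins': 'f0', 'chromas': 'ch', 'feats': 'mel'}

x = []
if config['use_noise_input']:
    x.append(batch['noise'])
if config['use_f0']:
    x.append(batch['f0_origins'])
if config['use_chroma']:
    x.append(batch['chromas'])
x.append(batch['feats'])
x = tuple(x)

# With the default flags, x unpacks to (noise, mel) — exactly the two
# arguments forward(x, c) expects. Enabling use_f0 or use_chroma would
# add extra elements, which is why those flags break the unmodified model.
```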