Rikorose / DeepFilterNet

Noise suppression using deep filtering
https://huggingface.co/spaces/hshr/DeepFilterNet2

Question for training #31

Closed aaronhsueh0506 closed 2 years ago

aaronhsueh0506 commented 2 years ago

Hi Rikorose,

Sorry to bother you again. I tried to generate data and train the model following the training instructions.

I generated training_set.txt for speech (selecting just 10 files as a test) and created the HDF5 file with python df/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5 (and likewise for noise).

~/DeepFilterNet/wav/dataset/oblomov_s009036.wav
~/DeepFilterNet/wav/dataset/oblomov_s009040.wav  
~/DeepFilterNet/wav/dataset/oblomov_s009033.wav     
~/DeepFilterNet/wav/dataset/oblomov_s009037.wav    
~/DeepFilterNet/wav/dataset/oblomov_s009041.wav  
~/DeepFilterNet/wav/dataset/oblomov_s009034.wav    
~/DeepFilterNet/wav/dataset/oblomov_s009038.wav     
~/DeepFilterNet/wav/dataset/oblomov_s009042.wav  
~/DeepFilterNet/wav/dataset/oblomov_s009035.wav     
~/DeepFilterNet/wav/dataset/oblomov_s009039.wav  

I generated dataset.cfg as shown below:

{
  "train": [
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
      1.0
    ],
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
      1.0
    ]
  ],
  "valid": [
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
      0.2
    ],
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
      0.2
    ]
  ],
  "test": [
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_SPEECH.hdf5",
      0.2
    ],
    [
      "~/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/hdf5/TRAIN_SET_NOISE.hdf5",
      0.2
    ]
  ]
}

I encountered an error, shown in the screenshot below. [screenshot]

In addition, I have some questions:

  1. In the command python df/train.py dataset.cfg ~/wav_folder/ ./base_dir/: Is data_dir a wav folder or an hdf5 folder? (I think it is the hdf5 folder.) And can base_dir/ not exist? (We still need to provide a config.ini, so here I point it at pretrained_model/ and delete the .ckpt.)
  2. I found that the log says dataloader len: 0. Is this a problem?
  3. I removed the 'df.' prefix from every import (e.g. from df.config import ... -> from config import ...); otherwise it causes an import error.

Thanks,

aaronhsueh0506 commented 2 years ago

By the way, if I point base_dir/ to a non-existent folder, it causes another error. [screenshot]

Rikorose commented 2 years ago

I encountered an error, shown in the screenshot below.

This is related to the dataset length. I will try to provide a better error message.

In the command python df/train.py dataset.cfg ~/wav_folder/ ./base_dir/:
Is data_dir a wav folder or an hdf5 folder? (I think it is the hdf5 folder.)
And can base_dir/ not exist? (We still need to provide a config.ini, so here I point it at pretrained_model/ and delete the .ckpt.)

It should be a folder containing the prepared hdf5 files. At some point it worked without an initialized model base dir, but I haven't tested that in a while. You could just copy the config from the pretrained model.

I found that the log says dataloader len: 0. Is this a problem?

Yes, you didn't set up the dataset correctly.

I removed the 'df.' prefix from every import (e.g. from df.config import ... -> from config import ...); otherwise it causes an import error.

Other options would be to set the Python path or install df locally (e.g. as an editable install).
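For illustration, a minimal sketch of the Python-path option (the path is an assumption; adjust it to your checkout, or use an editable pip install of the package directory instead):

# Sketch: make the `df` package importable without editing any imports.
import sys

# Directory that contains the df/ package; adjust to your checkout.
sys.path.insert(0, "/home/user/DeepFilterNet/DeepFilterNet")

import df.config  # should now resolve without an ImportError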

By the way, if I point base_dir/ to a non-existent folder, it will cause another error

A config file was found but is not complete. You are running an old version; this error is fixed on main.

aaronhsueh0506 commented 2 years ago

Hi Rikorose, thanks for your reply.

I found that the log says dataloader len: 0. Is this a problem?

Yes, you didn't set up the dataset correctly.

Does "setup the dataset incorrectly" and "length of dataset" refer to the same thing? So I think increasing dataset can solve this problem for me. BTW, I guess the error is caused by the seed setting at 42?

I will update the version later.

Thanks,

Rikorose commented 2 years ago

Since the data loader length is 0, your dataset is either not correctly prepared or smaller than the batch size. The seeding should not have any effect on errors.
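As a quick way to verify what actually landed in the HDF5 files, here is a sketch using h5py (the group layout is assumed from prepare_data.py; adjust names if yours differ):

# Sanity check: count the samples stored per group in each HDF5 file.
import h5py

for path in ["TRAIN_SET_SPEECH.hdf5", "TRAIN_SET_NOISE.hdf5"]:
    with h5py.File(path, "r") as f:
        for group in f:
            print(f"{path}: group '{group}' holds {len(f[group])} samples")

# With only 10 speech samples and a batch size larger than 10, the number
# of full batches truncates to 0, matching the "dataloader len: 0" log.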

aaronhsueh0506 commented 2 years ago

Hi Rikorose,

I followed your suggestion and increased the dataset size, and it works now!

By the way, I really like the way you display the log. I will keep working with and building on this masterpiece.

Thanks,

aaronhsueh0506 commented 2 years ago

Hi Rikorose, I am tracing the code for the mixed signal, but I got lost at some point:

loader.iter_epoch(...) at line 231 of df/train.py

self.loader = _FdDataLoader(...) at line 99 of pyDF-data/libdfdata/torch_dataloader.py

impl _FdDataLoader { fn new(...) {...} fn get_batch<'py>(...) { Some(batch) => {...}, None => {...} } } at line 263 of pyDF-data/src/lib.rs

I'm not sure where 'batch' comes from, or where the speech and noise are mixed. Also, I found a lot of 'seed' variables in your code. What does this variable mean?

Now I am using your code to run train.py with the DNS-Challenge files (speech and noise only). The log says it takes 4 minutes per 100 iterations; is this speed expected?

Thanks,

Rikorose commented 2 years ago

The data loading stuff is implemented in Rust. E.g. the mixing is done here: https://github.com/Rikorose/DeepFilterNet/blob/main/libDF/src/dataset.rs#L1078

The seed is a way to control the randomness. It guarantees that I get the exact same noisy mixtures in a specific epoch, and that the network is initialized in the same way. This is often done for reproducibility.
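In generic PyTorch terms (a sketch, not DeepFilterNet's exact seeding code), this amounts to something like:

import random

import numpy as np
import torch

def seed_everything(seed: int = 42) -> None:
    """Fix all RNGs so mixtures and weight init repeat across runs."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy RNG (augmentation, mixing)
    torch.manual_seed(seed)           # PyTorch CPU RNG (weight init)
    torch.cuda.manual_seed_all(seed)  # all CUDA RNGs; no-op without a GPU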

The speed could be correct; it highly depends on the hardware.

aaronhsueh0506 commented 2 years ago

Hi Rikorose, thanks for your reply.

Now I have increased the batch size, workers, and prefetch, but I didn't see an obvious speed-up.

I checked that the worker count is 24 via multiprocessing.cpu_count(), set the batch size to 64, increased prefetch from 4 to 10, and updated config.ini accordingly.

Now it takes 7 minutes per 100 iterations, but the total number of iterations has dropped to half of the original. (I think this is related to the batch size.)

Thanks,

aaronhsueh0506 commented 2 years ago

Hi Rikorose, I have encountered this issue twice while training the model.

2021-12-23 09:48:23 | ERROR | DF | An error has been caught in function '<module>', process 'MainProcess' (3266460), thread 'MainThread' (140391931664192):
Traceback (most recent call last):

  File "df/train.py", line 428, in <module>
    main()

  File "df/train.py", line 150, in main
    train_loss = run_epoch(

  File "df/train.py", line 281, in run_epoch
    raise e

  File "df/train.py", line 274, in run_epoch
    clip_gradnorm(model.parameters(), 1.0, error_if_nonfinite=True)

  File "/home/myhsueh/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/df/utils.py", line 211, in clip_gradnorm
    raise RuntimeError(

RuntimeError: The total norm of order 2.0 for gradients from parameters is non-finite, so it cannot be clipped. To disable this error and scale the gradients by the non-finite norm anyway, set error_if_nonfinite=False

Can I change error_if_nonfinite from True to False, as the message suggests?

Best regards,

Rikorose commented 2 years ago

Hi there, yes, you could try error_if_nonfinite=False; I'm not sure whether it works, though. I got these issues as well from time to time, and just restarted the training when it happened.

I guess in some cases the gradient becomes NaN in the backward pass, e.g. due to atan2. A fix would be appreciated.

aaronhsueh0506 commented 2 years ago

Hi, I am trying to retrain and have not encountered a NaN so far. By the way, I am also tracing the mix_audio_signal function. I found that we apply a gain g to the clean signal (clean_out = &clean * g) and, in if let Some(atten_db) = atten_db { ... }, the noise is scaled (noise *= k).

I am a little confused,

  1. What does the atten_db branch do here? Do c and n change the values of clean_out and noise?
  2. The paper says the clean speech is mixed with up to 5 noise signals at SNRs of {-5, 0, 5, 10, 20, 40}. Is the SNR of each noise different?
  3. In the reference parameters we pass snr_db, but I think g and k do not satisfy this snr_db, do they?

Thanks,

Rikorose commented 2 years ago
  1. atten_db (e.g. 10 dB) limits the attenuation of the algorithm by providing a noisy training target that has 10 dB less noise than the noisy input. Your network will learn to remove not all of the noise, but only 10 dB.
  2. The SNR is computed over all noises; the different noises may have different energies, though. (See the sketch below.)

I don't understand question 3
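For illustration, a NumPy sketch of both points (the real mixing is the Rust code in libDF/src/dataset.rs; the function name here is made up):

import numpy as np

def mix_with_atten_limit(clean, noises, snr_db, atten_db=None):
    """Mix clean speech with several noises at one global SNR; optionally
    build a training target that keeps atten_db less noise than the input."""
    noise = np.sum(noises, axis=0)       # the noises may differ in energy
    e_clean = np.mean(clean ** 2)
    e_noise = np.mean(noise ** 2) + 1e-12
    # k scales the summed noise so that the mixture hits the target SNR.
    k = np.sqrt(e_clean / (e_noise * 10 ** (snr_db / 10)))
    noisy = clean + k * noise
    if atten_db is None:
        target = clean                   # default: learn to remove all noise
    else:                                # attenuation limit: keep some noise,
        target = clean + k * noise * 10 ** (-atten_db / 20)  # atten_db quieter
    return noisy, target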

aaronhsueh0506 commented 2 years ago

Hi, the third question I wanted to ask is that we pass snr_db when calling mix_audio_signal:

fn mix_audio_signal(
    clean: Array2<f32>,
    clean_rev: Option<Array2<f32>>,
    mut noise: Array2<f32>,
    snr_db: f32,
    gain_db: f32,
    atten_db: Option<f32>,
    noise_resample: Option<LpParam>,
) -> Result<(Signal, Signal, Signal)>

and compute k in let k = mix_f(clean.view(), noise.view(), snr_db);. I don't know why clean.view() is used instead of clean_out.view(). I think this is where the noise gain k is calculated; if we want to satisfy snr_db, shouldn't clean.view() be changed to clean_out.view()?

Thanks,

Rikorose commented 2 years ago

Good point, that might be a bug. I will have a look. The mean expected value of the resulting SNR does not change, since the gain can be one of 6, 0, -6.

Indeed; however, this only affects models with an attenuation limit. By default, no attenuation limit is applied. Fixed in 7f2120b.

aaronhsueh0506 commented 2 years ago

Hi, sorry to bother you again. I have not fully understood this code yet, but I want to check:

The mean expected value of the resulting SNR does not change, since the gain can be one of 6, 0, -6.

Are the SNR and the gain (for speech) independent?

  1. As I read fn mix_audio_signal: (i) set an SNR; (ii) calculate k (the noise gain) from the original speech and the original noise; (iii) a speech gain of 1 and a noise gain of k then satisfy the SNR; (iv) then choose a speech gain from {-6, 0, 6} dB; (v) the mixture equals clean_mix + noise (i.e. clean * g + noise * k), so if the speech gain is 6 dB, the SNR of the mixture becomes (SNR + 6) dB, and if it is -6 dB, it becomes (SNR - 6) dB.

  2. In my opinion it should be: (i) set an SNR and choose a speech gain from {-6, 0, 6} dB; (ii) calculate the noise gain k from the speech with gain applied and the original noise; (iii) the speech gain g and noise gain k then satisfy the SNR; (iv) the mixture equals clean_mix + noise (i.e. clean * g + noise * k), so no matter what the gain is, the SNR of the mixture is SNR dB.

These two ways give a different k and a different resulting SNR, as the sketch below demonstrates.
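A numeric check of the two variants (a NumPy sketch with white-noise stand-ins for speech and noise):

import numpy as np

rng = np.random.default_rng(0)
clean = rng.standard_normal(48_000)
noise = rng.standard_normal(48_000)

def snr(s, n):
    return 10 * np.log10(np.mean(s ** 2) / np.mean(n ** 2))

def noise_gain(s, n, snr_db):
    return np.sqrt(np.mean(s ** 2) / (np.mean(n ** 2) * 10 ** (snr_db / 10)))

target_snr, gain_db = 5.0, 6.0
g = 10 ** (gain_db / 20)

k1 = noise_gain(clean, noise, target_snr)      # way 1: k from the original clean
k2 = noise_gain(g * clean, noise, target_snr)  # way 2: k from the gained clean
print(snr(g * clean, k1 * noise))              # ~11 dB = target_snr + gain_db
print(snr(g * clean, k2 * noise))              # ~5 dB  = target_snr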

Do you want to achieve the first result (SNR = SNR + gain_dB)? Thanks,

aaronhsueh0506 commented 2 years ago

Hi, I read section 2.5, "Data Preprocessing", of the paper again.

In paper,

" We mix a clean speech signal with up to 5 noise signals at SNR of {-5,0,5,10,20,40}." " To further increase variability, we augmentation speech as well as noise signals with .... , random gains of {-6,0,6} dB."

So I think your intention is the same as increasing the SNR by the random gains (way 1 above)?

Rikorose commented 2 years ago

Are the SNR and the gain (for speech) independent?

They should be independent. The gain should only modify the overall energy (i.e. loudness). I will have a look and maybe add some tests.

aaronhsueh0506 commented 2 years ago

Hi, I am tracing the PyTorch code to understand the model construction, and I am confused: some code might not match the figure.

[screenshot]

In df/deepfilternet.py

class Encoder(nn.Module):
    def __init__(self):
        ...

    def forward(...):
        ...
        e3 = self.erb_conv3(e2)  # [B, C*4, T, F/4]
        c0 = self.df_conv0(feat_spec)  # [B, C, T, Fc]
        c1 = self.df_conv1(c0)  # [B, C*2, T, Fc]
        cemb = c1.permute(2, 0, 1, 3).reshape(t, b, -1)  # [T, B, C * Fc/4]
        cemb = self.df_fc_emb(cemb)  # [T, B, C * F/4]
        emb = e3.permute(2, 0, 1, 3).reshape(t, b, -1)  # [T, B, C * F/4]
        emb = emb + cemb
        emb, _ = self.emb_gru(emb)
        ...

e0, e1, e2, e3, c0, and c1 look like the parts I marked in red on the figure, and cemb is the output of the GLinear in the DF Net. (Is this correct?)

I am not sure why emb = emb + cemb here; is there a GLinear layer before self.emb_gru (the GGRU in the Encoder)?

Thanks,

aaronhsueh0506 commented 2 years ago

Hi Rikorose,

I am trying to figure out the model architecture, and I have traced the code in modules.py. I think Fig. 2 in the paper may need some red connection lines added, as below, to match your code. [figure]

Some other questions. I checked the sizes of the input features: noisy: [batch, 1, 300, 481], feat_erb: [batch, 1, 300, 32], feat_spec: [batch, 1, 300, 96].

  1. I think '300' is the time axis, and 'lookahead' defaults to 2. Is this two seconds? [With 960 samples per frame and a hop size of 480, after the FFT we get a 99 (frames) x 480 (complex bins) spectrum; do we then get 300 frames with a lookahead of 2 seconds?]
  2. Is 'noisy' a spectrum? What is the difference between 'noisy' and 'feat_spec'?

Thanks for your reply. Have a nice day!

Rikorose commented 2 years ago

I am not sure why emb = emb + cemb here; is there a GLinear layer before self.emb_gru (the GGRU in the Encoder)?

Here:

cemb = self.df_fc_emb(cemb)  # [T, B, C * F/4]

"Fully connected" is not a perfect name, since it is grouped. GLinear also contains a skip connection (i.e. emb = emb + cemb).
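A rough PyTorch sketch of that idea (names and shapes are illustrative; the actual GLinear lives in df/modules.py):

import torch
from torch import nn

class GroupedLinearSkip(nn.Module):
    """Sketch: a grouped fully connected layer plus a skip connection."""

    def __init__(self, dim: int, groups: int):
        super().__init__()
        assert dim % groups == 0
        self.groups = groups
        sub = dim // groups
        # One small Linear per group instead of a dense dim x dim matrix.
        self.fcs = nn.ModuleList(nn.Linear(sub, sub) for _ in range(groups))

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [T, B, dim]
        chunks = x.chunk(self.groups, dim=-1)
        y = torch.cat([fc(c) for fc, c in zip(self.fcs, chunks)], dim=-1)
        return x + y  # the skip connection, i.e. emb = emb + cemb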

I think '300' is the time axis, and 'lookahead' defaults to 2. Is this two seconds?

The 2 corresponds to the lookahead in time steps, which depends on the FFT hop size, e.g. 2 * 10 ms = 20 ms.

Is 'noisy' a spectrum? What is the difference between 'noisy' and 'feat_spec'?

feat_spec is basically the normalized noisy spectrum. The normalization is done in the Rust source code.
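A rough idea of what such a normalization can look like (an assumption-heavy NumPy sketch; the actual Rust implementation may differ in detail):

import numpy as np

def unit_norm(spec: np.ndarray, alpha: float = 0.99, eps: float = 1e-10):
    """Normalize a complex spectrogram [T, F] by a running mean magnitude."""
    state = np.mean(np.abs(spec[0]))
    out = np.empty_like(spec)
    for t in range(spec.shape[0]):
        # Exponential moving average of each frame's mean magnitude.
        state = alpha * state + (1 - alpha) * np.mean(np.abs(spec[t]))
        out[t] = spec[t] / (state + eps)
    return out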

I think Fig. 2 in the paper may need some red connection lines added, as below, to match your code.

No, I don't think that matches the code. Could you make an argument for it?

aaronhsueh0506 commented 2 years ago

Hi, here is my visualization of the PyTorch code, with black-background blocks following the code divisions. I think there are some differences, which is why I asked whether the red line needs to be added. [figure]

Another point: I have a file containing broken-glass noise. Following #38, I changed the DF gamma to 0.3, and this helped somewhat. What does this parameter influence?

If I want to do real-time inference, do I need to keep a buffer, update it frame by frame, and run the model each time (taking only the last frame of the output)? Because I think enhance.py works offline.

Thanks again.

Rikorose commented 2 years ago

Ah, I see what you mean. True, there is another interconnection. I will think about how to include it in the figure while keeping the figure simple and clear. Also, the linear layer for alpha is not shown, for simplicity.

Another point: I have a file containing broken-glass noise. Following #38, I changed the DF gamma to 0.3, and this helped somewhat. What does this parameter influence?

It's a compression factor, similar to logarithmic compression, used to model loudness perception. PercepNet, for example, has a reference for why 0.3 was chosen.
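In code terms, such a compression amounts to raising the spectral magnitude to a power below one while keeping the phase (a sketch):

import numpy as np

def compress(spec: np.ndarray, gamma: float = 0.3) -> np.ndarray:
    """Power-law compression of a complex spectrum: gamma < 1 boosts quiet
    components relative to loud ones, roughly mimicking loudness perception."""
    mag = np.abs(spec)
    return mag ** gamma * np.exp(1j * np.angle(spec))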

Wrt. real-time: you need to set up a loop and call the model on each time step. There is a Python example in a previous project of mine. Also take a look at PR #13 (e.g. bin/df-tract.rs, line 277 and following), where the whole processing loop is implemented in Rust. Note that there is a bug somewhere in the DF component (it produces worse results than the Python implementation). Overall, the buffer handling might be the trickiest part; for the most part, I used tract for this.
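The shape of such a loop, as a NumPy sketch (the model call and the STFT details are placeholders, not the actual DeepFilterNet API; the buffer handling is the point here):

import numpy as np

hop, fft_size = 480, 960                  # 10 ms hop at 48 kHz
window = np.sqrt(np.hanning(fft_size))    # sqrt-Hann for analysis + synthesis
in_buf = np.zeros(fft_size)               # sliding analysis buffer
out_buf = np.zeros(fft_size)              # overlap-add synthesis buffer

def process_hop(samples: np.ndarray, model) -> np.ndarray:
    """Consume `hop` new samples, return `hop` enhanced samples."""
    global in_buf, out_buf
    in_buf = np.concatenate([in_buf[hop:], samples])  # shift in new audio
    spec = np.fft.rfft(in_buf * window)               # one STFT frame
    spec = model(spec)                                # placeholder model call
    frame = np.fft.irfft(spec) * window
    out_buf = np.concatenate([out_buf[hop:], np.zeros(hop)])
    out_buf += frame                                  # overlap-add
    return out_buf[:hop].copy()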

aaronhsueh0506 commented 2 years ago

Hi, thank you for your prompt response. OK, I will try to do real-time inference and maybe port it to Keras.

Best Regards,

stonelazy commented 2 years ago

@aaronhsueh0506 Your discussion here was educational, thanks.

OK, I will try to do real-time inference and maybe port it to Keras.

Just wanted to know whether you got the real-time version working and whether you were able to reproduce the offline results. If yes, any plans to make it public?

aaronhsueh0506 commented 2 years ago

Hi,

You can save the GRU state and feed it back in the next loop iteration, so you can reduce the number of frames you need to process.
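For illustration, carrying the state across calls in PyTorch looks like this (a generic sketch, not the DeepFilterNet API):

import torch

gru = torch.nn.GRU(input_size=64, hidden_size=64)  # toy dimensions
hidden = None                                      # state carried across steps

def step(frame: torch.Tensor) -> torch.Tensor:
    """Process one time step; frame has shape [1, batch, 64]."""
    global hidden
    out, hidden = gru(frame, hidden)  # feed the old state, keep the new one
    return out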