Closed: aaronhsueh0506 closed this issue 2 years ago
> I encounter some error as shown in the figure below.

This is related to the dataset length. I will try to provide a better error message.
> In the command `python df/train.py dataset.cfg ~/wav_folder/ ./base_dir/`:
> - Is data_dir a wav folder or an hdf5 folder? (I think it is an hdf5 folder.)
> - Can base_dir/ not exist? (But we need to give config.ini, so here I enter pretrained_model/ and delete the .ckpt.)
It should be a folder containing the prepared hdf5 files. At some point it worked without an initialized model base dir, but I haven't tested that in a while. You could just copy the config from the pretrained model.

> I found that the log says dataloader len: 0; is this a problem?
Yes, you didn't set up the dataset correctly.
> I removed all the 'df.' prefixes from the imports in each file (e.g. `from df.config import ...` -> `from config import ...`); otherwise it causes an import error.
Other options would be to set the python path or install df locally (e.g. as editable wheel).
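For example, a sketch of those two options (the directory layout here is an assumption; point the paths at the directory containing the `df` package in your checkout):

```shell
# Option 1: put the package directory on PYTHONPATH so `import df` resolves.
export PYTHONPATH="$PWD/DeepFilterNet:$PYTHONPATH"

# Option 2: install the package in editable mode, so local edits are picked up.
pip install -e ./DeepFilterNet
```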
> By the way, if I point base_dir/ to a non-existent folder, it will cause another error.
A config file was found but is not complete. You are running on an old version. This error is fixed in main.
Hi Rikorose, thanks for your reply.
> I found that the log says dataloader len: 0; is this a problem?
>
> Yes, you didn't set up the dataset correctly.
Do "setting up the dataset incorrectly" and "length of dataset" refer to the same thing? If so, I think increasing the dataset size can solve this problem for me. BTW, I guess the error is caused by the seed being set to 42?
I will update the version later.
Thanks,
Since the data loader length is 0, your dataset is either not correctly prepared or smaller than the batch size. The seeding should not have any effect on errors.
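To illustrate why a too-small dataset yields a zero-length loader (simple arithmetic, not the project's actual loader code): when incomplete batches are dropped, the number of batches is the integer division of the dataset length by the batch size.

```python
# Hypothetical numbers: 10 prepared samples, batch size 32.
dataset_len = 10
batch_size = 32

# With incomplete batches dropped, the loader length is:
num_batches = dataset_len // batch_size
print(num_batches)  # 0 -> "dataloader len: 0"
```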
Hi Rikorose,
I followed your suggestion and increased the dataset length, and it works now!
By the way, I really like the way you display the log. I will keep working with and exploring this masterpiece.
Thanks,
Hi Rikorose, I am tracing the code for the mixed signal, but I got lost at some point:
- `loader.iter_epoch(...)` at line 231 of df/train.py
- `self.loader = _FdDataLoader(...)` at line 99 of pyDF-data/libdfdata/torch_dataloader.py
- `impl _FdDataLoader { fn new(...) { ... } fn get_batch<'py>(...) { ... Some(batch) => { ... }, None => { ... } ... } }` at line 263 of pyDF-data/src/lib.rs
I'm not sure where to find 'batch', or where the speech and noise are mixed. On the other hand, I found a lot of 'seed' variables in your code. What does this variable mean?
Now I am using your code to run train.py with the DNS-Challenge files (only speech and noise). The log says that it takes 4 minutes for 100 iterations; is this speed correct?
Thanks,
The data loading stuff is implemented in Rust. E.g. the mixing is done here: https://github.com/Rikorose/DeepFilterNet/blob/main/libDF/src/dataset.rs#L1078
A seed is a way to control the randomness. This guarantees that I get the exact same noisy mixtures in a specific epoch, or that the network is initialized in the same way. This is often done for reproducibility.
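As a generic illustration of seeding (plain Python, not the project's Rust or PyTorch code): re-seeding with the same value reproduces the same "random" sequence.

```python
import random

def set_seed(seed: int) -> None:
    # Seed the RNG so subsequent draws are reproducible.
    random.seed(seed)

set_seed(42)
a = [random.random() for _ in range(3)]
set_seed(42)
b = [random.random() for _ in range(3)]
assert a == b  # identical sequences from the same seed
```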
The speed could be correct; it highly depends on the hardware.
Hi Rikorose, thanks for your reply.
I have now increased the batch size, workers, and prefetch, but I didn't see an obvious speed-up. I checked that the number of workers is 24 via multiprocessing.cpu_count(). I set the batch size to 64, prefetch from 4 to 10, and so on in config.ini.
Now it takes 7 minutes for 100 iterations, but the total number of iterations has become half of the original. (I think that is related to the batch size.)
Thanks,
Hi Rikorose, I have encountered this issue twice when I train the model.
```
2021-12-23 09:48:23 | ERROR | DF | An error has been caught in function '<module>', process 'MainProcess' (3266460), thread 'MainThread' (140391931664192):
Traceback (most recent call last):
  File "df/train.py", line 428, in <module>
    main()
  File "df/train.py", line 150, in main
    train_loss = run_epoch(
  File "df/train.py", line 281, in run_epoch
    raise e
  File "df/train.py", line 274, in run_epoch
    clip_gradnorm(model.parameters(), 1.0, error_if_nonfinite=True)
  File "/home/myhsueh/DeepFilterNet/DeepFilterNet-github/DeepFilterNet/df/utils.py", line 211, in clip_gradnorm
    raise RuntimeError(
RuntimeError: The total norm of order 2.0 for gradients from `parameters` is non-finite, so it cannot be clipped. To disable this error and scale the gradients by the non-finite norm anyway, set `error_if_nonfinite=False`
```
Can I modify 'error_if_nonfinite' from True to False as instructed?
Best regards,
Hi there, yes, you could try setting error_if_nonfinite=False; not sure though if this works.
I got these issues as well from time to time. I just restarted the training when this happened.
I guess in some cases the gradient in the backward pass becomes NaN. Could be e.g. due to atan2. A fix would be appreciated.
Hi,
I am trying to retrain and I have not encountered NaN so far.
By the way, I am also tracing the mix_audio_signal function. I found that we apply a gain g in clean_out = &clean * g, and inside if let Some(atten_db) = atten_db { ... } we apply noise *= k. I am a little confused: what does the atten_db branch do here? Do c and n change the values of clean_out and noise? We also pass an snr_db, but I think g and k do not satisfy this snr_db, do they?
Thanks,
I don't understand question 3
Hi,
The third question I want to ask is that we give snr_db when using mix_audio_signal:

```rust
fn mix_audio_signal(
    clean: Array2<f32>,
    clean_rev: Option<Array2<f32>>,
    mut noise: Array2<f32>,
    snr_db: f32,
    gain_db: f32,
    atten_db: Option<f32>,
    noise_resample: Option<LpParam>,
) -> Result<(Signal, Signal, Signal)>
```

and compute k in:

```rust
let k = mix_f(clean.view(), noise.view(), snr_db);
```

I don't know why clean.view() is used instead of clean_out.view(). I think here we have to calculate the gain k for the noise. If we want to satisfy snr_db, does clean.view() need to be changed to clean_out.view()?
Thanks,
Good point, might be a bug. I will have a look. The mean expectation value of the resulting SNR does not change, since the gain can be one of {6, 0, -6} dB.
Indeed, however, this only affects models with an attenuation limit. By default no attenuation limit is applied. Fixed in 7f2120b.
Hi, sorry to bother you again; I have not fully understood this code. But I want to check:
> The mean expectation value of the resulting SNR does not change, since the gain can be one of {6, 0, -6} dB.
Are the SNR and the gain (for speech) independent?
Looking at fn mix_audio_signal, I understand the process as:
(i.) Set an SNR.
(ii.) Calculate k (the noise gain) from the original speech and the original noise.
(iii.) The speech gain 1 and noise gain k satisfy the SNR.
(iv.) Then choose a speech gain from {-6, 0, 6} dB.
(v.) mixture equals clean_mix + noise (i.e. clean * g + noise * k)
=> if the speech gain is 6 dB, the SNR of mixture becomes (SNR+6) dB
=> if the speech gain is -6 dB, the SNR of mixture becomes (SNR-6) dB
In my opinion, it should be:
(i.) Set an SNR and choose a speech gain from {-6, 0, 6} dB.
(ii.) Calculate the noise gain k from the speech with gain applied and the original noise.
(iii.) The speech gain g and noise gain k satisfy the SNR.
(iv.) mixture equals clean_mix + noise (i.e. clean * g + noise * k)
=> No matter what the gain is, the SNR of mixture is SNR dB
These two ways give a different k and a different SNR.
Do you want to achieve the first result (SNR = SNR + gain_db)? Thanks,
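To make the difference between the two orderings concrete, here is a hedged NumPy sketch (the real mix_f lives in libDF/src/dataset.rs; mix_factor, snr, and the signals below are illustrative): computing k from the un-gained clean signal shifts the resulting SNR by the gain, while computing it from the gained signal keeps the target SNR.

```python
import numpy as np

def mix_factor(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> float:
    # Noise scale k such that 10*log10(E_clean / E_{k*noise}) == snr_db.
    e_clean = float(np.sum(clean**2))
    e_noise = float(np.sum(noise**2))
    return float(np.sqrt(e_clean / (e_noise * 10 ** (snr_db / 10))))

def snr(clean: np.ndarray, noise: np.ndarray) -> float:
    # SNR in dB of a clean/noise pair.
    return float(10 * np.log10(np.sum(clean**2) / np.sum(noise**2)))

rng = np.random.default_rng(0)
clean = rng.standard_normal(1000)
noise = rng.standard_normal(1000)
g = 10 ** (6 / 20)  # +6 dB speech gain as an amplitude factor

# Way 1: k from the original clean signal -> resulting SNR is snr_db + 6.
k1 = mix_factor(clean, noise, snr_db=5.0)
# Way 2: k from the gained clean signal -> resulting SNR stays snr_db.
k2 = mix_factor(g * clean, noise, snr_db=5.0)

assert np.isclose(snr(g * clean, k1 * noise), 11.0)  # 5 + 6 dB
assert np.isclose(snr(g * clean, k2 * noise), 5.0)
```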
Hi, I read Section 2.5 "Data Preprocessing" of the paper again. The paper says:
"We mix a clean speech signal with up to 5 noise signals at SNRs of {-5, 0, 5, 10, 20, 40}." "To further increase variability, we augment speech as well as noise signals with ..., random gains of {-6, 0, 6} dB."
So I think your purpose is the same as increasing the SNR by random gains (way 1 above)?
> Are the SNR and the gain (for speech) independent?
They should be independent. Gain should only modify the overall energy (i.e. loudness). I will have a look and maybe add some tests.
Hi, I am tracing the PyTorch code that constructs the model, and I am confused: some code might not match the figure.
In df/deepfilternet.py:

```python
class Encoder(nn.Module):
    def __init__(self):
        ...

    def forward(...):
        ...
        e3 = self.erb_conv3(e2)  # [B, C*4, T, F/4]
        c0 = self.df_conv0(feat_spec)  # [B, C, T, Fc]
        c1 = self.df_conv1(c0)  # [B, C*2, T, Fc]
        cemb = c1.permute(2, 0, 1, 3).reshape(t, b, -1)  # [T, B, C * Fc/4]
        cemb = self.df_fc_emb(cemb)  # [T, B, C * F/4]
        emb = e3.permute(2, 0, 1, 3).reshape(t, b, -1)  # [T, B, C * F/4]
        emb = emb + cemb
        emb, _ = self.emb_gru(emb)
        ...
```
The e0, e1, e2, e3, c0, c1 look like the parts I marked in red on the figure. The cemb is the output of the GLinear in the DF Net. (Is this correct?) I am not sure why emb = emb + cemb here; is there a GLinear layer before self.emb_gru (the GGRU in the Encoder)?
Thanks,
Hi Rikorose,
I am trying to figure out the model architecture, and I have traced the code in modules.py. I think Fig. 2 in the paper may need some red connecting lines added, as below, to match your code.
Some other questions: I checked the sizes of the input features: noisy: [batch, 1, 300, 481], feat_erb: [batch, 1, 300, 32], feat_spec: [batch, 1, 300, 96]. I think '300' is the time axis, and 'lookahead' defaults to 2. (Is this two seconds?) Is noisy a spectrum? What is the difference between 'noisy' and 'feat_spec'?
Thanks for your reply. Have a nice day!
> I am not sure why emb = emb + cemb here; is there a GLinear layer before self.emb_gru (GGRU in Encoder)?
Here: `cemb = self.df_fc_emb(cemb)  # [T, B, C * F/4]`. Fully connected is not a perfect name since it is grouped. GLinear also contains a skip connection (i.e. emb = emb + cemb).
> I think '300' is the time axis, and 'lookahead' defaults to 2. (Is this two seconds?)
The 2 corresponds to the lookahead in time steps, which depends on the FFT hop size, e.g. 2 * 10 ms = 20 ms.
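Assuming a 10 ms hop (480 samples at 48 kHz; the actual hop size comes from the config, so these numbers are illustrative), the conversion from lookahead steps to milliseconds is:

```python
sr = 48_000        # sample rate in Hz
hop = 480          # assumed FFT hop size in samples (10 ms at 48 kHz)
lookahead_steps = 2

lookahead_ms = lookahead_steps * hop / sr * 1000
print(lookahead_ms)  # 20.0 ms, i.e. 2 time steps, not 2 seconds
```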
> Is noisy a spectrum? What is the difference between 'noisy' and 'feat_spec'?
feat_spec is basically the normalized noisy spectrum. This is done in the Rust source code.
> I think Fig. 2 in the paper may need some red connecting lines added, as below, to match your code.
No, I don't think this satisfies the code. Could you make an argument for that?
Hi, here is my visualization of the PyTorch code, with black-background blocks following the code divisions. I think there are some differences, so I am asking whether I need to add the red line.
Another point: I have a file that includes broken-glass noise. I followed #38 and revised DF_gamma to 0.3, and this approach helped somewhat. What does this parameter influence?
If I want to do real-time inference, do I need to queue a buffer, update it frame by frame, and run the model each time (but only keep the last frame of the output)? Because I think enhance.py works offline.
Thanks again.
Ah, I see what you mean. True, there is another interconnection. I will think about how to include this in the figure while still keeping it simple and clear. Also, the linear layer for alpha is not shown for simplicity.
> Another point: I have a file that includes broken-glass noise. I followed #38 and revised DF_gamma to 0.3, and this approach helped somewhat. What does this parameter influence?
It's a compression factor, similar to logarithmic compression, to model loudness perception. E.g. PercepNet has a reference for why 0.3 was chosen.
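A minimal sketch of this kind of power-law compression (illustrative; the function name and default are assumptions, not the project's exact implementation):

```python
import numpy as np

def compress(mag: np.ndarray, gamma: float = 0.3) -> np.ndarray:
    # Power-law compression of magnitude values; gamma < 1 lifts
    # low-energy bins, roughly modelling loudness perception.
    return mag**gamma

mag = np.array([1.0, 0.1, 0.001])
print(compress(mag))  # quiet bins are raised relative to loud ones
```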
Wrt. real-time: you need to set up a loop and call the model on each time step. Here is a Python example from a previous project. Also take a look at PR #13 (e.g. bin/df-tract.rs, line 277 and following). There, the whole processing loop is implemented in Rust. Note that there is a bug somewhere in the DF component (i.e. it produces worse results than the Python implementation). Overall, the buffer handling might be the trickiest part. For the most part, I used tract for this.
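A hedged sketch of such a streaming loop (HOP and the model call signature are assumptions; a real model would also carry internal state between frames):

```python
import numpy as np

HOP = 480  # assumed hop size: 10 ms at 48 kHz

def enhance_stream(model, samples: np.ndarray) -> np.ndarray:
    # Feed one hop-sized frame per step; keep each newest output frame.
    out = []
    for start in range(0, len(samples) - HOP + 1, HOP):
        frame = samples[start:start + HOP]
        out.append(model(frame))  # model returns one enhanced frame
    return np.concatenate(out) if out else samples[:0]

# Usage with a dummy "model" that just attenuates each frame:
y = enhance_stream(lambda f: 0.5 * f, np.ones(3 * HOP))
print(y.shape)  # (1440,)
```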
Hi, Thank you for your prompt response. Ok, I will try to do real-time and maybe transfer to Keras.
Best Regards,
@aaronhsueh0506 Your discussion here was educational, thanks.
> Ok, I will try to do real-time and maybe transfer to Keras.
Just wanted to know whether you have done the real-time version and whether you were able to reproduce the offline results. If yes, any plans of making it public?
Hi,
You can save the GRU state and pass it in for the next loop, so you can reduce the frames you need to process.
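A small PyTorch sketch of this idea (illustrative sizes): passing the saved hidden state into the next call makes two short chunks equivalent to one long pass, so each step only needs the new frames.

```python
import torch

gru = torch.nn.GRU(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(1, 10, 8)  # [batch, time, features]

with torch.no_grad():
    full, _ = gru(x)       # one 10-step pass
    y1, h = gru(x[:, :5])  # first chunk; save the hidden state h
    y2, _ = gru(x[:, 5:], h)  # resume from the saved state

chunked = torch.cat([y1, y2], dim=1)
assert torch.allclose(full, chunked, atol=1e-6)  # identical outputs
```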
Hi Rikorose,
Sorry to bother you again. I am trying to generate data and train the model according to the training part.
I generated training_set.txt (just selecting 10 files for a test) for speech and made the hdf5 file (and likewise for noise), using python df/prepare_data.py --sr 48000 speech training_set.txt TRAIN_SET_SPEECH.hdf5. I generated the dataset.cfg as shown below. I encounter some error as shown in the figure below.
In addition, I have some questions about the command python df/train.py dataset.cfg ~/wav_folder/ ./base_dir/:
- Is data_dir a wav folder or an hdf5 folder? (I think it is an hdf5 folder.)
- Can base_dir/ not exist? (But we need to give config.ini, so here I enter pretrained_model/ and delete the .ckpt.)
- I removed all the 'df.' prefixes from the imports in each file (e.g. from df.config import ... -> from config import ...); otherwise it causes an import error.
Thanks,