Dataset - Githubissues

dreamibor commented 4 years ago

Hi, is there a way to create the training dataset? I mean, what approach do you take to get separate speech and noise data?

AkojimaSLP commented 4 years ago

Hi, thank you for your question.

1) Way to create training data: Training data is generated by choosing audio from ./dataset/train/noise/ and ./dataset/train/speech/* respectively. The two signals are then mixed by simulation, with the SNR and the reverberation time chosen randomly. In the script "train.py", the simulated speech is generated on the fly without writing files to the HDD (with a large number of training data files, the disk capacity would be insufficient).

2) Separate speech and noise data: As you know, this approach needs a parallel corpus (noise and speech). Research often uses the CHiME corpus.
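For reference, the on-the-fly simulation described in 1) can be sketched roughly as below. This is only a minimal illustration and not the actual code in "train.py": the function names (`synthetic_rir`, `mix_at_snr`, `simulate_example`), the SNR and reverberation-time ranges, the 16 kHz sample rate, and the toy exponentially decaying noise used as a room impulse response are all assumptions made for the example.

```python
import numpy as np


def synthetic_rir(rt60, sample_rate, rng):
    """Toy room impulse response (an assumption, not a real room model):
    white noise whose amplitude decays by 60 dB over rt60 seconds."""
    n = max(1, int(rt60 * sample_rate))
    t = np.arange(n) / sample_rate
    decay = 10.0 ** (-3.0 * t / rt60)  # 10^-3 in amplitude = -60 dB at t = rt60
    rir = rng.standard_normal(n) * decay
    rir[0] = 1.0  # direct path
    return rir / np.linalg.norm(rir)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power matches snr_db,
    then add it to the speech. Assumes the noise is not all zeros."""
    noise = np.resize(noise, speech.shape)  # repeat or truncate to the speech length
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


def simulate_example(speech, noise, rng, sample_rate=16000,
                     snr_range=(-5.0, 20.0), rt60_range=(0.2, 0.8)):
    """Build one training example in memory: reverberate the speech,
    then add noise at a randomly drawn SNR. Nothing is written to disk."""
    snr_db = rng.uniform(*snr_range)
    rt60 = rng.uniform(*rt60_range)
    rir = synthetic_rir(rt60, sample_rate, rng)
    reverberant = np.convolve(speech, rir)[: len(speech)]
    mixture, scaled_noise = mix_at_snr(reverberant, noise, snr_db)
    return mixture, reverberant, scaled_noise, snr_db, rt60
```

Calling `simulate_example` once per training step with a freshly chosen speech and noise file gives a new mixture each time, together with the separate speech and noise signals needed as targets, so no simulated audio has to be stored on disk.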

Regards,

dreamibor commented 4 years ago

Thank you for your response! I think your answer solved my problem and I will close the issue.

AkojimaSLP / Neural-mask-estimation

Dataset #1