Atul-Anand-Jha / Speaker-Identification-Python

Speaker Identification System (up to 100% accuracy), built using Python 2.7 and the python_speech_features library
GNU Lesser General Public License v3.0

wav file set is not complete #4

Closed gitrohini closed 5 years ago

gitrohini commented 5 years ago

Please share the complete dataset.

Atul-Anand-Jha commented 5 years ago

Hey @gitrohini, thanks for showing interest here. I know the dataset is not complete, but I have shared samples for you all. I can't share the complete dataset because it consists of custom-recorded voices, which I cannot share without the permission of the people concerned. However, you can try this repo on another publicly available dataset: voxforge_dataset. You can easily find it on Google.

Regards, -Atul

gitrohini commented 5 years ago

Thanks for the reply. I tried with voxforge_dataset as well as a custom dataset. Your code is really nice, but I have one query: training on a custom dataset requires 15 files. Is there any way to reduce the number of training files and still keep the accuracy? Or can we bring a custom dataset closer to the VoxForge dataset by removing noise and silence from the recordings?

Thanks & Regards -Rohini

Atul-Anand-Jha commented 5 years ago

Hey @gitrohini, you can set the number of training files as per your need. Try adjusting the line "if count == 15:" in modeltraining.py (line 42). Please check the comment there; I have mentioned something about the number of files. You may also notice that I have switched from 5 to 15 files now. Hope this helps.
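As a rough illustration of the idea above (not the repository's actual code), the fixed count of 15 can be turned into a parameter. The sketch below assumes the training list holds each speaker's files consecutively; the function name batch_training_files and the parameter files_per_speaker are hypothetical.

```python
def batch_training_files(paths, files_per_speaker=15):
    """Yield consecutive groups of paths, one group per speaker.

    Assumes (hypothetically) that `paths` lists each speaker's files
    back to back, with the same number of files for every speaker.
    """
    if files_per_speaker < 1:
        raise ValueError("files_per_speaker must be at least 1")

    batch = []
    for path in paths:
        batch.append(path)
        # Once a full group is collected, hand it over for model training.
        if len(batch) == files_per_speaker:
            yield batch
            batch = []

    # Leftover files mean the list does not divide evenly across speakers.
    if batch:
        raise ValueError(
            "%d leftover file(s); check files_per_speaker" % len(batch)
        )
```

Each yielded group would then be the input for one speaker's model, so changing the training size becomes a single argument instead of an edit to the loop.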

Also, the VoxForge dataset is a clean voice dataset, heavily pre-processed to be noise free. You can create one such dataset of your own. Now you know why my accuracy was better on VoxForge (noiseless audio). I did try my best to preprocess my custom dataset, though, and got quite good accuracy. Try making one yourself if you are really interested. You can go through the documentation to see which software tool I used to preprocess the audio signals and how I did it.

Correct me if I didn't get your second part.

gitrohini commented 5 years ago

Thank you so much, this will really help me. I was talking about a custom dataset (which I made myself) without any preprocessing for noise removal. I trained on 16 files, each 4 to 6 seconds long, and then tested with 6 single words (1 to 2 seconds each) for all speakers. The result was [error= 3 total_sample= 18.0 A.P= 83.333 % ]. Then I preprocessed my dataset using Audacity to remove noise and silence. After that I trained and tested on the same (now noiseless) dataset, and the result was [error= 5 total_sample= 18.0 A.P= 72.2222222 %]. I expected a better result after removing noise than before. How did this happen? What should I do? Please give me suggestions, and correct me if I did something wrong.
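For reference, both reported figures are consistent with A.P being the percentage of correctly identified test samples, that is (total_sample - error) / total_sample * 100. That reading is an assumption inferred from the numbers, and the helper name below is hypothetical.

```python
def accuracy_percent(errors, total_samples):
    """Percentage of test samples that were identified correctly."""
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0 <= errors <= total_samples:
        raise ValueError("errors must be between 0 and total_samples")
    return (total_samples - errors) / float(total_samples) * 100.0


# The two runs reported above: 3 errors and 5 errors out of 18 samples.
before = accuracy_percent(3, 18)
after = accuracy_percent(5, 18)
print("before: %.3f %%, after: %.3f %%" % (before, after))
```

With only 18 test samples, each misclassified sample shifts the score by about 5.56 percentage points, so the drop between the two runs corresponds to two additional errors.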

Atul-Anand-Jha commented 5 years ago

See, the error count increased from 3 to 5. Something must have gone wrong while preprocessing with Audacity. Look closely at the waveform of your audio, and keep in mind that no useful data should be lost.
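One quick sanity check, alongside inspecting the waveform, is to compare how long each recording is before and after preprocessing. The sketch below is only an illustration using Python's standard wave module; the function names are hypothetical, and it assumes both files are uncompressed PCM WAV files.

```python
import contextlib
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with contextlib.closing(wave.open(path, "rb")) as wav:
        return wav.getnframes() / float(wav.getframerate())


def fraction_removed(original_path, processed_path):
    """Share of the original duration that preprocessing removed.

    0.0 means nothing was cut; 0.5 means half the recording is gone.
    """
    original = wav_duration_seconds(original_path)
    processed = wav_duration_seconds(processed_path)
    if original <= 0:
        raise ValueError("original recording is empty")
    return 1.0 - processed / original
```

A large removed fraction on a short clip, such as a 1 to 2 second single-word test file, would be a hint that silence removal may have cut into speech, although duration alone cannot confirm that.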

shubhamdmce commented 5 years ago

Hi, please give me the VoxForge dataset download link.

ghost commented 5 years ago

Can you tell me how you found the database? Which database was used?