davidmartinrius / speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.
MIT License

get_gender will always return 'male' even when it's probably 'mixed' #1

Closed: rsxdalv closed this issue 5 months ago

rsxdalv commented 9 months ago

https://github.com/davidmartinrius/speech-dataset-generator/blob/c56e68f6bb027d49dee46831680c76fc4213956a/main.py#L164

davidmartinrius commented 9 months ago

Thanks for reporting. I'll check it out and let you know when it's solved.

davidmartinrius commented 9 months ago

I will try this https://github.com/resemble-ai/Resemblyzer/blob/master/demo04_clustering.py

Currently the code uses https://github.com/ina-foss/inaSpeechSegmenter to detect the gender, but it is probably not accurate enough and it is not clear how to fine-tune it.
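
For reference, a minimal sketch (not the project's current code) of how inaSpeechSegmenter's segment labels could be aggregated into 'male', 'female' or 'mixed'; the guess_gender function and the min_seconds threshold are hypothetical:

```python
from inaSpeechSegmenter import Segmenter

def guess_gender(audio_path, min_seconds=1.0):
    # Each entry returned by the segmenter is (label, start_sec, end_sec);
    # labels include 'male', 'female', 'music', 'noise' and 'noEnergy'.
    segments = Segmenter()(audio_path)
    durations = {"male": 0.0, "female": 0.0}
    for label, start, end in segments:
        if label in durations:
            durations[label] += end - start
    # If both genders have enough speech time, report 'mixed' instead of one label.
    if durations["male"] >= min_seconds and durations["female"] >= min_seconds:
        return "mixed"
    return "male" if durations["male"] >= durations["female"] else "female"
```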

davidmartinrius commented 9 months ago

By the way, if you want, you can try the branch https://github.com/ina-foss/inaSpeechSegmenter/tree/mf_proba

They mentioned that it includes the score of the gender prediction.

Here is the code if you would like to try it. https://github.com/ina-foss/inaSpeechSegmenter/compare/master...mf_proba

I could try to integrate this enhancement, but before that: would it be enough for you to work with the gender prediction score? It is the best I can do for now. I have been searching for a project that actually detects the gender, but there is nothing better and as ready to use as inaSpeechSegmenter, at least I did not find it.

If you know of other working projects that predict gender, I could do a new integration.

davidmartinrius commented 9 months ago

Closed due to inactivity. Feel free to reopen.

rsxdalv commented 9 months ago

Thanks for the many options! I am not sure which would be the best, honestly. I was mainly concerned that if you had a sample with multiple speakers, the dataset would register it as male, potentially causing problems in training.

davidmartinrius commented 9 months ago

The speakers are identified and separated into different fragments, and the gender should be detected correctly because each fragment contains only one speaker.

You can try ./assets/example_audio_1.mp3

It is an interview between a male and a female speaker, provided with this project.

You can try it in speech_dataset_generator_example.ipynb

Please, tell me if it worked for you.
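
For illustration, a minimal sketch (not the project's actual pipeline) of cutting an audio file into per-speaker fragments once diarization has produced (speaker, start, end) segments; the segment values below are made up and pydub is just one way to do the slicing:

```python
from pydub import AudioSegment

def split_by_speaker(audio_path, segments, out_prefix="fragment"):
    audio = AudioSegment.from_file(audio_path)
    for i, (speaker, start, end) in enumerate(segments):
        # pydub slices by milliseconds
        fragment = audio[int(start * 1000):int(end * 1000)]
        fragment.export(f"{out_prefix}_{i}_{speaker}.wav", format="wav")

# Hypothetical usage with made-up diarization output:
# split_by_speaker("assets/example_audio_1.mp3",
#                  [("SPEAKER_00", 0.0, 12.5), ("SPEAKER_01", 12.5, 30.0)])
```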

rsxdalv commented 9 months ago

Ok, I will try it today. That's a good clarification, thank you! I'm thinking that if there were an assertion that there's only one speaker, it would make it clear that there's no bug. I didn't realize that the function was only meant to be used on a single-speaker segment.

davidmartinrius commented 9 months ago

By the way, when using the demo, keep in mind to apply deepfilternet or deepfilternet with resembleai to the example audio; otherwise it may be discarded for not reaching enough MOS quality. The .ipynb is still in development and I plan to complete it next week.
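
As a reference, a minimal sketch based on DeepFilterNet's documented Python API for enhancing the example audio before feeding it to the generator (the notebook may wire this differently, and the output filename here is made up):

```python
from df.enhance import enhance, init_df, load_audio, save_audio

# Load the default DeepFilterNet model and its state (sample rate, etc.).
model, df_state, _ = init_df()
audio, _ = load_audio("assets/example_audio_1.mp3", sr=df_state.sr())
# Denoise the audio and write it back out so the MOS check is more likely to pass.
enhanced = enhance(model, df_state, audio)
save_audio("example_audio_1_enhanced.wav", enhanced, df_state.sr())
```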

rsxdalv commented 9 months ago

Ah, I see. I might postpone until then, if I don't have another setup running before that.

rsxdalv commented 8 months ago

The current state of colab is this:

ERROR: Cannot install tensorflow, tensorflow[and-cuda]==2.16.1 and whisperx because these package versions have conflicting dependencies.

The conflict is caused by:
    tensorflow[and-cuda] 2.16.1 depends on nvidia-cuda-nvrtc-cu12==12.3.107; extra == "and-cuda"
    nvidia-cudnn-cu12 8.9.7.29 depends on nvidia-cuda-nvrtc-cu12
    torch 2.1.1 depends on nvidia-cuda-nvrtc-cu12==12.1.105; platform_system == "Linux" and platform_machine == "x86_64"

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

davidmartinrius commented 8 months ago

Thanks @rsxdalv. There is an external package that does not pin a specific tensorflow version and always updates itself to the latest one. I have to clone that repo and make several modifications... It is not the first time it has crashed because of that. This kind of bug eventually makes the repository unusable...

rsxdalv commented 8 months ago

I see, I'm actually facing similar problems with my dependencies. There's another solution as well, but it requires a more customized setup: install the package with --no-deps and then add the deps yourself. But this is only worth it when you can't change the original package for whatever reason.
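
For illustration, the --no-deps approach looks roughly like this; package names and versions below are placeholders:

```bash
pip install --no-deps some-conflicting-package
pip install depA==1.2.3 depB==4.5.6   # then add only the dependencies you have verified yourself
```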

davidmartinrius commented 8 months ago

I updated the requirements.txt and setup.py

Please, could you try it and confirm if it worked?

rsxdalv commented 8 months ago

Thanks for the update! In the notebook (on colab) I'm getting the typical python issues with package resolution:

Traceback (most recent call last):
  File "/content/speech-dataset-generator/speech_dataset_generator/main.py", line 3, in <module>
    from speech_dataset_generator.audio_processor.audio_processor import process_audio_files, get_local_audio_files, get_youtube_audio_files, get_librivox_audio_files, get_tedtalks_audio_files
ModuleNotFoundError: No module named 'speech_dataset_generator'

So I wanted to ask - do you run the project with the notebook or mostly otherwise?

davidmartinrius commented 8 months ago

I am mostly running the project on my local computer. The notebook is not finished yet; it is a pending task. I would really like to finish a version of the notebook, it is just that I have little time. If you want, you can create a PR with the bugfix. It will take me several days to create a new commit...

davidmartinrius commented 8 months ago

> Thanks for the update! In the notebook (on colab) I'm getting the typical python issues with package resolution:
>
> Traceback (most recent call last):
>   File "/content/speech-dataset-generator/speech_dataset_generator/main.py", line 3, in <module>
>     from speech_dataset_generator.audio_processor.audio_processor import process_audio_files, get_local_audio_files, get_youtube_audio_files, get_librivox_audio_files, get_tedtalks_audio_files
> ModuleNotFoundError: No module named 'speech_dataset_generator'
>
> So I wanted to ask - do you run the project with the notebook or mostly otherwise?

Another user reported the same error. I provided him a solution and he told me that it worked.

Have you tried this in the notebook? export PYTHONPATH=/path/to/your/speech-dataset-generator:$PYTHONPATH
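
In a Colab notebook specifically, a !export line only affects its own subshell, so a cell along these lines is a hedged alternative (the path is taken from the traceback above):

```python
import os
import sys

# Make the package importable in the running kernel.
sys.path.append("/content/speech-dataset-generator")
# Also expose it to scripts launched from the notebook via !python.
os.environ["PYTHONPATH"] = "/content/speech-dataset-generator"
```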

rsxdalv commented 8 months ago

I adapted it:

https://github.com/davidmartinrius/speech-dataset-generator/pull/10

davidmartinrius commented 8 months ago

I would like to ask your advice on what a good graphical interface with gradio would look like. I mean, which functions it should have and how to structure it. If it were up to me I would integrate it into your project, but perhaps it would be complicated to deal with the dependencies. How would you do it?

rsxdalv commented 8 months ago

As for integration, this is what I'm looking at: My first step was to see if I can get the project to run. Now what I am looking at next is how portable it really is. Currently my project uses a single virtual environment where all the dependencies live, which allows for a single python runtime to control it. Hence, it is important that the different dependencies can be flexible, avoiding == dependencies. (Side note - in JS it's the exact opposite since JS has no issues installing 20 different versions of the same package typically). Therefore, right now I'd be happy if there was a subset that I could use, without requiring tensorflow and other massive installs.

Now on to the GUI side: I'm not a fan of gradio GUI. I saw that they did many improvements to the latest version, so I think they did hear people out, but I really think gradio is a GUI for AI not humans. The project is highly integrated which isn't useful to me as a React developer. Even as an Angular developer, which is a quite integrated framework, I think it's crossing a few boundaries. What this means is that you often find impossible problems requiring impossible solutions.

For example - let's say you have an audio splitting tab and a history tab. Now you would like that cool button that SD webui has, where you can grab all the parameters from history into the current audio splitting tab. In Gradio, this will require a button that accepts the history as an input and the parameters as an output. There are no references to outputs other than the variables you declare. Therefore you will need a declaration file where you create these parameter inputs, then use them as inputs in your audio splitting tab and specify them as outputs in your history tab. (That's what I had to do.) This results in a split codebase where the attributes of your UI are divided between the actual UI 'htmlish gradio' and the declaration file, and it immediately requires multiple files or risks a bad dependency graph or even circular dependencies (when tab A can send to tab B and tab B can send to tab A).

Now let's say you had 2 different kinds of splitting, but you want to show both of them in the history to avoid having 2 different history tabs. Sure, but what about the button from before? You need 2 buttons now: one for 'send to A splitter', one for 'send to B splitter'. Why? Because gradio buttons have a fixed functionality. When you click one, it sends something to its outputs. And that's when the workarounds start to kick in. If you wanted a single button that just said 'let me do this thing again', you can declare the list of outputs to be a dictionary rather than a list. In that case you have more freedom when you return each output, but you still have a function that takes all of splitter A's and splitter B's parameters. You can do some python magic to 'make it disappear', and you'll start making your own framework on top of gradio.

Ok, now you have a functioning history tab. What about showing a variable number of outputs, let's say 1-10? You have to declare all of them ahead of time. Then you have these 'preallocated buckets' that you manage, hiding them when necessary and returning them from a function that has 10 return values.

Finally, you'd like to include some quick information about the current parameters, such as 'Estimated time for processing'. This is going to be a function of all of the parameters that returns just a string. However, the way to implement it is to add a change handler to each and every one of the input components. The change handler has to declare all of the inputs, the outputs and the function to process them. You can do some refactoring here, like making an array of input components and then iterating over the array to apply the same change handler logic to them. Then, in the end, you could have a gradio interface with a thousand input components and discover that it is a bit laggy.
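As a toy illustration (not code from either project) of the fixed inputs/outputs wiring described above: the "send to splitter" button has to declare its target components up front, so reusing it for a second splitter would need a second button.

```python
import gradio as gr

with gr.Blocks() as demo:
    with gr.Tab("Splitter A"):
        segment_length = gr.Number(label="Segment length (s)")
    with gr.Tab("History"):
        last_run = gr.JSON(value={"segment_length": 30}, label="Last run parameters")
        send_btn = gr.Button("Send to splitter A")
    # The click handler is bound to exactly these outputs; sending the same history
    # to a hypothetical splitter B would require another button with its own outputs.
    send_btn.click(lambda h: h.get("segment_length", 0),
                   inputs=last_run, outputs=segment_length)

demo.launch()
```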

Recently, I've resigned myself to treating gradio as a backend that also has a simple automatically generated UI, with React or any other proven front end (nothing wrong with vanilla JS) for the GUI.

rsxdalv commented 8 months ago

I don't see tensorflow in the requirements.txt right now. May I ask how integral it is? Could it be something that only those who really need it install, while using the rest of the project?
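
One possible arrangement, offered only as a sketch rather than something the repo does today, would be to move the tensorflow-heavy gender detection behind an optional extra in setup.py; the "gender" extra name is made up:

```python
from setuptools import setup, find_packages

setup(
    name="speech-dataset-generator",
    packages=find_packages(),
    install_requires=[
        # core dependencies that do not need tensorflow
    ],
    extras_require={
        # installed only with: pip install "speech-dataset-generator[gender]"
        "gender": ["inaSpeechSegmenter"],
    },
)
```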

davidmartinrius commented 8 months ago

Ok, if gradio is not the appropriate tool, I can develop a UI with React, such as Next.js or plain React + FastAPI or Django. Actually, I have more experience with React, Vue and Angular than with Gradio. I didn't know about these kinds of limitations in Gradio.

davidmartinrius commented 8 months ago

> I don't see tensorflow in the requirements.txt right now. May I ask how integral it is? Could it be something that only those who really need it install, while using the rest of the project?

I removed tensorflow[and-cuda]==2.16 because it had conflicts with pytorch.

The project still uses tensorflow, but through another package, inaSpeechSegmenter. That pulls in the latest version of tensorflow, but without CUDA.
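
If it helps to confirm which installed package is pulling tensorflow in, pipdeptree can show the reverse dependency tree (just a diagnostic suggestion, not part of the project):

```bash
pip install pipdeptree
pipdeptree --reverse --packages tensorflow   # lists the installed packages that depend on tensorflow
```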

rsxdalv commented 8 months ago

That's good news! What about the other packages, did you pip freeze to get the exact versions or are there some hard requirements?

rsxdalv commented 8 months ago

Also, I get this when I try to install it. Have you installed it as a package before?

pip install git+https://github.com/davidmartinrius/speech-dataset-generator.git --dry-run
Collecting git+https://github.com/davidmartinrius/speech-dataset-generator.git
  Cloning https://github.com/davidmartinrius/speech-dataset-generator.git to c:\users\admin\appdata\local\temp\pip-req-build-nwtiyauq
  Running command git clone --filter=blob:none --quiet https://github.com/davidmartinrius/speech-dataset-generator.git 'C:\Users\admin\AppData\Local\Temp\pip-req-build-nwtiyauq'
  Resolved https://github.com/davidmartinrius/speech-dataset-generator.git to commit 113b16774c9a6f45c771fc75e283a98a692e602e
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [8 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "C:\Users\admin\AppData\Local\Temp\pip-req-build-nwtiyauq\setup.py", line 10, in <module>
          long_description=open('README.md').read(),  # Add a README.md file for a detailed description
        File "C:\Users\admin\Desktop\one-click-installers-tts-main\installer_files\env\lib\encodings\cp1252.py", line 23, in decode
          return codecs.charmap_decode(input,self.errors,decoding_table)[0]
      UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 18524: character maps to <undefined>
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

davidmartinrius commented 8 months ago

Maybe there is something wrong in the setup.py; I will check it. I think it is working well when installing from requirements.txt.
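
For reference, the traceback above points at the long_description=open('README.md').read() line: on Windows, open() defaults to the cp1252 codec, so a likely fix (a sketch of just the relevant call) is to read the README as UTF-8 explicitly:

```python
from setuptools import setup

setup(
    name="speech-dataset-generator",
    # Reading the README as UTF-8 avoids the cp1252 UnicodeDecodeError on Windows.
    long_description=open("README.md", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
)
```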

davidmartinrius commented 8 months ago

> That's good news! What about the other packages, did you pip freeze to get the exact versions or are there some hard requirements?

Actually, I don't know the best way to make a robust requirements.txt, as this project uses several external packages. What I have done for now is clone some of the dependent projects and customize the requirements.txt for each one.
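
One common approach, offered here only as a suggestion, is to keep loose version ranges in a requirements.in file and let pip-tools resolve and pin them:

```bash
pip install pip-tools
pip-compile requirements.in -o requirements.txt   # resolves loose ranges into pinned, mutually compatible versions
pip-sync requirements.txt                         # installs exactly the pinned set
```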