KakaruHayate / ColorSplitter

A CLI tool for splitting vocal timbres.
MIT License

Suggestion for future release #2

Open ariikamusic opened 8 months ago

ariikamusic commented 8 months ago

I've had positive experiences using ColorSplitter for .wav files and creating datasets for DiffSinger. It excels in recognizing vocal timbres, especially in audio files ranging from 5 to 15 seconds. While not necessarily an issue, I would like to make a suggestion to hopefully improve results and enhance user experience.

Consider allowing ColorSplitter to also handle the cutting process instead of only organizing pre-cut audio into folders. This way, it could identify different vocal timbres within a single file and organize the resulting pieces as well. When different vocal timbres are present in one file, ColorSplitter currently ignores this and sorts only the whole files it is given.

I hope this suggestion aligns with your goals for the project. Looking forward to future updates, and thank you for your ongoing efforts!

KakaruHayate commented 8 months ago

Thank you for opening this issue :)

At present, the audio processing extracts voiceprint features from the mel spectrogram in windows of 80 frames (800 ms) at a time, then averages them over time to obtain one average feature vector for each clip. So the resolution for recognition may be sufficient.
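The windowed averaging described above can be sketched roughly as follows. This is a minimal illustration, not ColorSplitter's actual code: `clip_embedding` and `toy_embedder` are hypothetical names, the non-overlapping hop is an assumption (the comment does not say whether windows overlap), and the toy embedder only stands in for the real voiceprint model.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]   # one mel frame; the number of mel bins is an assumption
Embedding = List[float]


def clip_embedding(
    mel: Sequence[Frame],
    embed_window: Callable[[Sequence[Frame]], Embedding],
    window: int = 80,   # 80 frames = 800 ms, i.e. 10 ms per frame
    hop: int = 80,      # assumption: windows do not overlap
) -> Embedding:
    """Embed each fixed-size window, then average the embeddings over time."""
    if len(mel) < window:
        raise ValueError("clip is shorter than one window")
    starts = range(0, len(mel) - window + 1, hop)
    vectors = [embed_window(mel[s:s + window]) for s in starts]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def toy_embedder(frames: Sequence[Frame]) -> Embedding:
    """Hypothetical stand-in for the real model: mean of each mel bin."""
    bins = len(frames[0])
    return [sum(f[b] for f in frames) / len(frames) for b in range(bins)]


# Two windows with different content collapse into a single clip-level vector.
mel = [[1.0, 2.0]] * 80 + [[3.0, 4.0]] * 80
print(clip_embedding(mel, toy_embedder))  # [2.0, 3.0]
```

Because the result is a single vector per clip, a file containing two timbres ends up with one blended feature, which is why per-clip sorting cannot separate them.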

In another application of this technology, it can be combined with a VAD (voice activity detection) model to locate the timestamps of different speakers within a segment of audio, which would meet your needs.
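One way such a combination might look, as a rough sketch only: a VAD supplies voiced segments, each segment gets a voiceprint embedding, and each embedding is matched to the closest reference timbre. Everything here is assumed for illustration (`label_segments`, the reference dictionary, cosine similarity as the metric, and the `max_gap` merge rule); none of it comes from ColorSplitter.

```python
import math
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]          # (start_s, end_s) from some VAD model
Labelled = Tuple[float, float, str]    # (start_s, end_s, timbre label)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_segments(
    segments: Sequence[Segment],
    embeddings: Sequence[Sequence[float]],
    references: Dict[str, Sequence[float]],
    max_gap: float = 0.3,   # assumption: merge same-label segments this close
) -> List[Labelled]:
    """Tag each voiced segment with its nearest reference timbre."""
    labelled: List[Labelled] = []
    for (start, end), emb in zip(segments, embeddings):
        name = max(references, key=lambda k: cosine(emb, references[k]))
        prev = labelled[-1] if labelled else None
        if prev and prev[2] == name and start - prev[1] <= max_gap:
            labelled[-1] = (prev[0], end, name)   # extend the previous span
        else:
            labelled.append((start, end, name))
    return labelled


# Made-up segments and 2-D embeddings, purely for demonstration.
segments = [(0.0, 1.0), (1.2, 2.0), (2.5, 3.0)]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
references = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
print(label_segments(segments, embeddings, references))
# [(0.0, 2.0, 'A'), (2.5, 3.0, 'B')]
```

The resulting timestamps could then be used to cut the file before sorting, though how well this works depends entirely on how well the embeddings separate timbres.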

However, as of now the model's resolution is still very weak, so it may be too early to attempt this.

As for training this model, my data is also too limited. Judging the timbre of a human voice is very subjective, which is also why research in this area is limited.

In addition, I have only now understood the correct training method, and the new weights have been uploaded. Please update to them.