francqz31 opened this issue
Hi francqz31,
Thank you for your feedback!
This tool gives you the frame-based phoneme sequence at a rate of 50 frames per second (one frame every 20 ms), so you can easily derive the phoneme duration sequence from it. It outputs a sequence like 'a a a a a a a a a a b b b', which you can convert to [(a, 10/50), (b, 3/50)], i.e. durations in seconds, to get the phoneme duration sequence.
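The conversion described above is a run-length encoding of the frame labels followed by a division by the frame rate. Here is a minimal sketch using only the standard library; the function name `frames_to_durations` is hypothetical and not part of PhonHuBERT's code.

```python
from itertools import groupby

FRAME_RATE = 50  # frames per second, as stated above (one frame every 20 ms)

def frames_to_durations(frames, frame_rate=FRAME_RATE):
    """Collapse a frame-level phoneme sequence into (phoneme, seconds) pairs.

    Consecutive identical labels are merged (run-length encoding), and each
    run's length in frames is divided by the frame rate to get its duration.
    """
    return [(phoneme, len(list(run)) / frame_rate)
            for phoneme, run in groupby(frames)]

frames = "a a a a a a a a a a b b b".split()
print(frames_to_durations(frames))
# [('a', 0.2), ('b', 0.06)]
```

One design caveat: two genuinely distinct but identical phonemes that occur back to back are merged into a single run, since frame labels alone cannot separate them.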
In terms of performance, I reach a Phoneme Error Rate (PER) of around 15% (so roughly 15% of the frames are assigned the wrong phoneme) for the model trained without CTC loss, and 17% for the model trained with a combination of CTC and cross-entropy loss. I am trying to find ways to improve this accuracy.
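The frame-level reading of PER given above can be computed directly by comparing aligned frames. This is a sketch under that definition only; in phoneme recognition, PER is often computed instead from the edit distance between collapsed phoneme sequences, which would give a different number. The function name is hypothetical.

```python
def frame_error_rate(predicted, reference):
    """Fraction of frames whose predicted phoneme differs from the reference.

    Both sequences must be aligned frame by frame (same length, same
    50 fps grid); no edit-distance alignment is performed.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of frames")
    if not reference:
        raise ValueError("cannot compute an error rate on an empty sequence")
    errors = sum(p != r for p, r in zip(predicted, reference))
    return errors / len(reference)

ref = "a a a a a a a a a a b b b".split()
pred = "a a a a a a a a b b b b b".split()
print(f"{frame_error_rate(pred, ref):.1%}")  # 2 of 13 frames differ -> 15.4%
```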
By the way, the repository on GitHub is still under construction; I am currently cleaning up my code locally to make it available, so there may still be some inconsistencies.
Are you working on Singing Voice Synthesis (SVS) too? I am curious to know how this tool will be used.
Regards,
On Wed, Nov 15, 2023 at 17:57, francqz31 wrote:
Hello Amaus-p, I hope you are doing well!
I wanted to ask whether this tool, PhonHuBERT, can be used to get both the phoneme sequence and the phoneme duration sequence.
And how accurate are the results? Are they as accurate as the human labeling of Opencpop?
Thanks in advance, and thanks for this amazing tool.
Hello @Amaus-p
Yeah, I haven't used PhonHuBERT yet, since I also thought it was too early and that the repo was still under construction.
So I will wait until you think it's ready, improvements included. :)
As for your question about SVS: yes, I'm working on SVS algorithms, and I wanted to format my English dataset just like Opencpop, so I was waiting for an automatic way to get the phoneme sequence | phoneme duration sequence, and also the MIDI (note) sequence | MIDI duration sequence. Then I found this repo, so yes, I will be using it to get the phoneme sequence | phoneme duration sequence for my English singing dataset.
But it has to be as accurate as possible so it doesn't break the SVS algorithm. Wow, 15% PER isn't bad; it's actually a really good start. If we can get it below 5%, that would be wonderful. If you want help with training or data, I have an RTX 4090 I can train anything on.
Thanks in advance!
Hey, thank you for your fast response.
Nice to find other people working on the same topic!
Nice project! Do you record your songs yourself? My goal in the end is to record and annotate a French dataset.
Regarding the MIDI sequence, I also worked on that. I will upload the code soon to the following repository: https://github.com/Amaus-p/ANST_models. Basically, I'm finishing a master's thesis whose goal was to develop an automatic annotation tool for SVS datasets. I'm currently cleaning everything up to make it available before starting further improvements.
Thank you for offering help with training; I won't hesitate to ask.
Regards
(In reply to francqz31's message of Thu, Nov 16, 2023, 23:51.)
@Amaus-p Exactly, it is really nice to see researchers sharing the same interest, especially since it is a rare topic. :)
Yes, I noticed ANST; I guessed that it stands for Automatic Note Singing Transcription. I hope it is as amazing as PhonHuBERT.
Also, yes, I record songs myself, and I also have artist friends who record songs and send them to me.
Really? So you're telling me that with ANST we will be able to get both the MIDI (note) sequence | MIDI duration sequence?
Wow, that would be really awesome. I hope you successfully finish your master's soon :)
Nice! ANST stands for Aligned Note Sequence Transcription, but you were close :) Yes, basically, I am generating a MIDI file containing, for each note, the MIDI pitch, the onset (start time), and the offset (end time).
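To connect (pitch, onset, offset) notes with the note sequence | note duration sequence asked about above, here is a minimal sketch that produces two parallel lists. It assumes a monophonic melody with non-overlapping notes, times in seconds, and a clip starting at 0 s. The function names, the `rest` label for gaps, and the sharp-only note spelling are hypothetical choices, not ANST's actual output format.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def midi_to_name(pitch):
    """MIDI note number to name, using the common convention 60 = C4, 69 = A4."""
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"

def notes_to_sequences(notes, rest_label="rest"):
    """Convert (midi_pitch, onset_s, offset_s) notes into a note sequence and a
    note duration sequence; gaps between notes are filled with rests."""
    names, durations = [], []
    cursor = 0.0  # end time of the previous note (clip assumed to start at 0 s)
    for pitch, onset, offset in sorted(notes, key=lambda n: n[1]):
        if onset > cursor:  # silence before this note
            names.append(rest_label)
            durations.append(round(onset - cursor, 6))
        names.append(midi_to_name(pitch))
        durations.append(round(offset - onset, 6))
        cursor = offset
    return names, durations

notes = [(60, 0.0, 0.5), (62, 0.5, 1.25), (64, 1.5, 2.0)]
print(notes_to_sequences(notes))
# (['C4', 'D4', 'rest', 'E4'], [0.5, 0.75, 0.25, 0.5])
```

Rounding to microseconds only hides floating-point noise from the subtractions; overlapping notes would need a different policy and are not handled here.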
Happy new year! Any news? Do you want help with training?
Thank you for reaching out, and happy new year to you too! Thanks again for offering your resources. For now, I have what I need for training, but I might need your help in the future. I will start working on a new model and will let you know when I make further progress. Regards
@Amaus-p Really happy to see you back, finally! I actually started to worry lmao. I was just thinking about this repo at the start of February. Yeah, I'm struggling so much with annotating my singing dataset that I had to give up. I will try your repo ASAP and see how accurate it is on English singing.
How about ANST? Are you shipping it soon? Please don't disappear for so long again :)
For ANST, I will first write a paper on this topic before releasing the code. I will let you know when I release everything.
(In reply to francqz31's message of Fri, Feb 16, 2024, 01:15.)
@Amaus-p Oh OK, wonderful. I don't know if you've seen these two repos; they could be really useful: https://github.com/qiuqiao/SOFA, which looks similar to PhonHuBERT, and https://github.com/openvpi/SOME, which should cover the MIDI side, like ANST. I will start training PhonHuBERT for English and let you know.