OpenMOSS / AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Regarding ASR testing #40

Open Simplesss opened 3 months ago

Simplesss commented 3 months ago

Hello, thank you very much for your work. I would like to reproduce the ASR performance of the AnyGPT base model on the LibriSpeech test-clean set. Your paper reports a WER of 8.5, but my test result was 14.5 (using the command format `speech|text|{speech file path}`). I am therefore wondering whether the gap is caused by a prompt being randomly selected for each ASR inference. If possible, could you share the code you used to calculate WER (I used a jiwer Compose of 7 transformations for the calculation), as well as the model's ASR transcripts? Looking forward to your reply.
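For context, here is a minimal sketch of how such a jiwer-based WER computation might look, assuming references and hypotheses are plain-text files with one utterance per line; the file names and the exact transform list are placeholders, not the authors' script:

```python
import jiwer

# Chain of text normalizations applied to both sides before scoring.
# The exact transforms chosen here are an assumption; each one ships with jiwer.
normalize = jiwer.Compose([
    jiwer.ToLowerCase(),
    jiwer.ExpandCommonEnglishContractions(),  # "you're" -> "you are"
    jiwer.RemovePunctuation(),
    jiwer.RemoveMultipleSpaces(),
    jiwer.Strip(),
])

# Hypothetical file names: one transcript per line, aligned by index.
with open("references.txt") as f:
    refs = [normalize(line) for line in f]
with open("hypotheses.txt") as f:
    hyps = [normalize(line) for line in f]

print(f"WER: {jiwer.wer(refs, hyps):.3f}")
```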

JunZhan2000 commented 2 months ago

Hello, I don't think it's an issue with the prompt; each prompt was seen many times during training. I would like to confirm two things. First, are you using beam search as your decoding strategy? It generally produces the best results. Second, you need to post-process the transcripts to standardize them, because the LLM's output format differs a lot from the ground truth, including punctuation and contractions such as "you're", which appears as "you are" in the ground truth. I also used jiwer to calculate WER. As for the test code, unfortunately it was lost during an environment migration, but if you ask GPT to write some standardization code, you should be able to reproduce the results in the paper (I didn't handle every standardization case).
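In case it helps later readers, a rough sketch of the two suggestions above; the contraction table is illustrative and the `num_beams` value is an assumption, not the setting used for the paper:

```python
import re
import jiwer

# Point 1: beam search decoding, assuming a Hugging Face-style generate API.
# `model`, `tokenizer`, and `num_beams=5` are placeholders for your own setup.
# output_ids = model.generate(input_ids, num_beams=5, max_new_tokens=256)

# Point 2: standardize LLM output toward LibriSpeech-style ground truth.
# This table covers only a few contractions, mirroring the caveat above
# that not every standardization case was handled.
CONTRACTIONS = {
    "you're": "you are",
    "i'm": "i am",
    "it's": "it is",
    "don't": "do not",
    "can't": "cannot",
}

def standardize(text: str) -> str:
    text = text.lower()
    for short, full in CONTRACTIONS.items():
        text = text.replace(short, full)
    text = re.sub(r"[^\w\s']", "", text)  # drop punctuation, keep apostrophes
    return re.sub(r"\s+", " ", text).strip()

ref = "YOU ARE WELCOME SIR"
hyp = "You're welcome, sir."
print(jiwer.wer(standardize(ref), standardize(hyp)))  # 0.0 after standardization
```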