Open: Kreijstal opened this issue 2 years ago
That sounds like a great idea. It'll be really difficult, though, because we can't ensure that the audio is split correctly, and the time offsets have to be perfect. And it's not practical to fine-tune on a single sample. Do you have any approaches in mind?
Hmm, maybe make it more transparent how the audio is split? Also, how does the audio splitting work? Is it also an AI?
I segment on the silent parts of the audio by adapting some code from this project. It's not an AI. We can tune the splitting parameters, but it's not a one-size-fits-all solution.
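As a rough illustration of the idea (not AutoSub's actual code, which adapts the linked project, and real splitters work on framed RMS energy rather than raw samples), a minimal amplitude-threshold silence splitter might look like this; the function name and thresholds are hypothetical:

```python
def split_on_silence(samples, threshold=500, min_silence_len=4):
    """Split a list of amplitude values into voiced chunks,
    cutting wherever at least `min_silence_len` consecutive
    samples fall below `threshold`. Purely illustrative."""
    chunks, current, silent_run = [], [], 0
    for s in samples:
        if abs(s) < threshold:
            silent_run += 1
        else:
            silent_run = 0
        current.append(s)
        # Close the chunk once the silence is long enough and the
        # chunk contains something besides the trailing silence.
        if silent_run >= min_silence_len and len(current) > silent_run:
            chunks.append(current[:-silent_run])
            current, silent_run = [], 0
    # Keep a final chunk only if it contains any voiced samples.
    if any(abs(s) >= threshold for s in current):
        chunks.append(current)
    return chunks
```

Tuning `threshold` and `min_silence_len` is exactly the per-file parameter fiddling mentioned above: too aggressive and words get cut in half, too lenient and chunks grow too long to transcribe accurately.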
I am very happy to see your work. It really took a lot of effort. I don't know if you have any knowledge of NVIDIA NeMo. I found that NeMo's recognition efficiency is very high. I look forward to your time to make a version that uses NeMo as the recognition core. ^_^
Hi I will check it out. Do you know if the model outputs timing information for the detected speech segments? Because that's how I build the subtitle files. Do you know which performs better: HuggingFace Wav2Vec or NeMo?
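For context on why the timing information matters: once each segment has a start and end offset, building an SRT cue is just string formatting. A minimal sketch (these helper names are mine, not AutoSub's actual functions):

```python
def srt_timestamp(seconds):
    """Format a float offset in seconds as an SRT timestamp
    (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index, start, end, text):
    """Build one SRT cue block: index, timing line, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
```

So an ASR backend is only usable here if it reports per-segment (or per-word) offsets to feed `start` and `end`.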
You can check on Google whether the model outputs timing information for the detected speech segments.
Please use BRANCH = 'v1.0.2' to test.
Since I'm not familiar with HuggingFace Wav2Vec, I don't know which is better.
But in the NeMo examples I saw that individual words are easily separated, and the code is there too. The specific file locations are:
NeMo/examples/asr/
NeMo/tutorials/asr/01_ASR_with_NeMo.ipynb
Offline_ASR.ipynb
I want to process video files with a long duration, but it seems the program can only process WAV files up to 15 seconds. I also want to translate into several other languages, which runs into the same limitation. I look forward to seeing open-source programs for these functions.
Voice translation services often translate some content incorrectly, or deliberately mistranslate it, which distorts many people's understanding. This is a dark moment. I hope more people can get the truth.
We have no choice but to use offline speech recognition and translation programs. In the face of deliberate misleading and harm, we can only use software and platforms outside their control. Right now I use Gettr.
Thank you very much for providing open-source software that implements the full pipeline from video and audio files to subtitle files.
I installed and used it, but I don't know which language it is supposed to recognize. I use English video files, but the results seem bad. Can you tell me? Thank you so much.
Or can you tell me how to change the translation module to support other languages? Thanks a lot.
I plan on implementing either Wav2Vec or NeMo, but will need some time. DeepSpeech has official models for American English, and there are some community-made models here. If you find a model file you want to use, download the .pbmm and .scorer files and give them as input to AutoSub.
Also, AutoSub can process large video files too. It automatically segments the audio into smaller chunks.
Thank you very much for your guidance; I hope to see your new program soon.
Following your hint, I found and downloaded the corresponding model files: deepspeech-0.9.3-models-zh-CN.pbmm and deepspeech-0.9.3-models-zh-CN.scorer.
I tried to run with them, but the following error occurs.
If it's convenient, please test it to see how to get the model working.
Thank you so much.
(sub) (base) gettr@gettr:~/AutoSub$ python3 autosub/main.py --model deepspeech-0.9.3-models-zh-CN.pbmm --scorer deepspeech-0.9.3-models-zh-CN.scorer --file ~/3-720.mp4
ARGS: Namespace(dry_run=False, file='/home/gettr/3-720.mp4', format=['srt', 'vtt', 'txt'], model='deepspeech-0.9.3-models-zh-CN.pbmm', scorer='deepspeech-0.9.3-models-zh-CN.scorer', split_duration=5)
Model: /home/gettr/AutoSub/deepspeech-0.9.3-models-zh-CN.pbmm
Scorer: /home/gettr/AutoSub/deepspeech-0.9.3-models-zh-CN.scorer
Input file: /home/gettr/3-720.mp4
Extracted audio to audio/3-720.wav
Splitting on silent parts in audio file
Running inference:
TensorFlow: v2.3.0-6-g23ad988
DeepSpeech: v0.9.3-0-gf2e9c85
0%|          | 0/17 [00:07<?, ?it/s]
Traceback (most recent call last):
  File "autosub/main.py", line 165, in <module>
    main()
  File "autosub/main.py", line 156, in main
    ds_process_audio(ds, audio_segment_path, output_file_handle_dict, split_duration=args.split_duration)
  File "autosub/main.py", line 66, in ds_process_audio
    write_to_file(output_file_handle_dict, split_inferred_text, line_count, split_limits, cues)
  File "/home/gettr/AutoSub/autosub/writeToFile.py", line 43, in write_to_file
    file_handle.write(inferred_text + "\n\n")
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-88: surrogates not allowed
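The UnicodeEncodeError above means the inferred text contains lone UTF-16 surrogate code points, which UTF-8 refuses to encode. One possible workaround (a sketch, not the project's official fix) is to strip them before writing, since Python's 'ignore' error handler drops un-encodable characters:

```python
def sanitize(text):
    """Drop lone surrogate code points that UTF-8 cannot encode.
    The 'ignore' error handler silently skips them."""
    return text.encode("utf-8", errors="ignore").decode("utf-8")

# A string with a lone surrogate would crash file_handle.write();
# after sanitizing, it encodes cleanly.
broken = "ok" + "\ud800" + "ok"
clean = sanitize(broken)
```

Applying something like `file_handle.write(sanitize(inferred_text) + "\n\n")` in writeToFile.py should avoid the crash, though the surrogates themselves suggest the zh-CN model's output is being decoded incorrectly upstream, so some characters may still be lost.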
I found an instruction, but I don't know how to apply it: LINK
Given an incorrect subtitle, would it be possible to provide a corrected one and retrain on it?