BingLingGroup / autosub

Command-line utility to transcribe/translate from video/audio/subtitles to subtitles
GNU General Public License v2.0

Error running Autosub, Need Help setting up autosub and Run #113

Closed rajeshmani28 closed 4 years ago

rajeshmani28 commented 4 years ago

Hi,

I downloaded the autosub-0.5.6-alpha-win-x64-nuitka package for Windows and installed it on Windows 10. I also installed Python 2.7.1.2.

Then I modified the run.bat file to create Japanese subtitles for a Japanese video, ran it in a CMD window, and got the error below:

D:\subs\autosub\autosub>.\autosub -SRC ja-jp -i "D:\subs\test.mp4" -D ja
Speech language not provided. Only performing speech regions detection.

Convert source file to "C:\Users\ELCOT\AppData\Local\Temp\tmpqoi0sit3.wav" to detect audio regions.
D:\subs\autosub\autosub -hide_banner -y -i "D:\subs\test.mp4" -vn -ac 1 -ar 48000 "C:\Users\ELCOT\AppData\Local\Temp\tmpqoi0sit3.wav"
Traceback (most recent call last):
  File "D:\subs\autosub\autosub\__main__.py", line 26, in <module>
  File "D:\subs\autosub\autosub\autosub\__init__.py", line 158, in main
  File "D:\subs\autosub\autosub\autosub\cmdline_utils.py", line 945, in audio_or_video_prcs
  File "D:\subs\autosub\autosub\subprocess.py", line 316, in check_output
  File "D:\subs\autosub\autosub\subprocess.py", line 383, in run
  File "D:\subs\autosub\autosub\subprocess.py", line 676, in __init__
  File "D:\subs\autosub\autosub\subprocess.py", line 957, in _execute_child
PermissionError: [WinError 5] Access is denied

How do I solve this issue? Am I missing anything in the installation? Also, how do I run the steps in the workflow to create Japanese subtitles: should I run them individually, or will autosub run them all automatically and create the subtitles?

Can anyone please help me set up this program and generate Japanese subtitles for a Japanese video?

Thanks RM

BingLingGroup commented 4 years ago

First, you are using the wrong command. If the input is a video or audio file, the command should look like this:

https://github.com/BingLingGroup/autosub#google-speech-v2

autosub -i "D:\subs\test.mp4" -S ja-jp

BingLingGroup commented 4 years ago

I notice something.

D:\subs\autosub\autosub -hide_banner -y -i "D:\subs\test.mp4" -vn -ac 1 -ar 48000 "C:\Users\ELCOT\AppData\Local\Temp\tmpqoi0sit3.wav"

Did you set the environment variable "FFMPEG_PATH" to the ffmpeg directory rather than to the ffmpeg executable itself?
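The logged command above starts with D:\subs\autosub\autosub rather than a path ending in ffmpeg.exe, which is what one would expect if FFMPEG_PATH names a directory. On Windows, trying to launch a directory typically fails with "[WinError 5] Access is denied". A minimal sketch for checking the value, assuming the variable holds a plain path; diagnose_ffmpeg_path is a hypothetical helper and not part of autosub:

```python
import os


def diagnose_ffmpeg_path(value):
    """Classify a candidate FFMPEG_PATH value (hypothetical helper)."""
    if not value:
        return "unset"
    if os.path.isdir(value):
        # A directory cannot be executed; point the variable at ffmpeg.exe instead.
        return "directory"
    if os.path.isfile(value):
        return "file"
    return "missing"


if __name__ == "__main__":
    # Reads the real environment variable; prints one of the four labels above.
    print(diagnose_ffmpeg_path(os.environ.get("FFMPEG_PATH", "")))
```

If this prints "directory", setting FFMPEG_PATH to the full path of the executable is the likely fix.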

BingLingGroup commented 4 years ago

This should work. autosub.zip

rajeshmani28 commented 4 years ago

Hi,

Thanks for your reply and the file. It works now.

However, subtitles are not being generated for many of the dialogue lines, maybe because of background music or noise. Is there any way to tune the settings to get the best result?

Is there any way to separate the dialogue from the background music and noise, and then feed only that to autosub to generate the subtitles?

Once again thank you for your program and help here.

-RM

BingLingGroup commented 4 years ago

Yes, you can do that with the -ap y option, which runs the audio pre-processing commands mentioned here.

Of course, you can also tweak the auditok options. Use Ctrl-F to search for auditok in the readme options.

Another way to improve the result is to edit the times manually before sending the audio to the API. Use autosub -i xxx to get only the times, edit them manually, and then use the -er option to input the edited times subtitles, as I mentioned in split-audio.

rajeshmani28 commented 4 years ago

Hi Bing,

I tried the -ap option and got the below error:

D:\subs\autosub\autosub>autosub -i "d:\subs\test.mp4" -ap y -S ja-jp
D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -i "d:\subs\test.mp4" -vn -af "asplit[a],aphasemeter=video=0,ametadata=select:key=lavfi.aphasemeter.phase:value=-0.005:function=less,pan=1c|c0=c0,aresample=async=1:first_pts=0,[a]amix" -ac 1 -f flac -loglevel error "C:\Users\ELCOT\AppData\Local\Temp\tmp_koev8ch.flac"
Error while filtering: Invalid argument
Failed to inject frame into filter network: Invalid argument
Error while processing the decoded data for stream #0:1
Traceback (most recent call last):
  File "autosub\__main__.py", line 25, in <module>
  File "autosub\__init__.py", line 119, in main
  File "autosub\ffmpeg_utils.py", line 232, in audio_pre_prcs
  File "subprocess.py", line 395, in check_output
  File "subprocess.py", line 487, in run
subprocess.CalledProcessError: Command 'D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -i "d:\subs\test.mp4" -vn -af "asplit[a],aphasemeter=video=0,ametadata=select:key=lavfi.aphasemeter.phase:value=-0.005:function=less,pan=1c|c0=c0,aresample=async=1:first_pts=0,[a]amix" -ac 1 -f flac -loglevel error "C:\Users\ELCOT\AppData\Local\Temp\tmp_koev8ch.flac"' returned non-zero exit status 1.
[58448] Failed to execute script __main__

By the way, I don't know Japanese, so I don't think I can manually edit the times before sending them to the API. Can you also give me a sample command that gets a better result using auditok?

thanks RM

BingLingGroup commented 4 years ago

The command that failed in your case is simply the one recommended by the ffmpeg documentation, AudioChannelManipulation. Consider changing the pre-processing commands with the following option (in cmd, not PowerShell):

-apc "D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -i \"{in_}\" -vn -ac 1 -loglevel error \"{out_}\"" "D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -i \"{in_}\" -af \"lowpass=3000,highpass=200\" -loglevel error \"{out_}\"" "D:\subs\autosub\autosub\ffmpeg-normalize.exe -v \"{in_}\" -ar 44100 -ofmt flac -c:a flac -pr -p -o \"{out_}\""

I don't have your audio file, so I don't know which solution is better for your issue. Audio pre-processing is usually useful. For how to use or modify the auditok options, please read the help message carefully, especially the auditok documentation.
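The {in_} and {out_} placeholders in the -apc commands above mark where the input and output file names go. A minimal sketch of how such a chain could be expanded, assuming the placeholders are filled with Python's str.format and each step reads what the previous step wrote. fill_chain and the step file names are hypothetical, and this is not autosub's actual code:

```python
def fill_chain(templates, source, outputs):
    """Fill {in_}/{out_} in each template, chaining outputs to inputs."""
    if len(templates) != len(outputs):
        raise ValueError("need one output path per template")
    commands = []
    current = source
    for template, out_path in zip(templates, outputs):
        commands.append(template.format(in_=current, out_=out_path))
        current = out_path  # the next step reads what this step wrote
    return commands


# Shortened versions of the first two commands from the -apc option.
templates = [
    'ffmpeg -hide_banner -i "{in_}" -vn -ac 1 -loglevel error "{out_}"',
    'ffmpeg -hide_banner -i "{in_}" -af "lowpass=3000,highpass=200" -loglevel error "{out_}"',
]
for command in fill_chain(templates, "test.mp4", ["step1.flac", "step2.flac"]):
    print(command)
```

Printing the expanded commands this way makes it easy to paste each one into cmd and see which step fails. Note that str.format would misread any other curly braces in a template.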

rajeshmani28 commented 4 years ago

Hi,

I was under the assumption that audio pre-processing gets called automatically when we use the -ap option.

Anyway, I tried the command and did not see any difference.

Thank you for your help.

-RM

BingLingGroup commented 4 years ago

You can run these commands outside autosub to test whether they work.

Please also attach the logs every time you try new arguments so I can see what the error is.

Anyway, it does not seem to be caused by autosub. It seems that your audio can't be processed by ffmpeg with this command. You can always use other audio processing software, such as Adobe Audition, before giving the audio to autosub.

I suspect that the audio is already mono, so the command can't convert mono audio into mono. Please check your input audio.
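One way to check whether the input is mono or stereo is to ask ffprobe for the channel count of the first audio stream. A minimal sketch, assuming ffprobe is installed and reachable under the name passed in; the three function names are hypothetical:

```python
import subprocess


def build_probe_command(path, ffprobe="ffprobe"):
    """Build an ffprobe call that prints only the first audio stream's channel count."""
    return [
        ffprobe, "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=channels",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_channels(output):
    """Parse ffprobe's output; return None when no audio stream was reported."""
    text = output.strip()
    return int(text.splitlines()[0]) if text else None


def count_channels(path, ffprobe="ffprobe"):
    """Run ffprobe (must be installed) and return the channel count."""
    result = subprocess.run(
        build_probe_command(path, ffprobe),
        capture_output=True, text=True, check=True,
    )
    return parse_channels(result.stdout)
```

Calling count_channels(r"D:\subs\test.mp4") would return 1 for mono input and 2 for stereo input.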

rajeshmani28 commented 4 years ago

Thanks for your reply.

I am actually processing an HD-quality video, so I guess my audio is not mono. Can you tell me where I can get the logs, how to analyse my input audio, and how to send you my input audio details?

Ideally, the workflow I am attempting should look like this:

  1. Input the HD video as MP4 ---> output the audio alone as MP3
  2. Input the MP3 --> output the FLAC file as expected by autosub
  3. The rest of the process remains the same

Thanks RM

rajeshmani28 commented 4 years ago

Here are the properties of my MP4 file:

Metadata:
  major_brand     : isom
  minor_version   : 512
  compatible_brands: isomiso2avc1mp41
  encoder         : Lavf58.35.101
Duration: 01:58:06.07, start: 0.000000, bitrate: 6167 kb/s
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1920x1080 [SAR 1:1 DAR 16:9], 5964 kb/s, 29.97 fps, 29.97 tbr, 90k tbn, 59.94 tbc (default)
  Metadata:
    handler_name    : VideoHandler
  Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 192 kb/s (default)

I hope this helps you analyse what I am trying to do. Please tell me which tool I need to use to pre-process the audio so I can feed it into autosub.

Thanks RM

BingLingGroup commented 4 years ago

The log is the output of the cmd window or terminal you use.

Try this one and attach the log.

-apc "D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -i \"{in_}\" -vn -ac 1 -loglevel error \"{out_}\"" "D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -i \"{in_}\" -af \"lowpass=3000,highpass=200\" -loglevel error \"{out_}\"" "D:\subs\autosub\autosub\ffmpeg-normalize.exe -v \"{in_}\" -ar 44100 -ofmt flac -c:a flac -pr -p -o \"{out_}\""

If that does not work, try this one and attach the log.

-apc "D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -i \"{in_}\" -vn -ac 1 -loglevel error \"{out_}\"" "D:\subs\autosub\autosub\ffmpeg-normalize.exe -v \"{in_}\" -ar 44100 -ofmt flac -c:a flac -pr -p -o \"{out_}\""

rajeshmani28 commented 4 years ago

Hi Bing,

Thank you so much for your support and help here. The first command ran successfully; below is the CMD output log:

D:\subs\autosub\autosub>autosub -i "d:\test\test.mp4" -apc "D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -i \"{in_}\" -vn -ac 1 -loglevel error \"{out_}\"" "D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -i \"{in_}\" -af \"lowpass=3000,highpass=200\" -loglevel error \"{out_}\"" "D:\subs\autosub\autosub\ffmpeg-normalize.exe -v \"{in_}\" -ar 44100 -ofmt flac -c:a flac -pr -p -o \"{out_}\""
Speech language not provided. Only performing speech regions detection.

Convert source file to "C:\Users\ELCOT\AppData\Local\Temp\tmpwk9m2uc0.wav" to detect audio regions.
D:\subs\autosub\autosub\ffmpeg.exe -hide_banner -y -i "d:\test\test.mp4" -vn -ac 1 -ar 48000 -loglevel error "C:\Users\ELCOT\AppData\Local\Temp\tmpwk9m2uc0.wav"

Use ffprobe to check conversion result.
D:\subs\autosub\autosub\ffprobe.exe "C:\Users\ELCOT\AppData\Local\Temp\tmpwk9m2uc0.wav" -show_format -pretty -loglevel quiet
[FORMAT]
filename=C:\Users\ELCOT\AppData\Local\Temp\tmpwk9m2uc0.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=1:58:05.930667
size=648.736403 Mibyte
bit_rate=768 Kbit/s
probe_score=99
TAG:encoder=Lavf58.29.100
[/FORMAT]

Conversion completed.
Use Auditok to detect speech regions.
Auditok detection completed.
"C:\Users\ELCOT\AppData\Local\Temp\tmpwk9m2uc0.wav" has been deleted.
Override "-of"/"--output-files" due to your args too few.
Output regions subtitles file only.
Times file created at "d:\test\test.times.srt".

All works done.

I got an SRT file with only the times.

Thanks RM

BingLingGroup commented 4 years ago

Can you access Google's servers? If you need a proxy server to reach Google, please use the -hsp or -hp option.

Alternatively, use the -k option to keep all the audio files generated during processing, and listen to them to check whether they are loud and clear.

rajeshmani28 commented 4 years ago

Can I upload a 2:30 (two minutes thirty seconds) video sample here? I tried the sample, but most of the dialogue is missing from the transcription. Maybe you can try it. Please let me know.

Thanks RM

BingLingGroup commented 4 years ago

Ok. You can try it.

rajeshmani28 commented 4 years ago

smallfile.zip

Hi Bing,

I have uploaded the small MP3 file that I extracted from the video file.

I have an idea: is there any way to display the frequency of the audio (only for the vocals) along with each subtitle in the SRT file? With the frequency logged next to each subtitle, we would know for which frequencies subtitles are generated and for which they are not.

An example of what I am looking for:

6 00:00:27,540 --> 00:00:28,670 印紙税 frequency is

17 00:00:38,760 --> 00:00:39,330 Frequency is

In the second case above, we would know the frequency for which no subtitle is generated. We could then convert all such frequencies to a value for which subtitles are generated, and that way get almost all of the subtitles.

How do I modify the program to get such output? Please help me.
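Listing the cues that came back without text, like entry 17 in the example above, does not require changing autosub. A minimal sketch that scans a standard SRT file for cues with a timing line but no text; empty_cues is a hypothetical helper, and the sample reuses the two cues quoted above:

```python
import re

TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def empty_cues(srt_text):
    """Return (index, start, end) for every cue that has no text lines."""
    found = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) != 2 or not lines[0].isdigit():
            continue  # either the cue has text, or it is not a well-formed cue
        match = TIMING.fullmatch(lines[1])
        if match:
            found.append((int(lines[0]), match.group(1), match.group(2)))
    return found


sample = """6
00:00:27,540 --> 00:00:28,670
印紙税

17
00:00:38,760 --> 00:00:39,330
"""
print(empty_cues(sample))
```

The time ranges returned this way show which stretches of audio to listen to, for example in the files kept by the -k option.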

Thanks RM

Yuelioi commented 4 years ago

Command used: autosub -i "C:\Users\yl\Desktop\1.mp4" -S en-us
Python version: Python 3.7 (in the virtual environment at G:\back\pyfile)
Version: autosub-0.5.6-alpha
@BingLingGroup This is too hard for me. Is there a QQ group or something like that?

(ENV) G:\back\pyfile\ENV\Scripts>autosub -i "C:\Users\yl\Desktop\1.mp4" -S en-us
Translation destination language not provided. Only performing speech recognition.
Speech language is the same as the destination language. Only performing speech recognition.

Convert source file to "C:\Users\yl\AppData\Local\Temp\tmpxv6pfjp3.wav" to detect audio regions.
C:\ffmpeg\bin -hide_banner -y -i "C:\Users\yl\Desktop\1.mp4" -vn -ac 1 -ar 48000 "C:\Users\yl\AppData\Local\Temp\tmpxv6pfjp3.wav"
Traceback (most recent call last):
  File "G:\back\pyfile\ENV\Scripts\autosub-script.py", line 11, in <module>
    load_entry_point('autosub==0.5.6a0', 'console_scripts', 'autosub')()
  File "G:\back\pyfile\ENV\lib\site-packages\autosub-0.5.6a0-py3.7.egg\autosub\__init__.py", line 161, in main
    styles_list=styles_list)
  File "G:\back\pyfile\ENV\lib\site-packages\autosub-0.5.6a0-py3.7.egg\autosub\cmdline_utils.py", line 947, in audio_or_video_prcs
    stdin=open(os.devnull))
  File "C:\Python37\lib\subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "C:\Python37\lib\subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Python37\lib\subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "C:\Python37\lib\subprocess.py", line 1207, in _execute_child
    startupinfo)
PermissionError: [WinError 5] Access is denied.

With the same version outside the virtual environment:

C:\Users\yl>autosub -i "C:\Users\yl\Desktop\1.mp4" -S en-us
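Translation destination language not provided. Only performing speech recognition.
Speech language is the same as the destination language. Only performing speech recognition.

Convert source file to "C:\Users\yl\AppData\Local\Temp\tmpp0hlctdf.wav" to detect audio regions.
C:\ffmpeg\bin\ffmpeg.exe -hide_banner -y -i "C:\Users\yl\Desktop\1.mp4" -vn -ac 1 -ar 48000 -loglevel error "C:\Users\yl\AppData\Local\Temp\tmpp0hlctdf.wav"

Use ffprobe to check conversion result.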
C:\ffmpeg\bin\ffprobe.exe "C:\Users\yl\AppData\Local\Temp\tmpp0hlctdf.wav" -show_format -pretty -loglevel quiet
[FORMAT]
filename=C:\Users\yl\AppData\Local\Temp\tmpp0hlctdf.wav
nb_streams=1
nb_programs=0
format_name=wav
format_long_name=WAV / WAVE (Waveform Audio)
start_time=N/A
duration=0:03:27.516750
size=18.998800 Mibyte
bit_rate=768.003000 Kbit/s
probe_score=99
TAG:encoder=Lavf58.42.100
[/FORMAT]

Conversion completed.
Use Auditok to detect speech regions.
Traceback (most recent call last):
  File "c:\python37\lib\site-packages\auditok-0.2.0a0-py3.7.egg\auditok\util.py", line 1007, in __getattr__
    return getattr(self._audio_source, name)
  File "c:\python37\lib\site-packages\auditok-0.2.0a0-py3.7.egg\auditok\util.py", line 856, in __getattr__
    return getattr(self._audio_source, name)
  File "c:\python37\lib\site-packages\auditok-0.2.0a0-py3.7.egg\auditok\util.py", line 736, in __getattr__
    return getattr(self._audio_source, name)
AttributeError: 'BufferAudioSource' object has no attribute 'get_sample_width'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python37\Scripts\autosub-script.py", line 11, in <module>
    load_entry_point('autosub==0.5.6a0', 'console_scripts', 'autosub')()
  File "c:\python37\lib\site-packages\autosub\__init__.py", line 159, in main
    styles_list=styles_list)
  File "c:\python37\lib\site-packages\autosub\cmdline_utils.py", line 1116, in audio_or_video_prcs
    mode=mode)
  File "c:\python37\lib\site-packages\autosub\core.py", line 51, in auditok_gen_speech_regions
    sample_width=asource.get_sample_width(),
  File "c:\python37\lib\site-packages\auditok-0.2.0a0-py3.7.egg\auditok\util.py", line 1010, in __getattr__
    "'AudioReader' has no attribute '{}'".format(name)
AttributeError: 'AudioReader' has no attribute 'get_sample_width'

BingLingGroup commented 4 years ago

@rajeshmani28 I don't know what you mean by frequency. Honestly, the whole speech-to-text procedure is simple: I send audio fragments to the API and it sends the results back. That procedure has nothing to do with audio frequency. The API is a black box to me, and I don't know what kind of frequency it works on.

BingLingGroup commented 4 years ago

result.zip Here are the resulting subtitles and the preprocessed audio.

  1. Recognition does produce results, especially with the preprocessed audio.
  2. I think the main issue is your Internet access. Perhaps you can't reach Google's servers, or your connection is not stable.
  3. If you want to reduce the number of empty regions, especially the invalid ones, use the auditok options. For example, -mxcs 0.1 -mxrs 10 -et 55 can remove some of them. Or just use the -der option.
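To illustrate why options like these change how many regions are detected, here is a toy energy-threshold splitter. It is not auditok's algorithm. It assumes that -et acts as an energy threshold, -mxcs as the longest silence tolerated inside a region, and -mxrs as the longest region allowed, as the option names suggest. split_regions and the per-frame energy values are made up for illustration:

```python
def split_regions(energies, threshold, max_silence, max_size):
    """Group frame indices into (start, end) regions, end exclusive.

    energies: one energy value per frame.
    threshold: frames at or above this value count as speech.
    max_silence: quiet frames tolerated inside a region.
    max_size: longest region allowed, in frames.
    """
    regions = []
    start = None      # first frame of the open region
    last_loud = None  # last loud frame seen in the open region
    for i, energy in enumerate(energies):
        loud = energy >= threshold
        if start is None:
            if loud:
                start, last_loud = i, i
            continue
        if loud:
            last_loud = i
        too_quiet = i - last_loud > max_silence
        too_long = i - start + 1 >= max_size
        if too_quiet or too_long:
            # Close the region right after its last loud frame.
            regions.append((start, last_loud + 1))
            start = last_loud = None
    if start is not None:
        regions.append((start, last_loud + 1))
    return regions


frames = [0, 60, 60, 0, 60, 0, 0, 0, 60]  # made-up per-frame energies
print(split_regions(frames, threshold=55, max_silence=1, max_size=10))
print(split_regions(frames, threshold=55, max_silence=0, max_size=10))
```

In this toy model, a shorter tolerated silence splits the audio into more and shorter regions, and a higher threshold discards quiet frames entirely. That is the kind of trade-off these options control.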

BingLingGroup commented 4 years ago

Since this is not a problem with my code, I will close this issue.