fictions-ai / autocaption

Add caption to any video

7 min video is taking 4 hours to complete on 13900K CPU and RTX 3090 #4

Open FurkanGozukara opened 9 months ago

FurkanGozukara commented 9 months ago

I think something is very wrong

video attributes

General
Complete name                  : C:\temp\a.mp4
Format                         : MPEG-4
Format profile                 : Base Media
Codec ID                       : isom (isom/iso4)
File size                      : 20.4 MiB
Duration                       : 7 min 51 s
Overall bit rate mode          : Variable
Overall bit rate               : 362 kb/s
Encoded date                   : UTC 2024-01-14 01:25:10
Tagged date                    : UTC 2024-01-14 01:25:10

Video
ID                             : 1
Format                         : AVC
Format/Info                    : Advanced Video Codec
Format profile                 : High@L3.1
Format settings                : CABAC / 5 Ref Frames
Format settings, CABAC         : Yes
Format settings, Reference fra : 5 frames
Codec ID                       : avc1
Codec ID/Info                  : Advanced Video Coding
Duration                       : 7 min 51 s
Bit rate                       : 227 kb/s
Maximum bit rate               : 470 kb/s
Width                          : 576 pixels
Height                         : 1 024 pixels
Display aspect ratio           : 0.563
Frame rate mode                : Constant
Frame rate                     : 30.000 FPS
Color space                    : YUV
Chroma subsampling             : 4:2:0
Bit depth                      : 8 bits
Scan type                      : Progressive
Bits/(Pixel*Frame)             : 0.013
Stream size                    : 12.7 MiB (63%)
Title                          : Twitter-vork muxer
Writing library                : x264 core 164 r3095 baee400
Encoding settings              : cabac=1 / ref=5 / deblock=1:0:0 / analyse=0x3:0x113 / me=hex / subme=2 / psy=0 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=1 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=0 / threads=4 / lookahead_threads=1 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / stitchable=1 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=infinite / keyint_min=30 / scenecut=40 / intra_refresh=0 / rc_lookahead=40 / rc=crf / mbtree=1 / crf=28.0 / qcomp=0.60 / qpmin=10 / qpmax=69 / qpstep=4 / vbv_maxrate=2048 / vbv_bufsize=2048 / crf_max=0.0 / nal_hrd=none / filler=0 / ip_ratio=1.40 / aq=2:1.00
Tagged date                    : UTC 2024-01-14 01:25:10
Codec configuration box        : avcC

Audio
ID                             : 2
Format                         : AAC LC
Format/Info                    : Advanced Audio Codec Low Complexity
Codec ID                       : mp4a-40-2
Duration                       : 7 min 51 s
Bit rate mode                  : Variable
Bit rate                       : 128 kb/s
Maximum bit rate               : 137 kb/s
Channel(s)                     : 2 channels
Channel layout                 : L R
Sampling rate                  : 44.1 kHz
Frame rate                     : 43.066 FPS (1024 SPF)
Compression mode               : Lossy
Stream size                    : 7.19 MiB (35%)
Title                          : Twitter-vork muxer
Default                        : Yes
Alternate group                : 1
Tagged date                    : UTC 2024-01-14 01:25:10

And here are the venv attributes

Microsoft Windows [Version 10.0.19045.3930]
(c) Microsoft Corporation. All rights reserved.

C:\temp\caption\autocaption\venv\Scripts>activate

(venv) C:\temp\caption\autocaption\venv\Scripts>pip freeze
altair==5.2.0
annotated-types==0.6.0
attrs==23.2.0
av==10.0.0
beautifulsoup4==4.12.2
blinker==1.7.0
blis==0.7.11
cachetools==5.3.2
catalogue==2.0.10
certifi==2023.11.17
chardet==3.0.4
charset-normalizer==3.3.2
click==8.1.7
cloudpathlib==0.16.0
colorama==0.4.6
coloredlogs==15.0.1
confection==0.1.4
contourpy==1.2.0
ctranslate2==3.24.0
cycler==0.12.1
cymem==2.0.8
decorator==4.4.2
Faker==22.2.0
faster-whisper==0.7.0
favicon==0.7.0
ffmpeg==1.4
ffmpeg-python==0.2.0
filelock==3.13.1
flatbuffers==23.5.26
fonttools==4.47.2
fsspec==2023.12.2
future==0.18.3
gitdb==4.0.11
GitPython==3.1.41
googletrans==3.1.0a0
h11==0.9.0
h2==3.2.0
hpack==3.0.0
hstspreload==2024.1.5
htbuilder==0.6.1
httpcore==0.9.1
httpx==0.13.3
huggingface-hub==0.20.2
humanfriendly==10.0
hyperframe==5.2.0
idna==2.10
imageio==2.33.1
imageio-ffmpeg==0.4.9
importlib-metadata==6.11.0
Jinja2==3.1.3
joblib==1.3.2
jsonschema==4.20.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
langcodes==3.3.0
llvmlite==0.41.1
lxml==5.1.0
Markdown==3.5.2
markdown-it-py==3.0.0
markdownlit==0.0.7
MarkupSafe==2.1.3
matplotlib==3.8.2
mdurl==0.1.2
more-itertools==10.2.0
moviepy==2.0.0.dev2
mpmath==1.3.0
murmurhash==1.0.10
networkx==3.2.1
nltk==3.8.1
numba==0.58.1
numpy==1.26.3
onnxruntime==1.16.3
openai-whisper==20230314
packaging==23.2
pandas==2.0.3
Pillow==9.5.0
preshed==3.0.9
proglog==0.1.10
protobuf==4.25.2
pyarrow==14.0.2
pydantic==2.5.3
pydantic_core==2.14.6
pydeck==0.8.0
pydub==0.25.1
Pygments==2.17.2
pymdown-extensions==10.7
Pympler==1.0.1
pyparsing==3.1.1
pyreadline3==3.4.1
python-dateutil==2.8.2
pytz==2023.3.post1
pytz-deprecation-shim==0.1.0.post0
PyYAML==6.0.1
referencing==0.32.1
regex==2023.12.25
requests==2.31.0
rfc3986==1.5.0
rich==13.7.0
rpds-py==0.17.1
six==1.16.0
smart-open==6.4.0
smmap==5.0.1
sniffio==1.3.0
soupsieve==2.5
spacy==3.7.2
spacy-legacy==3.0.12
spacy-loggers==1.0.5
spacytextblob==4.0.0
SpeechRecognition==3.10.0
srsly==2.4.8
st-annotated-text==4.0.1
streamlit==1.25.0
streamlit-camera-input-live==0.2.0
streamlit-card==1.0.0
streamlit-embedcode==0.1.2
streamlit-extras==0.3.0
streamlit-faker==0.0.3
streamlit-image-coordinates==0.1.6
streamlit-keyup==0.2.2
streamlit-toggle-switch==1.0.2
streamlit-vertical-slider==2.5.5
sympy==1.12
tenacity==8.2.3
textblob==0.15.3
thinc==8.2.2
tiktoken==0.3.1
tokenizers==0.13.3
toml==0.10.2
toolz==0.12.0
torch==2.1.2
tornado==6.4
tqdm==4.66.1
typer==0.9.0
typing_extensions==4.9.0
tzdata==2023.4
tzlocal==4.3.1
urllib3==2.1.0
validators==0.22.0
wasabi==1.1.2
watchdog==3.0.0
weasel==0.3.4
zipp==3.17.0

(venv) C:\temp\caption\autocaption\venv\Scripts>

thibaudart commented 9 months ago

it seems you're not using GPU but CPU.
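
One way to check this from inside the venv is to ask each backend directly. The snippet below is only a sketch: `gpu_report` is a made-up helper name, and it assumes the pipeline goes through PyTorch (used by openai-whisper) or CTranslate2 (the engine behind faster-whisper), both of which appear in the package list. Which of the two autocaption actually calls is not confirmed here.

```python
import importlib.util


def gpu_report():
    """Return a dict describing what each installed backend says about CUDA."""
    report = {}

    # PyTorch: the backend used by openai-whisper.
    if importlib.util.find_spec("torch") is None:
        report["torch"] = "not installed"
    else:
        import torch
        report["torch"] = {
            "version": torch.__version__,
            "cuda_available": torch.cuda.is_available(),
        }

    # CTranslate2: the inference engine behind faster-whisper.
    if importlib.util.find_spec("ctranslate2") is None:
        report["ctranslate2"] = "not installed"
    else:
        import ctranslate2
        report["ctranslate2"] = {
            "version": ctranslate2.__version__,
            "cuda_devices": ctranslate2.get_cuda_device_count(),
        }
    return report


if __name__ == "__main__":
    for name, info in gpu_report().items():
        print(name, info)
```

If `cuda_available` is `False` or `cuda_devices` is `0`, that backend will run on the CPU regardless of which GPU is in the machine.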

FurkanGozukara commented 9 months ago

> it seems you're not using GPU but CPU.

I manually installed the CUDA version of PyTorch too. It didn't make any difference.
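
For what it's worth, the `pip freeze` listing earlier in the thread shows `torch==2.1.2` with no local version tag. Wheels installed from the PyTorch CUDA index normally carry a tag such as `+cu121`, so if that listing was taken after the manual install, the CUDA build may not have ended up in this venv. A small sketch for checking a freeze listing follows; `torch_build_tag` is a hypothetical helper written for illustration, not part of any library.

```python
def torch_build_tag(freeze_lines):
    """Inspect `pip freeze` lines for the torch entry.

    Returns None if torch is not listed, "" if it is listed without a
    local version tag, or the tag itself (e.g. "cu121", "cpu").
    """
    for line in freeze_lines:
        name, sep, version = line.strip().partition("==")
        if sep and name.lower() == "torch":
            # A local version tag is whatever follows the "+" sign.
            _, plus, local = version.partition("+")
            return local if plus else ""
    return None


if __name__ == "__main__":
    # The entry as it appears in the listing from this thread.
    print(repr(torch_build_tag(["numpy==1.26.3", "torch==2.1.2"])))
```

An empty tag is not proof of a CPU-only build on every platform, so treat it as a hint and confirm with `torch.cuda.is_available()` in the same venv.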