Fictionarry / ER-NeRF

[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
https://fictionarry.github.io/ER-NeRF/
MIT License

My inference FPS is so slow #44

mjbooo commented 9 months ago

Hello! I want to express my appreciation for your excellent work. I have a question regarding inference speed.

I recently conducted a test using a 14-second-long audio clip (equivalent to 351 frames) with the Obama video you provided. However, the inference process took approximately 2 minutes, which translates to around 3 frames per second (FPS).

I'm using an A100 GPU, and I've included a list of the installed packages below. However, someone mentioned that they were able to achieve an inference speed of 17 FPS using just an RTX 3090.

Furthermore, I followed your instructions to install the packages, but I encountered an issue with gridencoder. I addressed this separately by using the following command to enable support for the sm80 CUDA architecture:

TORCH_CUDA_ARCH_LIST=8.0 pip install ./gridencoder
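
For reference, a minimal smoke test for the rebuilt extension (the constructor and forward here follow torch-ngp's GridEncoder, so treat the exact arguments as assumptions); if the arch flag matched the GPU, this runs without a CUDA "no kernel image" error:

import torch
from gridencoder import GridEncoder  # the extension rebuilt above

# Smoke test: a wrong TORCH_CUDA_ARCH_LIST typically surfaces here as a
# "no kernel image is available" CUDA error at the forward pass.
enc = GridEncoder(input_dim=3, num_levels=16, level_dim=2).cuda()
xyz = torch.rand(4096, 3, device="cuda")  # inputs expected in [-1, 1]
print(enc(xyz).shape)  # torch.Size([4096, 32]) = num_levels * level_dim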

Do you have any suggestions or insights on how to improve the inference speed?

absl-py==1.4.0
asttokens==2.4.0
astunparse==1.6.3
backcall==0.2.0
Brotli @ file:///home/conda/feedstock_root/build_artifacts/brotli-split_1693583441880/work
cachetools==5.3.1
certifi==2023.7.22
cffi==1.15.1
charset-normalizer @ file:///home/conda/feedstock_root/build_artifacts/charset-normalizer_1688813409104/work
comm==0.1.4
ConfigArgParse==1.7
contourpy==1.1.1
cycler==0.11.0
dearpygui==1.10.0
debugpy==1.8.0
decorator==5.1.1
einops==0.6.1
exceptiongroup==1.1.3
executing==1.2.0
face-alignment==1.4.1
flatbuffers==23.5.26
fonttools==4.42.1
fvcore==0.1.5.post20221221
gast==0.5.4
google-auth==2.23.0
google-auth-oauthlib==0.4.6
google-pasta==0.2.0
gridencoder @ file:///workspace/ER-NeRF/gridencoder
grpcio==1.58.0
h5py==3.9.0
idna @ file:///home/conda/feedstock_root/build_artifacts/idna_1663625384323/work
imageio==2.31.3
imageio-ffmpeg==0.4.9
iopath==0.1.10
ipykernel==6.25.2
ipython==8.15.0
jedi==0.19.0
joblib==1.3.2
jupyter_client==8.3.1
jupyter_core==5.3.1
keras==2.8.0
Keras-Preprocessing==1.1.2
kiwisolver==1.4.5
lazy_loader==0.3
libclang==16.0.6
llvmlite==0.40.1
lpips==0.1.4
Markdown==3.4.4
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.8.0
matplotlib-inline==0.1.6
mdurl==0.1.2
nest-asyncio==1.5.7
networkx==3.1
ninja==1.11.1
numba==0.57.1
numpy==1.24.4
oauthlib==3.2.2
objprint==0.2.2
opencv-python==4.8.0.76
opt-einsum==3.3.0
packaging==23.1
pandas==2.1.0
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
Pillow @ file:///home/conda/feedstock_root/build_artifacts/pillow_1675487172403/work
platformdirs==3.10.0
portalocker==2.8.2
prompt-toolkit==3.0.39
protobuf==3.20.3
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.5.0
pyasn1-modules==0.3.0
PyAudio==0.2.13
pycparser==2.21
Pygments==2.16.1
PyMCubes==0.1.4
pyparsing==3.1.1
PySocks @ file:///home/conda/feedstock_root/build_artifacts/pysocks_1661604839144/work
python-dateutil==2.8.2
python-speech-features==0.6
pytorch3d @ git+https://github.com/facebookresearch/pytorch3d.git@6f2212da46f3ad1a596b3e1017be2d16eaaf95f9
pytz==2023.3.post1
PyWavelets==1.4.1
PyYAML==6.0.1
pyzmq==25.1.1
requests @ file:///home/conda/feedstock_root/build_artifacts/requests_1684774241324/work
requests-oauthlib==1.3.1
resampy==0.4.2
rich==13.5.3
rsa==4.9
scikit-image==0.21.0
scikit-learn==1.3.0
scipy==1.11.2
six==1.16.0
soundfile==0.12.1
stack-data==0.6.2
tabulate==0.9.0
tensorboard==2.8.0
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tensorboardX==2.6.2.2
tensorflow-gpu==2.8.0
tensorflow-io-gcs-filesystem==0.34.0
termcolor==2.3.0
tf-estimator-nightly==2.8.0.dev2021122109
threadpoolctl==3.2.0
tifffile==2023.9.18
torch==1.12.1
torch-ema==0.3
torchaudio==0.12.1
torchvision==0.13.1
tornado==6.3.3
tqdm==4.66.1
traitlets==5.9.0
trimesh==3.23.5
typing_extensions @ file:///home/conda/feedstock_root/build_artifacts/typing_extensions_1695040754690/work
tzdata==2023.3
urllib3==1.26.16
viztracer==0.15.6
wcwidth==0.2.6
Werkzeug==2.3.7
wrapt==1.15.0
yacs==0.1.8
Fictionarry commented 9 months ago

Sorry, I have no idea about this; the problem is strange. So far I have tested the code on sm75 and sm86 machines, and it works well on both. Also, as far as I know, no similar problem has been reported in other repos that use gridencoder. Maybe you can run a test to find which step is limiting the speed.
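
If it helps, here is a rough way to time an individual stage with CUDA events (the torch.sin call is only a placeholder; substitute the stage you suspect, e.g. the grid encoder's forward pass):

import torch

def time_cuda_ms(fn, *args, warmup=3, iters=20):
    # Average GPU time of fn(*args) in milliseconds, measured with CUDA events
    for _ in range(warmup):
        fn(*args)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()  # events are recorded asynchronously; sync before reading
    return start.elapsed_time(end) / iters

# Placeholder workload; replace torch.sin with a real stage of the pipeline
x = torch.rand(4096, 3, device="cuda")
print(f"{time_cuda_ms(torch.sin, x):.3f} ms")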

mjbooo commented 9 months ago

[screenshot: tqdm progress bar from inference]

Could you take a look at the tqdm bar above? Are you saying your machine processes approximately 351 frames in about 10-11 seconds, which is equivalent to a frame rate of about 35 FPS?

Fictionarry commented 9 months ago

> Could you take a look at the tqdm bar above? Are you saying your machine processes approximately 351 frames in about 10-11 seconds, which is equivalent to a frame rate of about 35 FPS?

Normally, head-only inference can reach 34 FPS (the value we reported in the paper, on a 3080 Ti). With the torso, it drops a little, but not much.


Fictionarry commented 9 months ago

Replacing gridencoder with tiny-cuda-nn's encoding should help if the encoder is the problem. Otherwise, I guess something must be wrong outside the code.
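
For reference, a minimal sketch of a hash-grid encoding with tiny-cuda-nn's PyTorch bindings; the hyperparameters below are illustrative assumptions, not ER-NeRF's exact configuration:

import torch
import tinycudann as tcnn  # install from tiny-cuda-nn's bindings/torch directory

# Multiresolution hash-grid encoding, the tiny-cuda-nn counterpart of gridencoder.
# These hyperparameters are illustrative assumptions, not ER-NeRF's configuration.
encoding = tcnn.Encoding(
    n_input_dims=3,
    encoding_config={
        "otype": "HashGrid",
        "n_levels": 16,
        "n_features_per_level": 2,
        "log2_hashmap_size": 19,
        "base_resolution": 16,
        "per_level_scale": 2.0,
    },
)

xyz = torch.rand(4096, 3, device="cuda")  # tiny-cuda-nn expects inputs in [0, 1]
features = encoding(xyz)                  # shape: (4096, 16 * 2)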

mjbooo commented 9 months ago

Thank you for your kind help! I'll give it a try.

mjbooo commented 9 months ago

@Fictionarry

The inference speed increased once I allocated more CPU cores and memory!

Also, for those who are struggling with gridencoder, the following bash export before building may be helpful:

export TORCH_CUDA_ARCH_LIST="[YOUR_COMPUTE_CAPABILITY]"
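
If you are unsure of your GPU's compute capability, PyTorch can report it directly (e.g. 8.0 on an A100, 8.6 on an RTX 3090):

import torch

# Prints the value to use for TORCH_CUDA_ARCH_LIST
major, minor = torch.cuda.get_device_capability(0)
print(f"{major}.{minor}")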

Thank you again for your kind help!!!

anliyuan commented 3 days ago

@mjbooo Hi! Did you finally fix this?