google-research / perch

Apache License 2.0

high system memory usage when extracting embeddings #687

Open joshctaylor opened 2 weeks ago

joshctaylor commented 2 weeks ago

Hi,

I'm wondering if there is a simple way of restricting RAM (not vRAM) use when calculating embeddings - sometimes I need to use a laptop.

I'm figuring that the multi-threading used to load the GPU is taking up a lot of RAM?

Thanks all

joshctaylor commented 2 weeks ago

I've dug into this a little now. It seems that around 150 GB of RAM is needed when the process starts. Five memory allocation warnings are shown before the process settles down to using 15 GB of system RAM.

As there are 5 workers for multi_load_audio_window in chirp/audio_utils.py, this could point to where the issue is.
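To illustrate the idea of limiting how many files are decoded at once, here is a minimal, generic sketch of a bounded prefetch loop. It is not the chirp implementation: `bounded_map`, `load_fn`, `max_workers` and `max_in_flight` are hypothetical names chosen for this example, and whether the loader in chirp/audio_utils.py can be configured this way is not confirmed here.

```python
import collections
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def bounded_map(
    load_fn: Callable[[T], R],
    items: Iterable[T],
    max_workers: int = 1,
    max_in_flight: int = 2,
) -> Iterator[R]:
    """Yield load_fn(item) in input order with a bounded number of pending loads.

    At most `max_in_flight` loads are queued or finished-but-unconsumed at any
    time, in addition to the one result the consumer is currently holding.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    pending = collections.deque()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for item in items:
            # When the window is full, hand the oldest result to the consumer
            # before submitting any more work.
            if len(pending) >= max_in_flight:
                yield pending.popleft().result()
            pending.append(pool.submit(load_fn, item))
        # Drain whatever is still pending once the input is exhausted.
        while pending:
            yield pending.popleft().result()
```

With `max_workers=1` and `max_in_flight=1`, roughly two decoded files exist at once (one being embedded, one being loaded), which trades throughput for a lower peak in system RAM.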

I'm using an NVIDIA A100 80GB on a 24-core Intel Xeon VM with 220 GB of RAM. If I try to use a lesser machine, the process is killed by the Linux kernel when it exhausts RAM and swap.

I'm working with 1-hour FLAC files.

Found 0 existing embedding ids. 
Processing 1574 new source infos. 
  0%|          | 0/1574 [00:00<?, ?it/s]2024-09-01 20:47:30.019656: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
2024-09-01 20:47:30.484099: W tensorflow/compiler/tf2xla/kernels/assert_op.cc:38] Ignoring Assert operator jax2tf_infer_fn_/assert_equal_1/Assert/AssertGuard/Assert
2024-09-01 20:47:33.430616: E external/local_xla/xla/service/slow_operation_alarm.cc:65] Trying algorithm eng0{} for conv (f32[708,640,501,1]{3,2,1,0}, u8[0]{0}) custom-call(f32[708,1,160640,1]{3,2,1,0}, f32[640,1,640,1]{3,2,1,0}), window={size=640x1 stride=320x1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0} is taking a while...

...... similar messages ....

Trying algorithm eng46{k2=5,k5=3,k14=4} for conv (f32[708,144,125,40]{3,2,1,0}, u8[0]{0}) custom-call(f32[708,144,125,40]{3,2,1,0}, f32[144,1,3,3]{3,2,1,0}), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, feature_group_count=144, custom_call_target="__cudnn$convForward", backend_config={"conv_result_scale":1,"activation_mode":"kNone","side_input_scale":0,"leakyrelu_alpha":0} is taking a while...
W0000 00:00:1725223732.544358    5221 graph_launch.cc:671] Fallback to op-by-op mode because memset node breaks graph update
  0%|          | 1/1574 [01:36<42:03:23, 96.25s/it]2024-09-01 20:48:56.767735: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
  0%|          | 2/1574 [01:40<18:27:41, 42.28s/it]2024-09-01 20:48:58.825938: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
  0%|          | 3/1574 [01:43<10:31:21, 24.11s/it]2024-09-01 20:49:00.984838: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
  0%|          | 4/1574 [01:44<6:38:29, 15.23s/it] 2024-09-01 20:49:02.478230: W external/local_tsl/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 453120000 exceeds 10% of free system memory.
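As a rough sanity check on the log above, the repeated 453120000-byte allocation matches one float32 array with 708 rows, the same leading dimension that appears in the conv shapes. The sketch below works through that arithmetic. The 32 kHz sample rate is an assumption, not something stated in the log; the byte count and the 708 are taken from the log lines.

```python
ALLOCATION_BYTES = 453_120_000  # from the cpu_allocator_impl.cc warnings
NUM_WINDOWS = 708               # leading dimension of the conv operands in the log
BYTES_PER_FLOAT32 = 4

# Samples per window if the allocation is a float32 [NUM_WINDOWS, samples] array.
samples_per_window = ALLOCATION_BYTES // (NUM_WINDOWS * BYTES_PER_FLOAT32)
assert NUM_WINDOWS * samples_per_window * BYTES_PER_FLOAT32 == ALLOCATION_BYTES
print(samples_per_window)  # 160000

# Assumption: audio is processed at 32 kHz.
SAMPLE_RATE_HZ = 32_000
window_seconds = samples_per_window / SAMPLE_RATE_HZ
audio_minutes = NUM_WINDOWS * window_seconds / 60
print(window_seconds)  # 5.0
print(audio_minutes)   # 59.0
```

Under that assumption, each warning corresponds to one file of about 59 minutes held in memory as a batch of 5-second float32 windows (about 0.45 GB), and five such buffers would total roughly 2.3 GB. That is far below the 150 GB peak, so these allocations alone do not appear to explain it.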