Peter-obi / Video_summarization_mlx

Transcribe and summarize videos using Whisper and LLMs on the Apple MLX framework
MIT License

libc++abi: terminating due to uncaught exception of type std::runtime_error: [AddMM::eval_cpu] Currently only supports float32. Abort trap: 6 #1

Open joshuachen3333 opened 9 months ago

joshuachen3333 commented 9 months ago

On my MacBook Pro (Intel i5 CPU, 16 GB RAM), with the following configuration, I ran my test like this:

macOS 14.3
Xcode 15.2 (kMDItemVersion = "2.8.5")
Apple clang version 15.0.0 (clang-1500.1.0.2.5)
Python 3.11.7

conda create -n video_summarize_mlx python=3.11
conda activate video_summarize_mlx
git clone https://github.com/Peter-obi/Video_summarization_mlx
cd Video_summarization_mlx
# mlx
# mlx-lm

I left out mlx and mlx-lm because they cannot be installed with pip on this machine:

pip install mlx
ERROR: Could not find a version that satisfies the requirement mlx (from versions: none)
ERROR: No matching distribution found for mlx

Then I ran the code like this:

python -m spacy download en_core_web_sm
python main.py --input_path "/Users/joshua/video/test1.mp4" --title "test1"
Peter-obi commented 9 months ago

MLX is an array framework for machine learning on Apple silicon, i.e., M-series chips (read more here: https://github.com/ml-explore/mlx). It seems you have an Intel chip. First thoughts: you could swap out the Whisper part for a compatible Whisper implementation and then replace the MLX model portions with 'normal' Hugging Face models.
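As a quick guard, the platform mismatch can be detected before anything MLX-specific is imported. A minimal sketch (the helper name is my own, not part of this repo; MLX publishes wheels only for arm64 macOS, which is why pip finds no distribution on Intel):

```python
import platform

def mlx_supported() -> bool:
    """MLX ships wheels only for Apple silicon (arm64 macOS)."""
    return platform.system() == "Darwin" and platform.machine() == "arm64"

if not mlx_supported():
    print("Non-Apple-silicon machine: fall back to a CPU Whisper build "
          "and standard Hugging Face models instead of MLX.")
```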

az-boromir commented 9 months ago

Similar issue for me on an M2 Mac Air.

Transcribing files/audio/the_vision.wav (this may take a while)...
libc++abi: terminating due to uncaught exception of type std::runtime_error: [AddMM::eval_cpu] Currently only supports float32.
zsh: abort      python main.py --input_path "The Vision.mp4" --title "Test"
(video_summarize_mlx) Video_summarization_mlx % /video_summarize_mlx/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Peter-obi commented 9 months ago

I will try to recreate the issue when I get back on my system. Some things you can try: make sure you are not running out of memory (this happened to me once when I used a bigger model). There are also discussions on this here: https://github.com/conda/conda/issues/9589 and https://github.com/apple/ml-stable-diffusion/issues/8. You could also try summarize_with_mlx instead of summarize_in_parallel to see if that helps (if it is a memory issue).

az-boromir commented 9 months ago

It is decode_result = model.decode(segment, options) inside decode_with_fallback that is breaking. I will keep looking to see if I can find out what the issue is.

Printing out the segment shows:

array([[-0.39502, -0.39502, -0.39502, ..., -0.39502, -0.39502, -0.39502],
       [-0.39502, -0.39502, -0.39502, ..., -0.39502, -0.39502, -0.39502],
       [-0.39502, -0.39502, -0.39502, ..., -0.39502, -0.39502, -0.39502],
       ...,
       [0.575195, 0.70752, 0.590332, ..., -0.39502, -0.39502, -0.39502],
       [0.603516, 0.728516, 0.655762, ..., -0.39502, -0.39502, -0.39502],
       [0.599609, 0.740723, 0.674316, ..., -0.39502, -0.39502, -0.39502]], dtype=float16)

and the error is: libc++abi: terminating due to uncaught exception of type std::runtime_error: [AddMM::eval_cpu] Currently only supports float32.

Could it just be a data type mismatch?
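That reading seems plausible: the segment is float16 (note the dtype in the printout) while the CPU matmul kernel only implements float32. The general shape of the mismatch, sketched with NumPy standing in for MLX arrays (NumPy silently promotes dtypes where MLX's CPU AddMM raises instead):

```python
import numpy as np

# A stand-in for the mel segment Whisper produces (float16, as in the printout above).
segment = np.full((3, 4), -0.39502, dtype=np.float16)
weights = np.ones((4, 2), dtype=np.float32)  # kernel-side weights are float32

# Explicitly upcasting before the matmul is the kind of fix being suggested here;
# on MLX's CPU backend, skipping this cast triggers the
# "[AddMM::eval_cpu] Currently only supports float32" runtime error.
out = segment.astype(np.float32) @ weights
print(out.dtype)
```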

Peter-obi commented 9 months ago

Interesting. I have never faced this particular issue, because I just use the already-converted Whisper from the MLX examples and haven't run into issues yet. Can you add this and see what happens: mel_segment = mel_segment.astype(mx.float32)?

joshuachen3333 commented 9 months ago

Could this be an Apple silicon (M1/M2)-only issue? I ran it on an Intel i5 platform. If yes, could I run it in a Linux i7 + Nvidia environment? How difficult would it be to port to a Linux x86_64 environment?

az-boromir commented 9 months ago

> Interesting. I have never faced this particular issue, because I just use the already-converted Whisper from the MLX examples and haven't run into issues yet. Can you add this and see what happens: mel_segment = mel_segment.astype(mx.float32)?

This didn't work. I instead changed the transcribe signature to default fp16 to False (def transcribe(audio_file, fp16=False, output_path="files/transcripts"):), and that issue is fixed. However, I get a similar error when it runs summarize_with_mlx. I have the 4-bit Mixtral.

Audio has been transcribed in 22 seconds
Found 1 chunks. Summarizing using MLX model...
Generating summary with MLX model...
libc++abi: terminating due to uncaught exception of type std::runtime_error: [Matmul::eval_cpu] Currently only supports float32.
zsh: abort      python main.py --input_path --title "Test"
(video_summarize_mlx) miniconda/envs/video_summarize_mlx/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Looking at this, I may be missing something, though. The 4-bit model shouldn't need float32, I think.

Peter-obi commented 8 months ago

Great that you solved that! I've been busy for a couple of weeks. Have you solved the resource_tracker error? I reproduced the error, but only at the Whisper level, and it was always because I ran a big model or a very long input; it was always fixed by using a smaller model or breaking the input into smaller chunks.
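Breaking a long transcript into smaller chunks before summarization can be sketched like this (a hypothetical helper for illustration, not the repo's actual chunking code; the 500-word cap is an arbitrary choice):

```python
def chunk_text(text: str, max_words: int = 500) -> list[str]:
    """Split a transcript into word-bounded chunks to keep per-call memory low."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

transcript = "word " * 1200          # stand-in for a long transcript
chunks = chunk_text(transcript, max_words=500)
print(len(chunks))                   # 1200 words -> chunks of 500, 500, 200
```

Each chunk can then be summarized independently and the partial summaries joined, which keeps any single model call small.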