devmaxxing / videocr-PaddleOCR

Extract hardcoded subtitles from videos using machine learning
MIT License

Unable to Import videocr #14

Closed: obaidabit closed this issue 11 months ago

obaidabit commented 11 months ago

I am trying to run the Colab example you provided, and it is not working.

from videocr import save_subtitles_to_file

#@title OCR parameters
input_file_path = "example.mp4" #@param {type:"string"}
output_file_path = "example.srt" #@param {type:"string"}
language_code = "ch" #@param {type:"string"}
use_gpu = False #@param {type:"boolean"}
start_time = "00:00" #@param {type:"string"}
end_time = "" #@param {type:"string"}
confidence_threshold = 50 #@param {type:"number"}
similarity_threshold = 80 #@param {type:"number"}
frames_to_skip = 0 #@param {type:"integer"}
crop_x = 40 #@param {type:"integer"}
crop_y = 650 #@param {type:"integer"}
crop_width = 890 #@param {type:"integer"}
crop_height = 69 #@param {type:"integer"}

save_subtitles_to_file(input_file_path, output_file_path, lang=language_code, 
                       time_start=start_time, time_end=end_time, 
                       conf_threshold=confidence_threshold, sim_threshold=similarity_threshold,
                       use_gpu=use_gpu,
                       # Models different from the default mobile models can be downloaded here: https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.3/doc/doc_en/models_list_en.md
                       # det_model_dir='<PADDLEOCR DETECTION MODEL DIR>', rec_model_dir='<PADDLEOCR RECOGNITION MODEL DIR>',
                       # brightness_threshold=210, similar_image_threshold=1000, # filters might help
                       # use_fullframe=True, # note: videocr just assumes horizontal lines of text. vertical text scenario hasn't been implemented yet
                       frames_to_skip=frames_to_skip, # can skip inference for some frames to speed up the process
                       crop_x=crop_x, crop_y=crop_y, crop_width=crop_width, crop_height=crop_height)
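The crop parameters appear to describe a rectangle by its top-left corner (crop_x, crop_y) plus a width and height in pixels. A quick check that the box fits inside the frame can catch a mistyped value before a long OCR run. The crop_box helper below is hypothetical (not part of videocr), and the 1280x720 frame size is an assumption for illustration only.

```python
from typing import Tuple


def crop_box(crop_x: int, crop_y: int, crop_width: int, crop_height: int,
             frame_width: int, frame_height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) for a crop given as corner + size.

    Hypothetical helper: assumes crop_x/crop_y are the top-left corner in
    pixels and that right/bottom are exclusive, as in NumPy-style slicing
    frame[top:bottom, left:right].
    """
    if min(crop_x, crop_y) < 0 or min(crop_width, crop_height) <= 0:
        raise ValueError("crop origin must be >= 0 and size must be > 0")
    right = crop_x + crop_width
    bottom = crop_y + crop_height
    if right > frame_width or bottom > frame_height:
        raise ValueError(
            f"crop box ({crop_x}, {crop_y}, {right}, {bottom}) exceeds "
            f"frame size {frame_width}x{frame_height}"
        )
    return crop_x, crop_y, right, bottom


# Values from the Colab cell, checked against an assumed 720p frame.
print(crop_box(40, 650, 890, 69, frame_width=1280, frame_height=720))
# prints (40, 650, 930, 719)
```

With these values the box ends at row 719, the last row of a 720-pixel-high frame, so a crop_height of 70 or more would already fall outside it.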

Running the cell fails at the first statement (from videocr import save_subtitles_to_file) with this error:

Error: Can not import avx core while this file exists: /usr/local/lib/python3.10/dist-packages/paddle/fluid/core_avx.so
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-5-56913c3f6c54> in <cell line: 1>()
----> 1 from videocr import save_subtitles_to_file
      2 
      3 #@title OCR parameters
      4 input_file_path = "example.mp4" #@param {type:"string"}
      5 output_file_path = "example.srt" #@param {type:"string"}

... (11 frames omitted) ...
/usr/local/lib/python3.10/dist-packages/paddle/fluid/core.py in <module>
    254 if avx_supported():
    255     try:
--> 256         from . import core_avx
    257         core_avx.LoDTensor = core_avx.Tensor
    258 

ImportError: /usr/local/lib/python3.10/dist-packages/paddle/fluid/core_avx.so: undefined symbol: _dl_sym, version GLIBC_PRIVATE
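The traceback shows paddle's core.py importing the AVX build (core_avx) only after an avx_supported() check passes, so the failure happens while loading that compiled module, not inside videocr itself. The function below is a rough, Linux-only stand-in for that check, assuming the CPU flags are listed in /proc/cpuinfo; it is not paddle's actual implementation.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the CPU feature flags from the text of /proc/cpuinfo."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists features on lines like "flags : fpu vme ... avx avx2"
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def avx_supported_linux(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """Rough stand-in for paddle's avx_supported(): True if 'avx' is a CPU flag."""
    path = Path(cpuinfo_path)
    if not path.exists():  # not Linux, or /proc is unavailable
        return False
    return "avx" in cpu_flags(path.read_text())


print("AVX reported by CPU:", avx_supported_linux())
```

If this prints True on Colab, the CPU side of the gate passes, and the failure is the dynamic loader not resolving the _dl_sym symbol (version GLIBC_PRIVATE) that core_avx.so expects from the runtime glibc.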

I think the problem is the Python version, since Colab recently switched to Python 3.10.
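One way to test the Python-version theory is to print the interpreter and C library versions from the failing runtime and compare them with a runtime where the import works. The glibc version is worth recording too, because the error names a missing GLIBC_PRIVATE symbol. This is a generic diagnostic sketch using only the standard library; runtime_report and at_least_python are illustrative names, not part of videocr or paddle.

```python
import platform
import sys


def runtime_report() -> dict:
    """Collect interpreter and C-library details relevant to binary wheels."""
    libc_name, libc_version = platform.libc_ver()  # ('', '') if undetectable
    return {
        "python": ".".join(map(str, sys.version_info[:3])),
        "implementation": platform.python_implementation(),
        "libc": f"{libc_name} {libc_version}".strip() or "unknown",
        "machine": platform.machine(),
    }


def at_least_python(major: int, minor: int) -> bool:
    """True if the running interpreter is major.minor or newer."""
    return sys.version_info[:2] >= (major, minor)


for key, value in runtime_report().items():
    print(f"{key:>15}: {value}")
print("Python >= 3.10:", at_least_python(3, 10))
```

Running it in both environments shows whether the Python version, the glibc version, or both changed between them.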

devmaxxing commented 11 months ago

Not sure, but the error above might be caused by the CUDA version not being compatible with the old paddlepaddle library. There was also a similar error caused by libssl missing from the Colab environment. I've updated the Colab notebook to use the latest paddleocr/paddlepaddle versions and to install libssl.
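After re-running the updated notebook's install cells, a quick check can confirm which paddle packages are actually installed and whether the dynamic loader can find an OpenSSL (libssl) library. This is a hypothetical verification sketch, not part of the notebook; it assumes the PyPI distribution names paddlepaddle, paddlepaddle-gpu (the CUDA build) and paddleocr.

```python
import ctypes.util
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(dists: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in dists:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def libssl_found() -> bool:
    """True if the dynamic loader can locate a libssl shared library."""
    return ctypes.util.find_library("ssl") is not None


# Usually only one of paddlepaddle / paddlepaddle-gpu should be installed.
for dist, version in installed_versions(
        ["paddlepaddle", "paddlepaddle-gpu", "paddleocr"]).items():
    print(f"{dist}: {version or 'not installed'}")
print("libssl found:", libssl_found())
```

Note that find_library only reports that some libssl is visible; it does not say which OpenSSL major version a given paddle wheel was built against.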

obaidabit commented 11 months ago

Thank you