chn-lee-yumi / MaterialSearch

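AI semantic search for local media: search by image, find local assets, match frames from a text description, search video frames, and find videos from a scene description. Semantic search. Search local photos and videos through natural language.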
GNU General Public License v3.0
863 stars 117 forks

[bug] An exception while processing a single file aborts the entire scan task #73

Closed Jackxwb closed 5 months ago

Jackxwb commented 5 months ago

Partial log:

2024-05-30 23:40:22,853 database INFO 新增文件:\\omv\fileRun\杂图\bili\1583321693522.jpg
2024-05-30 23:40:22,983 database INFO 新增文件:\\omv\fileRun\杂图\bili\1581075817475.jpg
2024-05-30 23:40:23,228 database INFO 新增文件:\\omv\fileRun\杂图\bili\1588654097826.jpg
2024-05-30 23:40:23,323 database INFO 新增文件:\\omv\fileRun\杂图\bili\screenshot\批量图片缩放_20201204121707\565785@1577270598@2.jpg
2024-05-30 23:40:23,544 database INFO 新增文件:\\omv\fileRun\big\月うさぎ\91362162_p0.png
Exception in thread Thread-6 (scan):
Traceback (most recent call last):
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "V:\ai\MaterialSearchWindows\scan.py", line 215, in scan
    self.handle_image_batch(session, image_batch_dict)
  File "V:\ai\MaterialSearchWindows\scan.py", line 167, in handle_image_batch
    path_list, features_list = process_images(list(image_batch_dict.keys()))
  File "V:\ai\MaterialSearchWindows\process_assets.py", line 77, in process_images
    inputs = processor(images=images, return_tensors="pt")["pixel_values"].to(torch.device(DEVICE))
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\models\chinese_clip\processing_chinese_clip.py", line 105, in __call__
    image_features = self.image_processor(images, return_tensors=return_tensors, **kwargs)
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\image_processing_utils.py", line 551, in __call__
    return self.preprocess(images, **kwargs)
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\models\chinese_clip\image_processing_chinese_clip.py", line 304, in preprocess
    images = [
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\models\chinese_clip\image_processing_chinese_clip.py", line 305, in <listcomp>
    self.resize(image=image, size=size, resample=resample, input_data_format=input_data_format)
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\models\chinese_clip\image_processing_chinese_clip.py", line 173, in resize
    return resize(
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\image_transforms.py", line 327, in resize
    image = to_pil_image(image, do_rescale=do_rescale, input_data_format=input_data_format)
  File "D:\ProgramData\anaconda3\envs\llama2codeinterpreter\lib\site-packages\transformers\image_transforms.py", line 204, in to_pil_image
    image = image.astype(np.uint8)
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 4.17 MiB for an array with shape (1350, 1080, 3) and data type uint8
2024-05-30 23:40:25,128 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:40:25] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:40:30,136 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:40:30] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:40:35,148 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:40:35] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:40:40,151 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:40:40] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:40:45,158 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:40:45] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:40:50,160 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:40:50] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:40:55,167 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:40:55] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:00,178 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:00] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:05,186 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:05] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:10,200 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:10] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:15,196 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:15] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:20,214 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:20] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:25,226 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:25] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:30,229 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:30] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:35,232 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:35] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:40,240 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:40] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:45,259 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:45] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:50,258 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:50] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:41:55,271 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:41:55] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:00,282 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:00] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:05,288 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:05] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:10,289 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:10] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:15,292 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:15] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:20,298 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:20] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:25,299 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:25] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:30,304 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:30] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:35,303 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:35] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:40,314 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:40] "GET /api/status HTTP/1.1" 200 -
2024-05-30 23:42:45,319 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:42:45] "GET /api/status HTTP/1.1" 200 -

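Also, can this status output line be hidden or shown some other way (for example, written to a separate log file)?

The web page stops responding, and there is no restart or retry mechanism. Unless you keep watching the console, you have no idea that an exception occurred.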
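One way to make such a failure visible is to wrap the scan thread's target so that an uncaught exception is logged and recorded where a status endpoint could report it. The following is only a sketch under assumptions: `ScanStatus`, `run_guarded`, and `start_scan` are hypothetical names and are not taken from the MaterialSearch code.

```python
import logging
import threading

logger = logging.getLogger("scan")


class ScanStatus:
    """Thread-safe record of whether the background scan is running or has failed."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = False
        self._last_error = None

    def update(self, running, last_error):
        with self._lock:
            self._running = running
            self._last_error = last_error

    def snapshot(self):
        """Return a plain dict that a status endpoint could serialize."""
        with self._lock:
            return {"running": self._running, "last_error": self._last_error}


def run_guarded(target, status):
    """Run target() and record any exception instead of letting the thread die silently."""
    status.update(True, None)
    try:
        target()
    except Exception as exc:  # MemoryError is a subclass of Exception
        logger.exception("scan aborted")
        status.update(False, f"{type(exc).__name__}: {exc}")
    else:
        status.update(False, None)


def start_scan(target, status):
    """Start the (hypothetical) scan callable in a guarded background thread."""
    thread = threading.Thread(target=run_guarded, args=(target, status), daemon=True)
    thread.start()
    return thread
```

With something like this in place, the front end could show `last_error` from the status response rather than hanging with no explanation.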

chn-lee-yumi commented 5 months ago

This does need to be improved. I'll add it to the TODO list.

chn-lee-yumi commented 5 months ago

Also, can this status output line be hidden or shown some other way

Which line do you mean?

Jackxwb commented 5 months ago

Also, can this status output line be hidden or shown some other way

Which line do you mean?

This line:

2024-05-30 23:40:25,128 werkzeug INFO 127.0.0.1 - - [30/May/2024 23:40:25] "GET /api/status HTTP/1.1" 200 -

I couldn't find a way to turn it off, and it prints far too often.
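If the goal is to hide only the status polling lines while keeping other INFO output, a `logging.Filter` attached to the `werkzeug` logger is one option. This is a minimal sketch, assuming the access-log lines are emitted through the standard `werkzeug` logger and contain the request path in the message; `DropStatusPolls` is a hypothetical name.

```python
import logging


class DropStatusPolls(logging.Filter):
    """Drop log records whose message mentions the polled status path."""

    def __init__(self, path="/api/status"):
        super().__init__()
        self.path = path

    def filter(self, record):
        # Returning False discards the record; anything else is kept.
        return self.path not in record.getMessage()


# Attach the filter to the logger that the development server writes to.
logging.getLogger("werkzeug").addFilter(DropStatusPolls())
```

Other requests would still be logged at INFO, as would the `database` lines, since the filter only touches the `werkzeug` logger.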

chn-lee-yumi commented 5 months ago

I couldn't find a way to turn it off, and it prints far too often.

Set the environment variable LOG_LEVEL=WARNING

Jackxwb commented 5 months ago

I couldn't find a way to turn it off, and it prints far too often.

Set the environment variable LOG_LEVEL=WARNING

It looks like

2024-05-30 23:40:23,544 database INFO 新增文件:xxxxx

this line is INFO too 😂, so switching to WARNING would hide it as well, right? 😂


WARNING doesn't take effect either 🤣

**************************************************
>>> TRANSFORMERS_OFFLINE = 1
>>> HF_DATASETS_OFFLINE = None
>>> HF_HOME = ./huggingface
>>> force_download = None
>>> local_files_only = None
>>> LOG_LEVEL = WARNING
**************************************************
V:\ai\MaterialSearchWindows True
**************************************************
Loading models[OFA-Sys/chinese-clip-vit-base-patch16, offLine=1]...
Models loaded[OFFLINE].
 * Serving Flask app 'main'
 * Debug mode: off
2024-06-02 22:21:59,399 werkzeug INFO WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8085
 * Running on http://192.168.2.110:8085
2024-06-02 22:21:59,400 werkzeug INFO Press CTRL+C to quit
2024-06-02 22:21:59,434 werkzeug INFO 127.0.0.1 - - [02/Jun/2024 22:21:59] "GET /api/status HTTP/1.1" 200 -
2024-06-02 22:21:59,582 werkzeug INFO 127.0.0.1 - - [02/Jun/2024 22:21:59] "GET /api/status HTTP/1.1" 200 -
2024-06-02 22:21:59,612 werkzeug INFO 127.0.0.1 - - [02/Jun/2024 22:21:59] "GET /api/status HTTP/1.1" 200 -

LOG_LEVEL-related code:

...
# ***** Logging configuration *****
LOG_LEVEL = os.getenv('LOG_LEVEL', 'INFO')  # Log level: NOTSET/DEBUG/INFO/WARNING/ERROR/CRITICAL
...
print(f">>> LOG_LEVEL = {LOG_LEVEL}")

Jackxwb commented 5 months ago

I asked an AI about it. It seems to be hard-coded in the framework and can't be changed unless you switch to a different WSGI server 😂

chn-lee-yumi commented 5 months ago

Exception handling has been added for this issue. Now, if an error occurs during feature extraction, the file is skipped.
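The skip-on-error behaviour described here could look roughly like the sketch below. It is not the project's actual fix: `extract_with_fallback` and the `extract_batch` callable are hypothetical stand-ins, and retrying a failed batch one file at a time is an assumption about how the offending file might be isolated.

```python
import logging

logger = logging.getLogger("scan")


def extract_with_fallback(paths, extract_batch):
    """Return (ok_paths, features), skipping files whose extraction fails.

    extract_batch is a hypothetical callable mapping a list of paths
    to a list of features of the same length.
    """
    paths = list(paths)
    try:
        # Fast path: the whole batch succeeds in one call.
        return paths, list(extract_batch(paths))
    except Exception:
        logger.warning("batch of %d files failed; retrying one by one", len(paths))

    ok_paths, features = [], []
    for path in paths:
        try:
            single = extract_batch([path])
        except Exception as exc:
            # One bad file no longer aborts the whole scan.
            logger.warning("skipping %s: %r", path, exc)
            continue
        ok_paths.append(path)
        features.extend(single)
    return ok_paths, features
```

The per-file retry costs extra work only when a batch fails, so the common case stays as fast as before.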

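I asked an AI about it. It seems to be hard-coded in the framework and can't be changed unless you switch to a different WSGI server 😂

This has been fixed. LOG_LEVEL is now applied to the werkzeug output as well.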
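Applying an environment-driven level to the `werkzeug` logger can be sketched with the standard `logging` module alone. This is an illustration, not the project's actual change: `configure_logging` is a hypothetical helper, and the format string is only an assumption chosen to resemble the log lines quoted above. Werkzeug appears to default its own logger to INFO when no level has been set on it, which may be why setting only a global level had no effect.

```python
import logging
import os


def configure_logging(level_name):
    """Apply one level name (e.g. "WARNING") to the root and werkzeug loggers."""
    level = level_name.upper()  # setLevel accepts standard level names as strings
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )
    werkzeug_logger = logging.getLogger("werkzeug")
    # Set the level explicitly so the server's access log respects it.
    werkzeug_logger.setLevel(level)
    return werkzeug_logger


if __name__ == "__main__":
    configure_logging(os.getenv("LOG_LEVEL", "INFO"))
```

With `LOG_LEVEL=WARNING`, the `GET /api/status` lines (logged at INFO) would be suppressed, along with every other INFO message.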