benetech / VideoDeduplication

Investigate CSV write error on rapids branch blocking job completion #478

Open johnhbenetech opened 2 years ago

johnhbenetech commented 2 years ago

While testing the rapids branch from both the UI and the terminal, I hit the following error. It prevents matches from being generated and the semantic search index from being prepared.

[2022-03-30 20:03:39,020: INFO] [luigi-interface] Running Worker with 1 processes
[2022-03-30 20:03:39,020: INFO] [luigi-interface] [pid 10041] Worker Worker(salt=146995060, workers=1, host=f90e167d6ad7, username=root, pid=10041) running   CondenseFingerprintsTask(config=Config(sources=SourcesConfig(root='data/', extensions=['mp4', 'ogv', 'webm', 'avi', 'flv', 'mkv'], hash_mode='file', hash_cache='data/representations/hashes'), repr=RepresentationConfig(directory='data/representations', storage_type=<StorageType.DETECT: 'detect'>), database=DatabaseConfig(use=True, uri='postgresql://postgres:admin@postgres:5432/videodeduplicationdb'), processing=ProcessingConfig(video_list_filename='video_dataset_list.txt', match_distance=0.75, filter_dark_videos=True, filter_dark_videos_thr=2, min_video_duration_seconds=3, detect_scenes=True, minimum_scene_duration=2, pretrained_model_local_path=None, frame_sampling=1, save_frames=False, keep_fileoutput=True), templates=TemplatesConfig(source_path='data/templates/', distance=0.07, distance_min=0.05, override=False, extensions=('png', 'jpg', 'jpeg')), security=SecurityConfig(master_key_path=None), file_storage=FileStorageConfig(directory='file-storage'), logging=LoggingConfig(file_path='./processing_error.log', file_format='[%(asctime)s: %(levelname)s] [%(name)s] %(message)s', file_level=<LogLevel.ERROR: 40>, console_format='[%(asctime)s: %(levelname)s] %(message)s', console_level=<LogLevel.INFO: 20>)), prefix=., fingerprint_size=500)
[2022-03-30 20:03:39,021: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/frames
[2022-03-30 20:03:39,022: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/frame_level
[2022-03-30 20:03:39,023: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/video_level
[2022-03-30 20:03:39,023: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/video_signatures
[2022-03-30 20:03:39,184: INFO] [task.CondenseFingerprintsTask] Reading existing condensed fingerprints
[2022-03-30 20:03:39,184: INFO] [task.CondenseFingerprintsTask] Loaded 0 previously condensed fingerprints
[2022-03-30 20:03:39,262: INFO] [task.CondenseFingerprintsTask] Collecting file-keys since the very beginning
[2022-03-30 20:03:39,415: INFO] [task.CondenseFingerprintsTask] Collected 444 file keys
[2022-03-30 20:03:39,415: INFO] [task.CondenseFingerprintsTask] Reading fingerprints
[2022-03-30 20:03:39,416: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/frames
[2022-03-30 20:03:39,416: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/frame_level
[2022-03-30 20:03:39,417: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/video_level
[2022-03-30 20:03:39,418: INFO] [winnow.utils.repr] Detected simple path-based repr-storage in /project/data/representations/video_signatures
[2022-03-30 20:03:39,687: INFO] [task.CondenseFingerprintsTask] Creating ndarray with fingerprints
[2022-03-30 20:03:39,687: INFO] [task.CondenseFingerprintsTask] Creating file-keys DataFrame
[2022-03-30 20:03:39,688: INFO] [task.CondenseFingerprintsTask] Loaded 444 new fingerprints.
[2022-03-30 20:03:39,688: INFO] [task.CondenseFingerprintsTask] Writing 444 fingerprints to ['data/representations/condensed_fingerprints/condensed_fingerprints__2022_03_25_192103493531.npy', 'data/representations/condensed_fingerprints/condensed_fingerprints__2022_03_25_192103493531.files.csv']
[2022-03-30 20:03:41,690: ERROR] [luigi-interface] [pid 10041] Worker Worker(salt=146995060, workers=1, host=f90e167d6ad7, username=root, pid=10041) failed    CondenseFingerprintsTask(config=Config(sources=SourcesConfig(root='data/', extensions=['mp4', 'ogv', 'webm', 'avi', 'flv', 'mkv'], hash_mode='file', hash_cache='data/representations/hashes'), repr=RepresentationConfig(directory='data/representations', storage_type=<StorageType.DETECT: 'detect'>), database=DatabaseConfig(use=True, uri='postgresql://postgres:admin@postgres:5432/videodeduplicationdb'), processing=ProcessingConfig(video_list_filename='video_dataset_list.txt', match_distance=0.75, filter_dark_videos=True, filter_dark_videos_thr=2, min_video_duration_seconds=3, detect_scenes=True, minimum_scene_duration=2, pretrained_model_local_path=None, frame_sampling=1, save_frames=False, keep_fileoutput=True), templates=TemplatesConfig(source_path='data/templates/', distance=0.07, distance_min=0.05, override=False, extensions=('png', 'jpg', 'jpeg')), security=SecurityConfig(master_key_path=None), file_storage=FileStorageConfig(directory='file-storage'), logging=LoggingConfig(file_path='./processing_error.log', file_format='[%(asctime)s: %(levelname)s] [%(name)s] %(message)s', file_level=<LogLevel.ERROR: 40>, console_format='[%(asctime)s: %(levelname)s] %(message)s', console_level=<LogLevel.INFO: 20>)), prefix=., fingerprint_size=500)
Traceback (most recent call last):
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/luigi/worker.py", line 191, in run
    new_deps = self._run_get_new_deps()
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/luigi/worker.py", line 133, in _run_get_new_deps
    task_gen = self.task.run()
  File "/project/winnow/pipeline/luigi/condense.py", line 263, in run
    target.write(condensed, new_results_time)
  File "/project/winnow/pipeline/luigi/condense.py", line 206, in write
    condensed.file_keys_df.to_csv(keys_out)
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/core/generic.py", line 3466, in to_csv
    return DataFrameRenderer(formatter).to_csv(
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/format.py", line 1105, in to_csv
    csv_formatter.save()
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 257, in save
    self._save()
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 262, in _save
    self._save_body()
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 300, in _save_body
    self._save_chunk(start_i, end_i)
  File "/anaconda/envs/winnow/lib/python3.9/site-packages/pandas/io/formats/csvs.py", line 311, in _save_chunk
    libwriters.write_csv_rows(
  File "pandas/_libs/writers.pyx", line 55, in pandas._libs.writers.write_csv_rows
TypeError: write() argument must be str, not bytes
[2022-03-30 20:03:41,698: ERROR] [task_queue.tasks] Error occurred while executing luigi tasks: CondenseFingerprintsTask(config=Config(sources=SourcesConfig(root='data/', extensions=['mp4', 'ogv', 'webm', 'avi', 'flv', 'mkv'], hash_mode='file', hash_cache='data/representations/hashes'), repr=RepresentationConfig(directory='data/representations', storage_type=<StorageType.DETECT: 'detect'>), database=DatabaseConfig(use=True, uri='postgresql://postgres:admin@postgres:5432/videodeduplicationdb'), processing=ProcessingConfig(video_list_filename='video_dataset_list.txt', match_distance=0.75, filter_dark_videos=True, filter_dark_videos_thr=2, min_video_duration_seconds=3, detect_scenes=True, minimum_scene_duration=2, pretrained_model_local_path=None, frame_sampling=1, save_frames=False, keep_fileoutput=True), templates=TemplatesConfig(source_path='data/templates/', distance=0.07, distance_min=0.05, override=False, extensions=('png', 'jpg', 'jpeg')), security=SecurityConfig(master_key_path=None), file_storage=FileStorageConfig(directory='file-storage'), logging=LoggingConfig(file_path='./processing_error.log', file_format='[%(asctime)s: %(levelname)s] [%(name)s] %(message)s', file_level=<LogLevel.ERROR: 40>, console_format='[%(asctime)s: %(levelname)s] %(message)s', console_level=<LogLevel.INFO: 20>)), prefix=., fingerprint_size=500), write() argument must be str, not bytes
[2022-03-30 20:03:41,705: INFO] [luigi-interface] Informed scheduler that task   CondenseFingerprintsTask_Config_sources_S_500___8510f14eba   has status   FAILED
[2022-03-30 20:03:41,709: INFO] [luigi-interface] 
===== Luigi Execution Summary =====

Scheduled 5 tasks of which:
* 2 complete ones were encountered:
    - 1 ExifTask(...)
    - 1 SignaturesTask(...)
* 1 failed:
    - 1 CondenseFingerprintsTask(...)
* 2 were left pending, among these:
    * 2 had failed dependencies:
        - 1 AnnoyIndexTask(...)
        - 1 DBMatchesTask(...)
johnhbenetech commented 2 years ago

@fsbatista confirmed: this is not happening on the development branch.
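
For anyone triaging this: the final `TypeError` in the traceback is the message Python raises when a text-mode file handle is handed `bytes`. The sketch below is only an illustration of that failure mode and of one possible workaround (rendering the CSV to a `str` first, then writing it). It uses the standard library only. The names `render_csv`, `reproduce_error`, `write_rows` and the sample rows are hypothetical and are not taken from `condense.py`. Whether this matches the actual cause on the rapids branch is an assumption that has not been verified.

```python
import csv
import io


def render_csv(rows):
    """Render rows to CSV text in memory, independent of any output handle."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def reproduce_error():
    """Write bytes to a text-mode handle and return the resulting message."""
    text_handle = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
    try:
        text_handle.write(b"path,hash\n")  # bytes into a text handle
    except TypeError as error:
        return str(error)
    return None


def write_rows(text_handle, rows):
    """Hand the text-mode handle a str, which is what it accepts."""
    text_handle.write(render_csv(rows))


# Hypothetical file keys; the real columns in file_keys_df may differ.
rows = [("path", "hash"), ("video_a.mp4", "abc123"), ("video_b.mp4", "def456")]

# Stand-in for the text-mode output handle (keys_out in the traceback).
raw = io.BytesIO()
keys_out = io.TextIOWrapper(raw, encoding="utf-8", newline="")
write_rows(keys_out, rows)
keys_out.flush()

print(reproduce_error())
print(raw.getvalue().decode("utf-8"))
```

With pandas, the analogous workaround would be to call `to_csv()` without a path, which returns the CSV as a string, and write that string to `keys_out`. This is a suggestion to try, not a confirmed fix.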