PaulLerner / ViQuAE

Source code and data used in the papers ViQuAE (Lerner et al., SIGIR'22), Multimodal ICT (Lerner et al., ECIR'23) and Cross-modal Retrieval (Lerner et al., ECIR'24)
https://paullerner.github.io/ViQuAE/

None values / ragged nested sequences for face detection and recognition #1

Closed by PaulLerner 1 year ago

PaulLerner commented 1 year ago

Current image.face_detection and image.face_recognition scripts work with:

but fail with:

This seems related to this issue https://github.com/huggingface/datasets/issues/3676

cc @OA256864 @grimalPaul

traceback for face_detection

0%|          | 0/20 [00:00<?, ?ba/s]/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/facenet_pytorch/models/utils/detect_face.py:183: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  batch_boxes, batch_points = np.array(batch_boxes), np.array(batch_points)
/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/facenet_pytorch/models/mtcnn.py:339: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  boxes = np.array(boxes)
/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/facenet_pytorch/models/mtcnn.py:340: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  probs = np.array(probs)
/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/facenet_pytorch/models/mtcnn.py:341: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
  points = np.array(points)
  0%|          | 0/20 [00:12<?, ?ba/s]
Traceback (most recent call last):
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/data/meerqat/ViQuAE/meerqat/image/face_detection.py", line 179, in <module>
    dataset = dataset_detect_faces(dataset, model=model, image_key=image_key, save_root_path=save_root_path)
  File "/home/data/meerqat/ViQuAE/meerqat/image/face_detection.py", line 146, in dataset_detect_faces
    dataset = dataset.map(dataset_detect_face, batched=True, fn_kwargs=kwargs, batch_size=batch_size)
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 2590, in map
    desc=desc,
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 584, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 551, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/datasets/fingerprint.py", line 480, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 2985, in _map_single
    writer.write_batch(batch)
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/datasets/arrow_writer.py", line 524, in write_batch
    arrays.append(pa.array(typed_sequence))
  File "pyarrow/array.pxi", line 229, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 110, in pyarrow.lib._handle_arrow_array_protocol
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/datasets/arrow_writer.py", line 182, in __arrow_array__
    out = list_of_np_array_to_pyarrow_listarray(data)
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/datasets/features/features.py", line 1350, in list_of_np_array_to_pyarrow_listarray
    [numpy_to_pyarrow_listarray(arr, type=type) if arr is not None else None for arr in l_arr]
  File "/home/users/oadjali/.conda/envs/v100/lib/python3.7/site-packages/datasets/features/features.py", line 1342, in list_of_pa_arrays_to_pyarrow_listarray
    values = pa.concat_arrays(l_arr)
  File "pyarrow/array.pxi", line 2526, in pyarrow.lib.concat_arrays
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: arrays to be concatenated must be identically typed, but float and null were encountered.

face_recognition

For this, we simply need to check whether the face embedding/landmark/… is empty instead of None.
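
A minimal sketch of the "empty instead of None" idea, assuming each image yields a list of per-face outputs and that images without a detected face currently yield None. The helper names (`none_to_empty`, `has_faces`) and the column name `face_embedding` are hypothetical and are not taken from the meerqat code. Whether an empty list is enough for Arrow to infer a consistent column type may depend on the installed datasets/pyarrow versions.

```python
from typing import Dict, List, Optional


def none_to_empty(values: List[Optional[list]]) -> List[list]:
    """Replace None (no face detected) with an empty list, so every row is a list."""
    return [[] if v is None else v for v in values]


def has_faces(value: Optional[list]) -> bool:
    """True when at least one face is present; None and empty both mean 'no face'."""
    return value is not None and len(value) > 0


# Hypothetical batch: the second image has no detected face.
batch: Dict[str, List[Optional[list]]] = {
    "face_embedding": [[[0.1, 0.2]], None, [[0.3, 0.4], [0.5, 0.6]]],
}

# Normalise every column of the batch before it is written out.
cleaned = {key: none_to_empty(column) for key, column in batch.items()}

# Downstream code then tests for emptiness rather than for None.
faces_per_image = [len(v) if has_faces(v) else 0 for v in cleaned["face_embedding"]]
print(faces_per_image)  # [1, 0, 2]
```

With this convention, the recognition step can skip an image by testing `len(...) == 0` and never has to handle a None entry.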