euagendas / m3inference

A deep learning system for demographic inference (gender, age, and individual/person) that was trained on a massive Twitter dataset using profile images, screen names, names, and biographies
http://www.euagendas.org
GNU Affero General Public License v3.0

Error fetching images will fail the infer method #21

Open eitantorf opened 3 years ago

eitantorf commented 3 years ago

I am trying to run transform_jsonl (to download images and prepare the m3 JSON file) and then run the infer method right after. The issue occurs when transform_jsonl cannot find some images but still writes their paths to the m3 JSON file, which causes infer to fail with: FileNotFoundError: [Errno 2] No such file or directory
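A possible workaround until the fix is installed, sketched under assumptions: check the m3 JSON file before calling infer and point any missing image at a placeholder. The helper names `patch_missing_images` and `patch_jsonl` are hypothetical and not part of m3inference, and the sketch assumes one JSON object per line with the image path stored under an `img_path` key.

```python
import json
import os


def patch_missing_images(records, default_img, img_key="img_path",
                         exists=os.path.isfile):
    """Return copies of `records` with missing images replaced by `default_img`.

    Also returns the original paths that could not be found.
    Hypothetical helper; `img_key` is an assumption about the file layout.
    """
    patched, missing = [], []
    for record in records:
        record = dict(record)  # copy so the caller's data is left untouched
        path = record.get(img_key)
        if not path or not exists(path):
            missing.append(path)
            record[img_key] = default_img
        patched.append(record)
    return patched, missing


def patch_jsonl(src, dst, default_img):
    """Rewrite a JSON-lines file so that every image path exists on disk."""
    with open(src, encoding="utf-8") as fh:
        records = [json.loads(line) for line in fh if line.strip()]
    patched, missing = patch_missing_images(records, default_img)
    with open(dst, "w", encoding="utf-8") as fh:
        for record in patched:
            fh.write(json.dumps(record) + "\n")
    return missing
```

Running `patch_jsonl` on the output of transform_jsonl and passing the rewritten file to infer should avoid the FileNotFoundError, at the cost of those users being scored against a placeholder image.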

computermacgyver commented 3 years ago

Thank you @eitantorf for letting us know.

We're supposed to catch this and replace it with the "default image" if the image is unavailable:

https://github.com/euagendas/m3inference/blob/b208bae2ee84a52a0d85af740e364f9de98c4dc3/m3inference/m3twitter.py#L97-L98

I'll have to look into it further. Would you be able to share an example of the input and the errant output? I'm curious whether the path is meant to point to the default image and the default image itself is unavailable, or whether something else is going on.

eitantorf commented 3 years ago

Ah, I see you fixed it, great. I now see that my code is the previous version. Maybe the pip install version does not include this fix?

computermacgyver commented 3 years ago

I'll check what's on pip. Thanks, @eitantorf

zijwang commented 3 years ago

Thanks, @eitantorf and @computermacgyver! I have pushed the latest version to PyPI. @eitantorf, could you try updating the package and see whether the issue persists?

davidjurgens commented 3 years ago

It might be worth extending this behavior to all image-related errors during inference. Here's one I just ran into where the file type can't be inferred, which causes the whole infer call to error out:

UnidentifiedImageError: Caught UnidentifiedImageError in DataLoader worker process 3.
Original Traceback (most recent call last):
  File "/opt/anaconda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 202, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/anaconda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/anaconda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/anaconda/lib/python3.8/site-packages/m3inference/dataset.py", line 37, in __getitem__
    return self._preprocess_data(data)
  File "/opt/anaconda/lib/python3.8/site-packages/m3inference/dataset.py", line 43, in _preprocess_data
    fig = self._image_loader(img_path)
  File "/opt/anaconda/lib/python3.8/site-packages/m3inference/dataset.py", line 91, in _image_loader
    image = Image.open(image_name)
  File "/opt/anaconda/lib/python3.8/site-packages/PIL/Image.py", line 2958, in open
    raise UnidentifiedImageError(
PIL.UnidentifiedImageError: cannot identify image file 'someimage.jpg'

It's probably worth not failing in those cases and just printing a warning. Alternatively, add some kind of pandas-like flag, with errors='ignore' as the default, and let users decide whether infer should stop on error.
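To make the suggestion concrete, here is a minimal sketch of what such a flag could look like. `load_image_safely` and its parameters are hypothetical and not existing m3inference code. The loader is passed in so the idea stays independent of PIL; in the traceback above it would be `Image.open`. Both FileNotFoundError and PIL's UnidentifiedImageError are subclasses of OSError, so one except clause covers both.

```python
import warnings


def load_image_safely(path, loader, default_path, errors="warn"):
    """Load `path` with `loader`, falling back to `default_path` on OSError.

    errors="raise"  re-raises the original exception
    errors="warn"   emits a warning, then loads the default image
    errors="ignore" loads the default image silently
    """
    if errors not in ("raise", "warn", "ignore"):
        raise ValueError(f"unknown errors mode: {errors!r}")
    try:
        return loader(path)
    except OSError as exc:
        if errors == "raise":
            raise
        if errors == "warn":
            warnings.warn(f"could not load {path!r} ({exc}); using default image")
        # If the default image itself is unreadable, this still raises.
        return loader(default_path)
```

With something like this inside the dataset's image loader, a single unreadable file would no longer abort the whole DataLoader worker, and users who prefer the strict behavior could opt in with errors="raise".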