euagendas / m3inference

A deep learning system for demographic inference (gender, age, and individual/person) that was trained on a massive Twitter dataset using profile images, screen names, names, and biographies
http://www.euagendas.org
GNU Affero General Public License v3.0

Potential m3twitter.infer_id bug #8

Closed ndbhagwa closed 3 years ago

ndbhagwa commented 3 years ago

Hello, first-time GitHub issuer here!

When I try to process certain user id_str values, I get a FileNotFoundError. Here is a user id_str I chose at random: '238173039'. When I run m3twitter.infer_id on it, I receive the following error:

Traceback (most recent call last):
  File "is_organization.py", line 24, in <module>
    org = m3twitter.infer_id(id_str)['output']['org']
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3twitter.py", line 208, in infer_id
    output=self._twitter_api(id=id)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3twitter.py", line 187, in _twitter_api
    return self.process_twitter(r.json())
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3twitter.py", line 245, in process_twitter
    download_resize_img(img, img_file_resize, img_file_full)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/preprocess.py", line 28, in download_resize_img
    with open(img_out_path_fullsize, "wb") as fh:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ndbhagwa/m3/cache/TheWhaleShark.com/profile_images/2602602416/m8su11Vx_400x400'

I am not sure what causes this error. I originally thought it was a rate-limit error, since it does not occur consistently, but for rate-limit errors I see warnings like this instead:

<dt> - INFO - m3inference.m3twitter -   Results not in cache. Fetching data from Twitter for id <#>
<dt> - INFO - m3inference.m3twitter -   GET /users/show.json?id=<#>
<dt> - WARNING - m3inference.m3twitter -   Could not retreive screen_name
<dt> - WARNING - m3inference.m3twitter -   Could not retreive id_str
<dt> - WARNING - m3inference.m3twitter -   Could not retreive description
<dt> - WARNING - m3inference.m3twitter -   Could not retreive name
<dt> - WARNING - m3inference.m3twitter -   Could not retreive profile_image_url
<dt> - WARNING - m3inference.m3twitter -   Unable to extract image from Twitter. Using default image.
<dt> - INFO - m3inference.dataset -   1 data entries loaded

zijwang commented 3 years ago

Thanks for reaching out!

There are a couple of things to check first: 1) Which version are you using? 2) If you are using the latest version (1.1.0), have you specified your Twitter API tokens (see here)?

ndbhagwa commented 3 years ago

I am using version 1.1.0.

Yep, I have specified my API tokens.

This usually works very well for me; I only get errors in a few cases.

zijwang commented 3 years ago

Thanks for the reply! The issue seems to be that the profile image link has no file extension: typically the link looks like <image>.png, while the one in your example is just <image>. I submitted a PR (#9) that should hopefully fix this. Could you install the PR version and see whether it works?
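
To illustrate the missing-extension case, here is a minimal sketch of deriving an image file extension from a profile image URL with a fallback. It is not the code from PR #9 or from m3inference; the function names (`naive_extension`, `image_extension`, `cache_image_path`), the `_full` file naming, the `png` default, and the `example.com` host are all assumptions made for illustration. The naive version shows one plausible way a cache path such as the one in the traceback could arise: with no extension in the URL, the last dot found is the one in the host name.

```python
import os
from urllib.parse import urlparse


def naive_extension(url):
    # Hypothetical naive approach: take everything after the last dot.
    # If the image name has no extension, the last dot is in the host name.
    return url[url.rfind(".") + 1:]


def image_extension(url, default="png"):
    # Look only at the path component of the URL, so dots in the host
    # name are never mistaken for an extension separator.
    path = urlparse(url).path
    ext = os.path.splitext(path)[1]  # ".png", ".jpg", or "" if absent
    # Fall back to an assumed default when the URL carries no extension.
    return ext.lstrip(".").lower() or default


def cache_image_path(cache_dir, screen_name, url):
    # Hypothetical cache naming scheme: <cache_dir>/<screen_name>_full.<ext>
    return os.path.join(cache_dir, f"{screen_name}_full.{image_extension(url)}")


if __name__ == "__main__":
    # Placeholder host; the path mirrors the shape seen in the traceback.
    url = "https://example.com/profile_images/2602602416/m8su11Vx_400x400"
    print(naive_extension(url))
    print(image_extension(url))
    print(cache_image_path("/tmp/m3/cache", "TheWhaleShark", url))
```

Parsing the URL first and applying `os.path.splitext` to the path keeps the extension logic independent of the host name, and the default keeps the resulting cache file name flat instead of turning URL segments into nonexistent directories.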

ndbhagwa commented 3 years ago

That seems to have fixed the id I mentioned! However, I am still getting a similar FileNotFoundError for some other ids. For example, id_str='1186776981200875526' causes this error:

Traceback (most recent call last):
  File "is_organization.py", line 24, in <module>
    org = m3twitter.infer_id(id_str)['output']['org']
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3twitter.py", line 210, in infer_id
    output=self._twitter_api(id=id)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3twitter.py", line 189, in _twitter_api
    return self.process_twitter(r.json())
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3twitter.py", line 258, in process_twitter
    pred = self.infer(data, batch_size=1, num_workers=1)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/m3inference.py", line 125, in infer
    for batch in dataloader:
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 363, in __next__
    data = self._next_data()
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 989, in _next_data
    return self._process_data(data)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1014, in _process_data
    data.reraise()
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/dataset.py", line 37, in __getitem__
    return self._preprocess_data(data)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/dataset.py", line 43, in _preprocess_data
    fig = self._image_loader(img_path)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/m3inference/dataset.py", line 91, in _image_loader
    image = Image.open(image_name)
  File "/home/ndbhagwa/miniconda3/lib/python3.8/site-packages/PIL/Image.py", line 2878, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: '/home/ndbhagwa/m3/cache/1186776981200875526_224x224.png'

zijwang commented 3 years ago

@ndbhagwa Could you check now?

ndbhagwa commented 3 years ago

Seems to be working fine now! Thanks!