Closed joausaga closed 3 years ago
Did you install m3inference from pip? If so, could you see whether installing from the master branch helps?
No, the problem persists. Are the JSON lines in the file read in order from top to bottom? If so, it always breaks at line 36, which is this user: https://twitter.com/JUc3m. Could black and white profile pictures be problematic?
Yes, it should be in order. Have you checked whether you have that image on your disk, and if so, whether you are able to open it?
Yes, I have it on my disk. I could open it, convert it to RGB, resize it, and transform it into a tensor.
Thanks, Jorge, for helping us get to the bottom of this. I put that one user in a separate jsonl file (just one line) and downloaded the profile from Twitter. It ran for me without any issue.
Do you think you could try the same? I.e., put just that user in a file and run the infer method on that file?
The infer method works in batches, so my suspicion is that it is not this line / user in particular but one around it. I haven't downloaded the profile photos for the other users to test that yet, but if the one line / one user runs for you, then perhaps you could try adding additional users until it breaks?
Would you also be able to confirm the version of Python you're using and the OS (Windows, Mac, Linux)? We've tested extensively on Linux and Mac, but I have seen a few issues popping up on Windows.
You can also run with the parameters batch_size=1, num_workers=1 to help better isolate the failing user.
m3.infer('tmp.jsonl',batch_size=1,num_workers=1)
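The "one user per file" approach suggested above can be automated. The following is only a sketch, not part of m3inference: find_failing_lines is a hypothetical helper that copies each JSON line into its own temporary file and passes that file to whatever inference callable you give it, recording which line numbers raise an exception.

```python
import os
import tempfile


def find_failing_lines(jsonl_path, infer_fn):
    """Call infer_fn on each JSON line separately.

    Returns a list of (line_number, error_repr) for the lines that raised.
    infer_fn is any callable that accepts the path of a .jsonl file.
    """
    failures = []
    with open(jsonl_path, encoding="utf-8") as src:
        for line_number, line in enumerate(src, start=1):
            if not line.strip():
                continue  # ignore blank lines
            # Write this single entry to its own temporary .jsonl file.
            fd, tmp_path = tempfile.mkstemp(suffix=".jsonl")
            with os.fdopen(fd, "w", encoding="utf-8") as tmp:
                tmp.write(line.rstrip("\n") + "\n")
            try:
                infer_fn(tmp_path)
            except Exception as exc:  # record the failure and keep going
                failures.append((line_number, repr(exc)))
            finally:
                os.remove(tmp_path)
    return failures
```

Assuming m3 is the same inference object used in the call above, it could be used as find_failing_lines('tmp.jsonl', lambda p: m3.infer(p, batch_size=1, num_workers=1)). Line numbers are 1-based and count blank lines, so they match what a text editor shows.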
Great, thanks Scott (@computermacgyver)! I found the problem. It seems we might need to extend m3inference to support GIF images. This user has a GIF as her profile picture. The predictor breaks because, when transforming the GIF image into a tensor, the resulting tensor is of size 3x224x224 and not 1x224x224 as expected. I guess the first dimension 3 is because the GIF is composed of three images, which are used to perform the animation.
I am running Python 3.7.1 on a Linux/Ubuntu machine.
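For anyone who wants to check their own input for this case before running inference, here is a rough sketch that uses only the standard library. It flags entries whose image file is really a GIF by reading the file signature (GIF files start with the bytes GIF87a or GIF89a), so it also catches GIFs saved under another extension. The helper names are hypothetical, and it assumes each JSON line stores the image location under the img_path key, as in the input shown elsewhere in this thread.

```python
import json
import os

GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def is_gif(path):
    """Return True if the file starts with a GIF signature, whatever its extension."""
    with open(path, "rb") as fh:
        return fh.read(6) in GIF_SIGNATURES


def gif_entries(jsonl_path, image_key="img_path"):
    """Return (line_number, image_path) for entries whose image file is a GIF.

    image_key is an assumption about the input format; entries without an
    existing image file are skipped rather than reported.
    """
    hits = []
    with open(jsonl_path, encoding="utf-8") as src:
        for line_number, line in enumerate(src, start=1):
            if not line.strip():
                continue  # ignore blank lines
            image_path = json.loads(line).get(image_key)
            if image_path and os.path.isfile(image_path) and is_gif(image_path):
                hits.append((line_number, image_path))
    return hits
```

This does not tell you whether a GIF is animated; it only identifies which entries would need converting to a still PNG or JPEG first.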
Hey @joausaga! We have updated the package and I think the issue with gif has been resolved. Could you try out the new version (v1.1.0) and see whether it works?
Here is what I did based on your example:
> python scripts/m3twitter.py --screen-name hermanas_malas --auth ./scripts/auth_example.txt --skip-cache
08/13/2020 10:50:52 - INFO - m3inference.m3inference - Version 1.1.0
08/13/2020 10:50:52 - INFO - m3inference.m3inference - Running on cpu.
08/13/2020 10:50:52 - INFO - m3inference.m3inference - Will use full M3 model.
08/13/2020 10:50:53 - INFO - m3inference.m3inference - Model full_model exists at [masked_link]/full_model.mdl.
08/13/2020 10:50:53 - INFO - m3inference.utils - Checking MD5 for model full_model at [masked_link]/full_model.mdl
08/13/2020 10:50:54 - INFO - m3inference.utils - MD5s match.
08/13/2020 10:50:54 - INFO - m3inference.m3inference - Loaded pretrained weight at [masked_link]/full_model.mdl
08/13/2020 10:50:54 - INFO - m3inference.m3twitter - skip_cache is True. Fetching data from Twitter for hermanas_malas.
08/13/2020 10:50:54 - INFO - m3inference.m3twitter - GET /users/show.json?screen_name=hermanas_malas
08/13/2020 10:50:54 - INFO - m3inference.dataset - 1 data entries loaded.
Predicting...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00, 1.20s/it]
{'input': {'description': 'No soy ni un troll ni un robot, me tienen harta!!! '
                          'Digo lo que pienso, nada más que eso. Si no les '
                          'gusta, no me lean! Podrida de la Korrupción. Quiero '
                          'Justicia!',
           'id': '351160731',
           'img_path': '[masked_link]/hermanas_malas_224x224.png',
           'lang': 'es',
           'name': 'Pía Ferrer 🐱',
           'screen_name': 'hermanas_malas'},
 'output': {'age': {'19-29': 0.4107,
                    '30-39': 0.1334,
                    '<=18': 0.3018,
                    '>=40': 0.154},
            'gender': {'female': 0.8792, 'male': 0.1208},
            'org': {'is-org': 0.8338, 'non-org': 0.1662}}}
Hi @zijwang, great improvement of the tool! I'll try it out and let you know if there is any trouble.
Thanks, @joausaga . We appreciate your help in discovering and diagnosing the issue. To be clear, we have updated the preprocessing code; so, images downloaded with the M3Twitter wrapper or preprocessed with scripts/preprocess.py will automatically convert animated GIFs to non-animated PNG/JPEG formats.
If you pass an animated GIF directly to the .infer(...) method, the code will still fail. We're open to possibly checking and reformatting images there, but in general we expect images to have been preprocessed already (e.g., we do not check the dimensions of images in the .infer(...) method, but expect them to already be properly sized).
I don't think this applies to your use case, but if you use the M3Twitter wrapper to fetch user profile information, it now requires an API key. Details are in README.md.
Oh, good to know. I directly use transform_jsonl_object from m3twitter.py, which uses download_resize_img. I don't see any change in the definition of these functions, so I assume I am safe.
Where is the processing of gifs happening?
Yes, no API keys are needed for those functions.
When you call transform_jsonl_object, any of the paths that involve the image being resized call get_extension. If the file extension is ".gif", the function returns ".png", and this new filename is used as the output file and format for any resized/downloaded image. Just looking at this method specifically, I see that this doesn't happen if resize_img=False, which is something we might want to consider.
I will take a closer look specifically at that preprocessing method (which is one we definitely intend to support) to make sure the conversion is happening.
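To illustrate the extension rule described above, here is a minimal sketch. It is not the library's get_extension; output_path_for is a hypothetical helper that only shows the idea of keeping the original extension unless it is ".gif", in which case the output name ends in ".png".

```python
import os


def output_path_for(image_path):
    """Hypothetical helper: keep the extension unless it is .gif, then use .png."""
    root, ext = os.path.splitext(image_path)
    if ext.lower() == ".gif":
        ext = ".png"  # GIFs are written out as still PNG images
    return root + ext
```

Note that a rule like this only looks at the file name; a GIF saved under a different extension would pass through unchanged.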
I have the following error when trying to predict the demographics of a list of twitter users.
The list of users can be found here