StephenTemp opened this issue 2 years ago.
It seems that when I run the script, everything works as intended; however, when I try to access it via Python, I run into the segfaults. I've just tried running _test_transformjsonl.py and got the same error.
Thanks @StephenTemp for reporting this. Could you please share the version of Python and version of torch you have installed? Also what operating system are you on?
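For reports like this, a small script can collect the requested details in one go. This is a minimal standard-library sketch, not something shipped with m3inference; it looks up the installed torch and Pillow distributions and prints "not installed" if either is missing. platform.machine() is included because it tells a native Apple Silicon interpreter ("arm64") apart from an Intel build ("x86_64", which is also what an interpreter running under Rosetta 2 reports).

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "not installed"


def environment_report() -> dict:
    """Collect the details usually asked for in a crash report."""
    return {
        "python": platform.python_version(),
        "torch": installed_version("torch"),
        "Pillow": installed_version("Pillow"),
        "os": platform.platform(),
        # "arm64" for a native Apple Silicon interpreter, "x86_64" for an
        # Intel build (including one running under Rosetta 2).
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```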
I would also be curious to know whether the scripts/m3twitter.py file is working for you, e.g.:
python scripts/m3twitter.py --skip-cache --screen-name barackobama --auth scripts/auth.txt
To run that, you'll need to create the auth.txt file in the same format as scripts/auth_example.txt.
I'd like to isolate whether the problem is in downloading and processing the images (which happens in both scripts/m3twitter.py and transform_jsonl) or in something specific to transform_jsonl.
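One way to do that kind of isolation without each crash taking down the whole session is to run every candidate step in a fresh interpreter and inspect its exit status. On POSIX systems (Linux, macOS), subprocess reports a child killed by a signal as a negative return code, so a segfault shows up as -signal.SIGSEGV. This is a sketch: the snippets you would actually compare (for example, one that only downloads and resizes an image versus one that calls transform_jsonl) are hypothetical and stood in for here by a deliberately crashing snippet.

```python
import signal
import subprocess
import sys


def run_isolated(code: str) -> str:
    """Run a Python snippet in a child interpreter and classify the outcome.

    POSIX only: a child terminated by a signal has returncode == -signum.
    """
    # -X faulthandler makes the child print its Python traceback if it crashes.
    result = subprocess.run([sys.executable, "-X", "faulthandler", "-c", code])
    if result.returncode == -signal.SIGSEGV:
        return "segfault"
    if result.returncode != 0:
        return f"error (exit code {result.returncode})"
    return "ok"


if __name__ == "__main__":
    # A deliberately crashing snippet stands in for the real steps.
    crash = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    print(run_isolated("print('fine')"))  # ok
    print(run_isolated(crash))  # segfault
```

Because each step runs in its own process, the parent keeps going after a crash and can report which step failed.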
Python version: 3.9.4
torch version: 1.9.0
Operating system: macOS Big Sur (11.4, M1)
It seems as though the scripts/m3twitter.py file is indeed working! Here's the output:
Also, here's the specific error message:
Thanks, @StephenTemp. I had a typo in "Barack" (I missed the "c"), so the example we tried didn't include an image. Could you re-run the test with
python scripts/m3twitter.py --skip-cache --screen-name barackobama --auth scripts/auth.txt
Or re-run with any profile that includes a picture. Sorry about that.
No problem!
Thanks, @StephenTemp. One more question: which version of PIL do you have?
python -c "import PIL; print(PIL.__version__)"
My best guess, @zijwang, is that the problem is somewhere in the downloading or resizing of images, since the code runs on a profile with no image but fails on a profile with one. Because it's a segfault, it is almost certainly in compiled code, which leads me to think it's something in PIL or in how we pass the image data to PIL in a BytesIO wrapper. We might be able to work around this by loading the image from a file for resizing rather than passing it in memory.
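A minimal sketch of that workaround, assuming Pillow is installed: the downloaded bytes are written to a temporary file and PIL opens the file by path instead of reading from an io.BytesIO. The function name resize_via_file, the 224x224 target size, and the generated test image are illustrative assumptions, not m3inference's actual preprocessing code.

```python
import io
import os
import tempfile

from PIL import Image

TARGET_SIZE = (224, 224)  # hypothetical target size, for illustration only


def resize_via_file(image_bytes: bytes, size=TARGET_SIZE) -> Image.Image:
    """Resize image data by round-tripping it through a temporary file
    instead of handing PIL an in-memory BytesIO wrapper."""
    fd, path = tempfile.mkstemp(suffix=".img")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(image_bytes)
        # Pillow identifies the format from the file header, not the suffix.
        with Image.open(path) as img:
            # convert() loads the pixel data before the file is closed.
            return img.convert("RGB").resize(size, Image.BILINEAR)
    finally:
        os.remove(path)


if __name__ == "__main__":
    # Build a small PNG in memory to stand in for a downloaded profile picture.
    buf = io.BytesIO()
    Image.new("RGB", (400, 300), color="red").save(buf, format="PNG")
    resized = resize_via_file(buf.getvalue())
    print(resized.size)  # (224, 224)
```

If the crash disappears with this version but not with the BytesIO path, that would point at the in-memory hand-off rather than at the decoding itself.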
I'll see if I can recreate on my OS (Linux) using the same versions.
PIL: 8.2.0, thanks for your help!
@StephenTemp I did test runs on an Ubuntu machine with 1) a valid screen name with a profile picture, 2) a valid screen name without a profile picture, and 3) a nonexistent screen name, and everything works fine. Here is my pip list output:
Package            Version
------------------ ---------
certifi            2021.5.30
charset-normalizer 2.0.6
idna               3.2
m3inference        1.1.5
numpy              1.21.2
pandas             1.3.3
Pillow             8.3.2
pip                21.0.1
pycld2             0.41
python-dateutil    2.8.2
pytz               2021.1
rauth              0.7.3
requests           2.26.0
setuptools         58.0.4
six                1.16.0
torch              1.9.0
torchvision        0.10.0
tqdm               4.62.2
typing-extensions  3.10.0.2
urllib3            1.26.6
wheel              0.37.0
I also tried PIL 8.2.0 and things are still working fine.
One thing you could try is adding more log prints to m3inference/m3inference/preprocess.py to see exactly which line triggers the segfault. This will give us more insight into the root cause.
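As a complement to log prints, Python's standard-library faulthandler module can report which Python line was executing when the process crashed. This is a sketch; putting it at the top of the calling script is an assumption about how the code is run. The same effect is available without editing anything, e.g. python -X faulthandler scripts/m3twitter.py --skip-cache --screen-name barackobama --auth scripts/auth.txt, or by setting the environment variable PYTHONFAULTHANDLER=1.

```python
import faulthandler
import sys

# Print the Python traceback of every thread if the interpreter is killed
# by SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL. sys.__stderr__ is the
# process's original stderr stream, which faulthandler writes to directly.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# From here on, run the code that crashes (for example the preprocessing
# step in m3inference/m3inference/preprocess.py); if it segfaults, the
# dumped traceback shows which Python line was executing at the time.
```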
@StephenTemp -- checking in to see whether this issue has been resolved :)
operating system: macOS Big Sur (11.4 w/ M1)
Looks like this could be the same issue as #26
It might be specifically related to the arm64 (M1) chip, which would explain why we couldn't reproduce it.
Apologies for the wait; it's been a cluttered semester! Yes; unfortunately, I couldn't work around the _transformjson() function, but I was able to run the inference itself. It seems that the failure occurs in scripts/m3twitter.py on this line:
pprint.pprint(m3Twitter.infer_screen_name(args.screen_name, skip_cache=args.skip_cache))
I believe I've installed m3-inference correctly, but running _transformjsonl() on a JSON Lines file of tweets seems to fetch the first profile picture in the list and then terminate with a segmentation fault.
I believe the file is structured appropriately, in the following format:
{json object}
{json object}
...
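A quick way to confirm that structure is to parse the file line by line and check that every line holds exactly one JSON object. This is a minimal standard-library sketch; it checks only the JSON Lines framing, not whichever tweet fields transform_jsonl expects, and the sample data is made up. Blank lines are flagged because a reader that calls json.loads on every line would fail on them.

```python
import json
from typing import Iterable, List, Tuple


def check_jsonl(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, problem) pairs; an empty list means the
    JSON Lines framing looks fine."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            problems.append((lineno, "blank line"))
            continue
        try:
            value = json.loads(stripped)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(value, dict):
            problems.append((lineno, f"expected an object, got {type(value).__name__}"))
    return problems


if __name__ == "__main__":
    sample = ['{"id": 1}\n', '{"id": 2}\n', '[1, 2]\n', '\n', '{"id": 3,\n']
    for lineno, problem in check_jsonl(sample):
        print(lineno, problem)
```

To check a real file, pass the open file object directly, e.g. with open("tweets.jsonl", encoding="utf-8") as fh: print(check_jsonl(fh)), where tweets.jsonl is a placeholder name.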
Any idea what I might be running into?