ICRAR / ijson

Iterative JSON parser with Pythonic interfaces
http://pypi.python.org/pypi/ijson/

Segmentation fault with yajl2 backend #75

Closed avercau closed 2 years ago

avercau commented 2 years ago

**Describe the bug**

After adopting ijson with the default backend (yajl2) in our training pipeline to handle huge JSON files, we started getting random segmentation faults. We never got to the bottom of them: they were very hard to debug because we could not trigger them on demand. Based on this issue and other reports of segmentation faults with yajl2, I worked on the hypothesis that the backend was responsible and switched to the python backend by default. We have not had the problem since. I'm not entirely sure the two are related, but the correlation is very strong.

**How to reproduce**

Not possible to reproduce, it's random.
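
For intermittent crashes like this, one way to collect evidence without a reproducer is Python's standard `faulthandler` module, which writes the Python traceback of each thread when the process receives a fatal signal such as SIGSEGV. The following is a minimal sketch, not something taken from this report: the log path and the `crash_log` name are hypothetical, and the call would go at the start of the pipeline's entry point.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
CRASH_LOG_PATH = os.path.join(tempfile.gettempdir(), "ijson_crash.log")

# The file object must stay open for as long as faulthandler is enabled,
# so keep a module-level reference to it.
crash_log = open(CRASH_LOG_PATH, "a")

# On SIGSEGV, SIGFPE, SIGABRT, SIGBUS or SIGILL, write the Python
# traceback of all threads to the log before the process dies.
faulthandler.enable(file=crash_log, all_threads=True)
```

The same handler can be switched on without code changes by running `python -X faulthandler` or setting `PYTHONFAULTHANDLER=1`, in which case the traceback goes to stderr. The traceback only shows the Python frames that were active at the time of the crash, which may still be enough to tell whether the fault happened inside an ijson call.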

**Expected behavior**

No segmentation faults.

**Execution information:**

**Additional context**

This is no longer an issue for me, since the segmentation faults stopped after I switched backends, but it might be worth looking into.
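
The workaround described above, pinning the pure-Python backend, can be sketched roughly as follows. This assumes ijson is installed and imports the `ijson.backends.python` module directly instead of letting `import ijson` choose a backend. The sample data and the `ids` name are made up for illustration, and the `try`/`except` guard is only there so the snippet degrades cleanly where ijson is missing.

```python
import io

try:
    # Explicitly import the pure-Python backend rather than letting
    # `import ijson` pick a backend automatically.
    import ijson.backends.python as ijson_py
except ImportError:
    ijson_py = None  # ijson is not installed in this environment

# Made-up sample document standing in for a huge JSON file.
sample = io.BytesIO(b'[{"id": 1}, {"id": 2}]')

ids = []
if ijson_py is not None:
    # "item" is the ijson prefix for each element of a top-level array.
    ids = [obj["id"] for obj in ijson_py.items(sample, "item")]
```

The pure-Python backend avoids the compiled parsing code entirely, at the cost of being considerably slower than the C-based backends.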

rtobar commented 2 years ago

@avercau thanks for reporting this.

I'm a bit confused: which backend was giving you errors: yajl2 or yajl2_c? They are two different backends, so we should clarify which one was causing issues. And obviously there's also very little I can do without further information.

However, assuming the issue is with the yajl2_c backend, I can venture that you might have been hitting the issue described in https://github.com/ICRAR/ijson/issues/66. That also occurred seemingly at random, making it hard to reproduce (you can read the full issue to understand the context better). The good news is that I did push a fix for it (see that bug report; it's also mentioned in the CHANGELOG), although the original reporter never responded on whether the fix worked for them. Could you give it a try? The fix is only on the master branch of this repo, as I haven't yet released a new version of ijson to PyPI that contains it. That would be valuable information.

rtobar commented 2 years ago

@avercau can you please provide some feedback on this? If this is a duplicate I'd like to flag it as such and close the issue.

rtobar commented 2 years ago

Closing due to inactivity, hopefully it's a duplicate of #66.