bbolli / tumblr-utils

Utilities for dealing with Tumblr blogs, Tumblr backup
GNU General Public License v3.0

JSON: ValueError: Unterminated string #208

Closed indrakaw closed 5 years ago

indrakaw commented 5 years ago
ValueError: Unterminated string starting at: line 1 column 188375 (char 188374)
200 OK application/json

followed by a JSON dump on the console.

These are the args I used:

tumblr_backup.py -j -I i --save-video-tumblr --save-audio -p 201309 -O nordeajo-201309 nordeajo

Quick question: Does this affect the backed-up posts, media, and JSON?

cebtenzzre commented 5 years ago

This will not affect any backed up data. If apiparse catches a ValueError, it returns None, which either prevents the blog from being backed up if it happens on the first post, or causes the problematic range of up to 50 posts to be skipped. If/when this issue is resolved, and it did not occur while parsing the first post, you can re-run tumblr_backup.py in non-incremental mode on at least the affected range of posts to make sure they are properly backed up.
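To illustrate the skipping behaviour described above, here is a minimal sketch. It is not the actual tumblr-utils code: `parse_response`, `backup_ranges`, and the `fetch` callback are hypothetical names, and it is written for Python 3 for brevity.

```python
import json

def parse_response(raw_text):
    """Return the decoded JSON document, or None if it cannot be parsed."""
    try:
        return json.loads(raw_text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

def backup_ranges(fetch, total_posts, batch_size=50):
    """Walk a blog in batches; return (saved_posts, skipped_offsets).

    `fetch(offset, count)` is a hypothetical callback returning the raw
    JSON text for one batch of posts.
    """
    saved, skipped = [], []
    for offset in range(0, total_posts, batch_size):
        doc = parse_response(fetch(offset, batch_size))
        if doc is None:
            if offset == 0:
                # The very first request failed: nothing is backed up.
                return [], [0]
            # A later request failed: only this range is skipped.
            skipped.append(offset)
            continue
        saved.extend(doc["posts"])
    return saved, skipped
```

In this sketch, the offsets collected in `skipped` are the ranges that would need to be re-run later in non-incremental mode.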

Could you provide the JSON that cannot be parsed, assuming this is repeatable? Ideally it would be saved in untouched binary form rather than copied from potentially re-encoded console output. If you would like a patch to do this automatically, I could provide one. In fact, that would be a generally useful feature that I could make a PR for.
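A rough idea of what such a patch could look like, as a sketch only: `parse_and_keep_raw` and `dump_path` are made-up names, not part of tumblr-utils, and the code assumes Python 3 and a UTF-8 response body.

```python
import json

def parse_and_keep_raw(raw_bytes, dump_path):
    """Decode a JSON response; on failure, save the exact bytes received.

    Writing in binary mode keeps the response untouched, so no console
    or editor re-encoding can alter it.
    """
    try:
        return json.loads(raw_bytes.decode("utf-8"))
    except ValueError:
        # Covers both json.JSONDecodeError and UnicodeDecodeError,
        # since each is a subclass of ValueError.
        with open(dump_path, "wb") as f:
            f.write(raw_bytes)
        return None
```

The resulting file could then be attached to the issue as-is.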

There are four potential explanations for this error:

1) The JSON parser used by tumblr-utils is not sufficiently compliant. There are a few benchmarks for this, but rapidjson is one of the best all around.
2) The JSON parser used by tumblr-utils is not sufficiently permissive, and Tumblr is sending malformed or non-standard JSON.
3) The end-quote of the offending string was not parsed in the intended way, which could be the result of an improperly escaped backslash or encoding issue.
4) The JSON has been truncated such that it ends in the middle of a string. Truncation could be the result of the GET terminating prematurely, invalid characters (such as a stray NUL or encoding issue), or a length limit somewhere.
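Given the saved raw bytes, a few quick checks could help narrow these down. The following is only a heuristic sketch for Python 3: `diagnose` is a hypothetical helper, `content_length` is assumed to be the value of the response's Content-Length header (which may be absent, or may refer to the compressed size if the body was gzip-encoded), and the "ends with `}`" test is a guess, not proof.

```python
import json

def diagnose(raw_bytes, content_length=None):
    """Return a list of human-readable findings about a failed response."""
    findings = []
    # Fewer bytes than the server announced points at a cut-off transfer.
    if content_length is not None and len(raw_bytes) < content_length:
        findings.append("short read: got %d of %d bytes"
                        % (len(raw_bytes), content_length))
    if b"\x00" in raw_bytes:
        findings.append("body contains a NUL byte")
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError as e:
        findings.append("invalid UTF-8 at byte %d" % e.start)
        return findings
    try:
        json.loads(text)
    except ValueError as e:
        if "Unterminated string" in str(e):
            if text.rstrip().endswith("}"):
                findings.append("unterminated string, but body ends with '}': "
                                "suspect a quoting or escaping problem")
            else:
                findings.append("unterminated string at end of body: "
                                "suspect truncation")
        else:
            findings.append("parse error: %s" % e)
    return findings
```

An empty list means the body decoded and parsed cleanly on this machine.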

indrakaw commented 5 years ago

This takes quite a while. The blog has 190k posts. The console output is large and fills the console, so I worry I won't be able to capture the beginning, only the last lines. Because it's big, I will attach it to this issue later as a file.

Any suggested actions? Should I tee the command?

indrakaw commented 5 years ago

Have you tried the command I posted above? It covers a one-month period and downloads less than a GiB. I recommend running it and testing it yourself.

indrakaw commented 5 years ago

It has been running for an hour and hasn't finished yet. Here is a snippet. It isn't complete because I had to copy and paste directly from the console into a text editor. unfinishedconsolelog1.txt

indrakaw commented 5 years ago

Oh no, it also happens when I run tumblr_backup.py -k cleavageilike. Help!

indrakaw commented 5 years ago

Here is the list of installed Python 2 modules and their versions:

$ pip freeze
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
absl-py==0.7.0
astor==0.7.1
attrs==19.1.0
autopep8==1.4.3
backports.shutil-get-terminal-size==1.0.0
backports.weakref==1.0.post1
cachetools==3.1.0
certifi==2018.11.29
chardet==3.0.4
configparser==3.7.3
crcmod==1.7
decorator==4.3.2
enum34==1.1.6
funcsigs==1.0.2
functools32==3.2.3.post2
future==0.17.1
futures==3.2.0
gast==0.2.2
google-api-core==1.8.0
google-api-python-client==1.7.8
google-auth==1.6.3
google-auth-httplib2==0.0.3
google-cloud-bigquery==1.9.0
google-cloud-core==0.29.1
google-cloud-datastore==1.7.3
google-cloud-language==1.1.1
google-cloud-logging==1.10.0
google-cloud-spanner==1.7.1
google-cloud-storage==1.14.0
google-cloud-translate==1.3.3
google-cloud-videointelligence==1.7.0
google-cloud-vision==0.36.0
google-resumable-media==0.3.2
googleapis-common-protos==1.5.8
grpc-google-iam-v1==0.11.4
grpcio==1.19.0
h5py==2.9.0
httplib2==0.12.1
idna==2.8
ipaddr==2.2.0
ipython==5.8.0
ipython-genutils==0.2.0
jedi==0.13.3
jsonschema==3.0.1
Keras-Applications==1.0.7
Keras-Preprocessing==1.0.9
Markdown==3.0.1
mccabe==0.6.1
meld3==1.0.2
mercurial==4.0
mock==2.0.0
numpy==1.16.2
oauth2==1.9.0.post1
oauth2client==4.1.3
parso==0.3.4
pathlib2==2.3.3
pbr==5.1.3
pexpect==4.6.0
pickleshare==0.7.5
pluggy==0.9.0
prompt-toolkit==1.0.15
protobuf==3.7.0
ptyprocess==0.6.0
pyasn1==0.4.5
pyasn1-modules==0.2.4
pycodestyle==2.5.0
pydocstyle==3.0.0
pyflakes==2.1.1
Pygments==2.3.1
pyrsistent==0.14.11
python-jsonrpc-server==0.1.2
python-language-server==0.24.0
pytz==2018.9
requests==2.21.0
rope==0.12.0
rsa==4.0
scandir==1.9.0
simplegeneric==0.8.1
six==1.12.0
snowballstemmer==1.2.1
supervisor==3.3.1
tensorboard==1.13.0
tensorflow==1.13.1
tensorflow-estimator==1.13.0
termcolor==1.1.0
traitlets==4.3.2
uritemplate==3.0.0
urllib3==1.24.1
virtualenv==16.4.3
wcwidth==0.1.7
Werkzeug==0.14.1
yapf==0.26.0
youtube-dl==2019.1.2

cebtenzzre commented 5 years ago

I haven't been able to reproduce this issue, so I'll assume the JSON response is getting silently truncated on your machine. In that case, tumblr-utils is working exactly as it should. Just ignore the errors until you can fix the underlying issue, and don't expect to have complete backups.