Open loretoparisi opened 6 years ago
Hello, I get this error when running preprocess_releases_json_to_hdf_pandas.py:
Loading json dump into a pandas DataFrame
Processed 500000 releases
Processed 1000000 releases
Processed 1500000 releases
Processed 2000000 releases
Processed 2500000 releases
Processed 3000000 releases
Processed 3500000 releases
Processed 4000000 releases
Processed 4500000 releases
Processed 5000000 releases
Processed 5500000 releases
Processed 6000000 releases
Processed 6500000 releases
Processed 7000000 releases
Processed 7500000 releases
Processed 8000000 releases
Processed 8500000 releases
Processed 9000000 releases
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-4df24afa67c8> in <module>()
----> 1 from preprocess_releases_json_to_hdf_pandas.py import *

/Users/loretoparisi/Documents/Projects/AI/ismir2017-discogs/code/preprocess_releases_json_to_hdf_pandas.py in <module>()
    134 else:
    135     print("Loading json dump into a pandas DataFrame")
--> 136     data = load_releases(ignore_genres=IGNORE_GENRES, part=100)
    137     print("Saving DataFrame to %s" % dump_pandas)
    138     data.to_hdf(dump_pandas, 'w')

/Users/loretoparisi/Documents/Projects/AI/ismir2017-discogs/code/preprocess_releases_json_to_hdf_pandas.py in load_releases(size, part, ignore_genres)
     69         if not i % (100/part):
     70
---> 71             release = json.loads(jsonline)
     72
     73             # remove some columns that we won't use to save memory

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.pyc in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    336             parse_int is None and parse_float is None and
    337             parse_constant is None and object_pairs_hook is None and not kw):
--> 338         return _default_decoder.decode(s)
    339     if cls is None:
    340         cls = JSONDecoder

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.pyc in decode(self, s, _w)
    364
    365         """
--> 366         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    367         end = _w(s, end).end()
    368         if end != len(s):

/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.pyc in raw_decode(self, s, idx)
    380         """
    381         try:
--> 382             obj, end = self.scan_once(s, idx)
    383         except StopIteration:
    384             raise ValueError("No JSON object could be decoded")

ValueError: Unterminated string starting at: line 1 column 1175 (char 1174)
I have updated the data to the 2018 releases here: https://github.com/loretoparisi/ismir2017-discogs/blob/master/code/config.py

Everything else worked properly, so in my data/ folder I have:

ip-192-168-22-127:discogs loretoparisi$ tree -L 1 -h
.
├── [239M] discogs_20180101_artists.xml.gz
├── [ 39M] discogs_20180101_labels.xml.gz
├── [152M] discogs_20180101_masters.xml.gz
├── [9.0G] discogs_20180101_releases.json.dump
└── [5.1G] discogs_20180101_releases.xml.gz

0 directories, 5 files
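
The traceback shows `json.loads(jsonline)` failing with "Unterminated string", which may mean that one record in the dump was cut off or split across lines. One way to narrow this down could be to scan the dump and report which lines fail to parse. The sketch below is not part of the repository; it assumes the dump holds one JSON object per line (as the `json.loads(jsonline)` call suggests), and `find_bad_json_lines` and `max_report` are hypothetical names chosen for illustration.

```python
import json


def find_bad_json_lines(path, max_report=10):
    """Return a list of (line_number, error_message) for unparsable lines.

    Hypothetical helper: assumes one JSON object per line in `path`.
    Stops after `max_report` failures so a badly damaged file does not
    produce an enormous report.
    """
    bad = []
    # Read as bytes so a damaged byte sequence cannot abort the iteration.
    with open(path, "rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.strip()
            if not line:
                continue  # ignore blank lines
            try:
                json.loads(line.decode("utf-8"))
            except ValueError as exc:
                # JSON decoding errors and UnicodeDecodeError are both
                # subclasses of ValueError, so either one is recorded here.
                bad.append((lineno, str(exc)))
                if len(bad) >= max_report:
                    break
    return bad


# Example call (the path is the dump file from the listing above):
# for lineno, msg in find_bad_json_lines("data/discogs_20180101_releases.json.dump"):
#     print("line %d: %s" % (lineno, msg))
```

If only one or two lines are reported, inspecting them directly should show whether a release record was truncated when the dump was written.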
Hi @loretoparisi, I'll have a look and try this new dump next week.