SpeechColab / GigaSpeech

Large, modern dataset for speech recognition
Apache License 2.0

Incorrect character in GigaSpeech.json #98

Closed by dscripka 2 years ago

dscripka commented 2 years ago

When using the download_gigaspeech.sh script, I kept running into ijson.common.IncompleteJSONError exceptions.

After debugging, I found that in the current version of the GigaSpeech.json metadata file (at least as of December 2021), there appears to be an incorrect "(" character on line 64006948 which breaks the JSON processing. Here are the surrounding lines:

                     "{XL}"\n                 (  ]\n                },\n                {\n   

Removing this character fixes the exceptions I was getting when using download_gigaspeech.sh.
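For anyone who wants to inspect the same region of their own copy, here is a minimal sketch that prints the lines around a given line number without loading the whole file into memory. The helper name `lines_around` and its `radius` parameter are hypothetical and are not part of the GigaSpeech tooling. The line number is the one reported above.

```python
from itertools import islice


def lines_around(path, line_no, radius=2):
    """Return (line_number, text) pairs for the lines around line_no (1-indexed).

    The file is streamed, so this also works on very large metadata files.
    """
    start = max(1, line_no - radius)
    stop = line_no + radius  # last line to include (1-indexed)
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # islice takes 0-based offsets and an exclusive stop.
        window = islice(f, start - 1, stop)
        return [(start + i, text.rstrip("\n")) for i, text in enumerate(window)]


# Example usage (assumes GigaSpeech.json is in the current directory):
# for number, text in lines_around("GigaSpeech.json", 64006948):
#     print(number, repr(text))
```

Printing with `repr` makes stray characters and unusual whitespace easier to spot.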

chenguoguo commented 2 years ago

Thanks! I assigned @chaisz19 to check the version we have. We will fix it if our version is also wrong.

chaisz19 commented 2 years ago

We downloaded and checked JSON from three hosts and they were all correct.

dscripka commented 2 years ago

> We downloaded and checked JSON from three hosts and they were all correct.

Ah, maybe a local issue for me then. Thanks for confirming.
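One way to check whether a local copy differs from a known-good one is to compare file digests. The sketch below is only an illustration: the thread does not mention a published checksum for GigaSpeech.json, so the reference value would have to come from a second download or from someone with a working copy. The helper name `file_md5` is hypothetical.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read until f.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Example usage (both paths are placeholders):
# print(file_md5("GigaSpeech.json") == file_md5("GigaSpeech_redownloaded.json"))
```

If the two digests differ, the local file was most likely corrupted during download or modified afterwards.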