Hi,
When I run make WSJ_DIR=/hy-tmp/wsj SMS_WSJ_DIR=/hy-tmp/sms_wsj, I get an "Argument list too long" error. I looked into the source code and did some debugging. It seems that dirty.txt is too large to pass to the sh.perl call in normalize_transcription in create_json.py: a single command-line argument is limited to 128K, but dirty.txt is over 10M. How can I fix this?
By the way, the Linux kernel version is 5.4.0-146-generic.
Here is the full terminal output:
////////////////// Terminal output ////////////////////////////////
creating /hy-tmp/sms_wsj/wsj_8k_zeromean.json
python -m sms_wsj.database.wsj.create_json \
with json_path=/hy-tmp/sms_wsj/wsj_8k_zeromean.json database_dir=/hy-tmp/sms_wsj/wsj_8k_zeromean as_wav=True
WARNING - Create wsj json - No observers have been added to this run
INFO - Create wsj json - Running command 'create_database'
INFO - Create wsj json - Started
INFO - sh.command - <Command '/usr/bin/cat /tmp/tmprgsph81i/dirty.txt', pid 1008>: process started
ERROR - Create wsj json - Failed after 0:00:11!
Traceback (most recent calls WITHOUT Sacred internals):
File "/root/sms_wsj/sms_wsj/database/wsj/create_json.py", line 293, in create_database
transcriptions = get_transcriptions(database_dir, database_dir)
File "/root/sms_wsj/sms_wsj/database/wsj/create_json.py", line 170, in get_transcriptions
data_dict["clean word"] = normalize_transcription(word, wsj_root)
File "/root/sms_wsj/sms_wsj/database/wsj/create_json.py", line 192, in normalize_transcription
result = sh.perl(
File "/root/.local/lib/python3.8/site-packages/sh.py", line 1508, in __call__
rc = self.__class__.RunningCommandCls(cmd, call_args, stdin, stdout, stderr)
File "/root/.local/lib/python3.8/site-packages/sh.py", line 720, in __init__
self.process = OProc(
File "/root/.local/lib/python3.8/site-packages/sh.py", line 2157, in __init__
raise ForkException(fork_exc)
sh.ForkException:
Original exception:
Traceback (most recent call last):
File "/root/.local/lib/python3.8/site-packages/sh.py", line 2110, in __init__
os.execv(bytes_cmd[0], bytes_cmd)
OSError: [Errno 7] Argument list too long
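For context, the error is the kernel rejecting the os.execv call above: Linux caps a single argv string at MAX_ARG_STRLEN (32 pages, i.e. 128 KiB with 4 KiB pages) and the total argument plus environment size at roughly a quarter of the stack limit, while data written to a child's stdin through a pipe is not subject to either limit. The sketch below is a minimal illustration using only the Python standard library, not the sms_wsj code: a hypothetical one-line Python "upper-caser" stands in for the Perl normalization script, and the names normalize_via_stdin and normalize_via_argv are made up for this example.

```python
import errno
import subprocess
import sys

# Hypothetical stand-in for the Perl normalization script: reads all of stdin
# and writes it back upper-cased.
STDIN_CHILD = "import sys; sys.stdout.write(sys.stdin.read().upper())"
# Same transformation, but the payload arrives as argv[1] (the failing pattern).
ARGV_CHILD = "import sys; sys.stdout.write(sys.argv[1].upper())"


def normalize_via_stdin(text: str) -> str:
    """Send `text` to the child through a stdin pipe.

    Pipe data is not counted against ARG_MAX / MAX_ARG_STRLEN, and
    subprocess.run() writes stdin and drains stdout concurrently, so a
    multi-megabyte payload does not deadlock.
    """
    result = subprocess.run(
        [sys.executable, "-c", STDIN_CHILD],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def normalize_via_argv(text: str) -> str:
    """Pass the whole payload as one command-line argument.

    On Linux this raises OSError with errno E2BIG once `text` exceeds the
    per-argument limit, just like the traceback above.
    """
    result = subprocess.run(
        [sys.executable, "-c", ARGV_CHILD, text],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    big = "a b c\n" * 2_000_000  # ~12 MB, roughly the size of dirty.txt
    print("stdin output length:", len(normalize_via_stdin(big)))
    try:
        normalize_via_argv(big)
    except OSError as exc:
        # On Linux this reports E2BIG: "[Errno 7] Argument list too long"
        print("argv approach failed:", errno.errorcode.get(exc.errno, exc.errno), exc)
```

The sh library used by create_json.py accepts stdin through its _in special keyword argument, so an analogous change would be to hand the contents of dirty.txt to the perl command via _in instead of as a positional argument. I have not checked exactly how normalize_transcription builds its arguments, so treat that as an untested suggestion rather than a confirmed fix.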