No transcriptions to clean up

zx1292982431 commented 11 months ago

Hello! When I execute $ make WSJ_DIR=/path/to/wsj SMS_WSJ_DIR=/path/to/write/db/to, I get an "AssertionError: No transcriptions to clean up." error from sms_wsj/database/wsj/create_json.py. How can I fix it?

boeddeker commented 11 months ago

Hello, could you provide the command that you executed and the full terminal output?

Have you changed /path/to/wsj to the path of WSJ on your system?

zx1292982431 commented 10 months ago

Sorry for the late reply! After setting the Kaldi path, I ran make WSJ_DIR=/data/lzx/wsj0 SMS_WSJ_DIR=/data/lzx/Datasets/SMS_WSJ to generate the sms_wsj dataset, but I got an error:

creating /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean.json
python -m sms_wsj.database.wsj.create_json \
with json_path=/data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean.json database_dir=/data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean as_wav=True
WARNING - Create wsj json - No observers have been added to this run
INFO - Create wsj json - Running command 'create_database'
INFO - Create wsj json - Started
ERROR - Create wsj json - Failed after 0:00:00!
Traceback (most recent calls WITHOUT Sacred internals):
  File "/data/lzx/SpatialNet/sms_wsj/sms_wsj/database/wsj/create_json.py", line 293, in create_database
    transcriptions = get_transcriptions(database_dir, database_dir)
  File "/data/lzx/SpatialNet/sms_wsj/sms_wsj/database/wsj/create_json.py", line 170, in get_transcriptions
    data_dict["clean word"] = normalize_transcription(word, wsj_root)
  File "/data/lzx/SpatialNet/sms_wsj/sms_wsj/database/wsj/create_json.py", line 186, in normalize_transcription
    assert len(transcriptions) > 0, 'No transcriptions to clean up.'
AssertionError: No transcriptions to clean up.

make: *** [Makefile:32: /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean.json] Error 1

Have you encountered a similar problem, and do you know how to fix it?

boeddeker commented 10 months ago

This error means that the code was not able to find the transcriptions. My guess is that it could not find the *.dot and *.ptx files in /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean.

Could you execute the following commands and report the output:

find /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean -iname "*.dot" | wc -l
find /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean -iname "*.ptx" | wc -l
find /data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean -iname "*.wav" | wc -l

On my system, I get the following output:

/net/db/sms_wsj/wsj_8k_zeromean$ find . -iname "*.dot" | wc -l
3585
/net/db/sms_wsj/wsj_8k_zeromean$ find . -iname "*.ptx" | wc -l
3547
/net/db/sms_wsj/wsj_8k_zeromean$ find . -iname "*.wav" | wc -l
129106

Maybe something went wrong when the wsj_8k_zeromean folder was created.
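
If it is more convenient, the same check can be done from Python. The following is only a minimal sketch: count_by_suffix is a hypothetical helper written for this comment, not part of sms_wsj. It counts files whose extension matches case-insensitively, which should roughly correspond to the find -iname calls above.

```python
from collections import Counter
from pathlib import Path


def count_by_suffix(root, suffixes=(".dot", ".ptx", ".wav")):
    """Count files below `root` per suffix, ignoring case.

    Hypothetical helper (not part of sms_wsj); roughly mirrors
    `find ROOT -iname "*.dot" | wc -l` for each suffix.
    """
    wanted = {s.lower() for s in suffixes}
    # Start every suffix at zero so missing file types show up as 0.
    counts = Counter({s: 0 for s in wanted})
    for path in Path(root).rglob("*"):
        suffix = path.suffix.lower()
        if path.is_file() and suffix in wanted:
            counts[suffix] += 1
    return dict(counts)


# Example call (the path is taken from this thread, adjust it to your system):
# print(count_by_suffix("/data/lzx/Datasets/SMS_WSJ/wsj_8k_zeromean"))
```

A count of 0 for .dot and .ptx would support the guess that the transcription files never made it into the wsj_8k_zeromean folder.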

I guess the /data/lzx/wsj0 folder contains only the WSJ0 files. If that is correct, you have to delete all generated files and change the call so that it specifies both the WSJ0 and the WSJ1 folder, e.g. make WSJ0_DIR=/data/lzx/wsj0 WSJ1_DIR=/data/lzx/wsj1 SMS_WSJ_DIR=/data/lzx/Datasets/SMS_WSJ (I assumed that the WSJ1 files are in /data/lzx/wsj1).

fgnt / sms_wsj

No transcriptions to clean up #28