Open liusfore opened 1 year ago
Hi~ Sorry for the mistake in data_preprocess.sh. I accidentally commented out the processing logic for SAbDab, which is why sabdab_all.json was never generated. I've now uncommented it. Could you please run the script again to see if the problem is solved? I think it should be fine now.
It seems the bug still occurs: sabdab_all.json is still not generated.
(dyMEAN) dell@dell-Precision-7920-Tower:/mnt/e/code/dyMEAN$ bash scripts/data_preprocess.sh all_structures/imgt all_data
Locate the project folder at /mnt/e/code/dyMEAN
Processing SAbDab with output directory /mnt/e/code/dyMEAN/all_data
2023-06-16 11:17:59::INFO::Namespace(fout='/mnt/e/code/dyMEAN/all_data/sabdab_all.json', n_cpu=4, numbering='imgt', pdb_dir='/mnt/e/code/dyMEAN/all_structures/imgt', pre_numbered=True, summary='summaries/sabdab_summary.tsv', type='sabdab')
2023-06-16 11:17:59::INFO::download sabdab from summary file summaries/sabdab_summary.tsv
2023-06-16 11:17:59::INFO::Extracting summary to json format
2023-06-16 11:18:00::INFO::Start downloading pdbs in the summary
2023-06-16 11:18:00::INFO::using local PDB files: /mnt/e/code/dyMEAN/all_structures/imgt
2023-06-16 11:18:00::INFO::Assume PDB file already renumbered with scheme imgt
2023-06-16 11:18:00::INFO::downloading raw files
6%|████████▎ | 390/6741 [00:00<00:10, 613.14it/s]6B3M not found in /mnt/e/code/dyMEAN/all_structures/imgt, try fetching from remote server
6TNP not found in /mnt/e/code/dyMEAN/all_structures/imgt, try fetching from remote server
6QXE not found in /mnt/e/code/dyMEAN/all_structures/imgt, try fetching from remote server
5FUU not found in /mnt/e/code/dyMEAN/all_structures/imgt, try fetching from remote server
2023-06-16 11:18:04::WARN::Trying for the 2 times
2023-06-16 11:18:05::WARN::Trying for the 3 times
fetched
6DZT not found in /mnt/e/code/dyMEAN/all_structures/imgt, try fetching from remote server
fetched
2023-06-16 11:18:06::WARN::Trying for the 4 times
fetched
7%|█████████▊ | 457/6741 [00:06<02:46, 37.78it/s]2023-06-16 11:18:07::WARN::Trying for the 5 times
2023-06-16 11:18:08::WARN::Get https://files.rcsb.org/download/5FUU.pdb failed
15%|█████████████████████▍ | 1013/6741 [00:08<00:45, 125.59it/s]
fetched
Traceback (most recent call last):
File "/mnt/data/anaconda/envs/dyMEAN/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/mnt/data/anaconda/envs/dyMEAN/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/mnt/e/code/dyMEAN/data/download.py", line 376, in
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/data/anaconda/envs/dyMEAN/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/mnt/data/anaconda/envs/dyMEAN/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/mnt/e/code/dyMEAN/data/dataset.py", line 323, in
It looks like this happens because the PDB file for 5FUU is no longer available in the PDB database, so fetching it from the remote server fails. I've added a branch to detect this error. I've tested it, and it should be fine now.
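For anyone hitting the same problem in their own scripts, here is a minimal sketch of the general idea (skip an entry when the remote fetch fails instead of letting the exception abort the whole run). It is not the actual change made in data/download.py. The names `fetch_pdb` and `collect_structures` are hypothetical; only the URL pattern is taken from the log above.

```python
import urllib.request
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def fetch_pdb(pdb_id: str, retries: int = 5, timeout: float = 10.0) -> Optional[str]:
    """Hypothetical helper: return the PDB text, or None if every attempt fails."""
    url = f"https://files.rcsb.org/download/{pdb_id}.pdb"
    for _ in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read().decode("utf-8")
        except OSError:
            # URLError, HTTPError and socket timeouts are all OSError subclasses.
            continue
    return None


def collect_structures(
    pdb_ids: Iterable[str],
    fetch: Callable[[str], Optional[str]] = fetch_pdb,
) -> Tuple[Dict[str, str], List[str]]:
    """Fetch each entry; record unavailable ones instead of raising."""
    found: Dict[str, str] = {}
    missing: List[str] = []
    for pdb_id in pdb_ids:
        text = fetch(pdb_id)
        if text is None:
            missing.append(pdb_id)  # e.g. an entry no longer served as .pdb
            continue
        found[pdb_id] = text
    return found, missing
```

Passing the fetch function as a parameter keeps the skipping logic testable without network access, and the `missing` list can be logged at the end so dropped entries are visible.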
(dyMEAN) dell@dell-Precision-7920-Tower:/mnt/e/code/dyMEAN$ bash scripts/data_preprocess.sh all_structures/imgt all_data
Locate the project folder at /mnt/e/code/dyMEAN
Processing SAbDab with output directory /mnt/e/code/dyMEAN/all_data
Processing RAbD with output directory /mnt/e/code/dyMEAN/all_data/RAbD
2023-06-15 15:59:18::INFO::Namespace(fout='/mnt/e/code/dyMEAN/all_data/rabd_all.json', n_cpu=4, numbering='imgt', pdb_dir='/mnt/e/code/dyMEAN/all_structures/imgt', pre_numbered=True, summary='/mnt/e/code/dyMEAN/all_data/sabdab_all.json', type='rabd')
2023-06-15 15:59:18::INFO::download rabd from summary file /mnt/e/code/dyMEAN/all_data/sabdab_all.json
2023-06-15 15:59:18::INFO::Extracting summary to json format
Traceback (most recent call last):
File "/mnt/data/anaconda/envs/dyMEAN/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/mnt/data/anaconda/envs/dyMEAN/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/mnt/e/code/dyMEAN/data/download.py", line 376, in
main(parse())
File "/mnt/e/code/dyMEAN/data/download.py", line 360, in main
items = read_rabd(fpath)
File "/mnt/e/code/dyMEAN/data/download.py", line 94, in read_rabd
with open(fpath, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/e/code/dyMEAN/all_data/sabdab_all.json'
Traceback (most recent call last):
File "/mnt/data/anaconda/envs/dyMEAN/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/mnt/data/anaconda/envs/dyMEAN/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/mnt/e/code/dyMEAN/data/split.py", line 249, in
main(parse())
File "/mnt/e/code/dyMEAN/data/split.py", line 72, in main
items = load_file(args.data)
File "/mnt/e/code/dyMEAN/data/split.py", line 37, in load_file
with open(fpath, 'r') as fin:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/e/code/dyMEAN/all_data/sabdab_all.json'