awfderry / COLLAPSE

Representation learning for protein functional site analysis
MIT License

Accepted formats include pdb, pdb.gz, and cif? #10

Closed BinhongLiu closed 1 year ago

BinhongLiu commented 1 year ago

Hi. I tested the "Embed entire dataset of PDB files" script (https://github.com/awfderry/COLLAPSE#embed-entire-dataset-of-pdb-files) with some .cif structures, but the .cif format does not seem to be accepted.

The log is here:

(COLLAPSE) [ac1daawz21@login03 COLLAPSE-main]$ python embed_pdb_dataset.py /work/home/ac1daawz21/test/S1/1/ /work/home/ac1daawz21/test/S1/ --filetype cif
Traceback (most recent call last):
  File "embed_pdb_dataset.py", line 23, in <module>
    dataset = load_dataset(args.data_dir, args.filetype, transform=transform)
  File "/work/home/ac1daawz21/miniconda3/envs/COLLAPSE/lib/python3.8/site-packages/atom3d/datasets/datasets.py", line 441, in load_dataset
    raise RuntimeError(f'Unrecognized filetype {filetype}.')
RuntimeError: Unrecognized filetype cif.

awfderry commented 1 year ago

Sorry for the inconvenience. This appears to be caused by the underlying ATOM3D load_dataset function, which does not recognize the cif filetype in the version you have installed. There are two options to fix this for now: (1) pull the latest version of ATOM3D from here, or (2) convert your dataset to PDB format.
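For option (2), a minimal sketch of a batch conversion is shown below. It assumes Biopython is installed (it is not mentioned in this thread and is not necessarily a COLLAPSE dependency), and the function names pdb_path_for and convert_cif_dir are hypothetical helpers made up for illustration. Note that the PDB format cannot hold everything mmCIF can (for example, multi-character chain IDs), so some structures may fail to convert.

```python
from pathlib import Path


def pdb_path_for(cif_path, out_dir):
    """Return the output .pdb path for a given .cif file (pure path logic)."""
    return Path(out_dir) / (Path(cif_path).stem + ".pdb")


def convert_cif_dir(in_dir, out_dir):
    """Convert every .cif file in in_dir to a PDB file in out_dir.

    Assumption: Biopython is available. It is imported lazily so that
    the path helper above can be used without it.
    """
    from Bio.PDB import MMCIFParser, PDBIO

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    parser = MMCIFParser(QUIET=True)  # suppress parser warnings
    writer = PDBIO()
    written = []

    for cif_path in sorted(Path(in_dir).glob("*.cif")):
        # The structure ID is arbitrary; the file stem is used here.
        structure = parser.get_structure(cif_path.stem, str(cif_path))
        target = pdb_path_for(cif_path, out_dir)
        writer.set_structure(structure)
        writer.save(str(target))
        written.append(target)

    return written
```

After conversion, the embedding script could presumably be pointed at the output directory with --filetype pdb instead of --filetype cif.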

BinhongLiu commented 1 year ago

Yes! The latest version of ATOM3D supports the .cif format. Thanks!