Description:
When building the dataset from PDBbind structures, the build fails with the following error whenever the dataset includes any of the 3 complexes listed below:
Fatal: Cannot read molecule
Upon checking, the ligand and protein are parsed successfully for each of these complexes. In asapdiscovery-ml/schema_v2/config.py, Complex.from_pdb was able to load the protein and ligand as input data, yet the error still occurred at some later step.
This issue affects only 3 of the 4606 PDBbind complex.pdb structures: 5vh0, 6eiz, and 6a87. Excluding them from the schema and structure list solves the problem, but it would be worth looking into why they cause the error.
PS: It was a big hassle to locate exactly these 3 problematic structures out of ~5000. The ligand and protein appear to be read successfully, and the error only happens at a very late stage. I originally thought it was a systematic error in the "build-dataset" script, but when I tried it with 200 structures it completed without error. In the end, I manually increased the subset size (300, 400, 500, …, up to 5000 structures) to figure out which 3 structures were actually causing the issue.
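For anyone who hits the same problem, the manual search described above could probably be automated with recursive bisection. The following is only a minimal sketch, not part of asapdiscovery: `find_failing_items` and `fake_build` are hypothetical names, and `fake_build` merely stands in for whatever call actually builds the dataset from a list of structure IDs. It assumes that the failure surfaces as a Python exception and that each bad structure fails on its own, independently of the others. If the toolkit terminates the process instead of raising, the build would have to be run in a subprocess and its exit code checked.

```python
from typing import Callable, List, Sequence


def find_failing_items(
    items: Sequence[str],
    build: Callable[[Sequence[str]], None],
) -> List[str]:
    """Return the items that make `build` raise, found by recursive bisection."""
    if not items:
        return []
    try:
        build(items)
        return []  # the whole batch builds cleanly, nothing to report
    except Exception:
        if len(items) == 1:
            return list(items)  # a single failing item is a culprit
    # The batch fails and has more than one item: test each half separately.
    mid = len(items) // 2
    return find_failing_items(items[:mid], build) + find_failing_items(items[mid:], build)


def fake_build(batch: Sequence[str]) -> None:
    """Hypothetical stand-in for the real dataset build step."""
    bad = {"5vh0", "6eiz", "6a87"}
    if bad.intersection(batch):
        raise RuntimeError("Fatal: Cannot read molecule")


# Placeholder IDs standing in for the other 4603 structures.
all_ids = [f"x{i:04d}" for i in range(4603)]
all_ids.insert(100, "5vh0")
all_ids.insert(2500, "6eiz")
all_ids.insert(4000, "6a87")

culprits = find_failing_items(all_ids, fake_build)
print(culprits)
```

With only a few bad structures among thousands, this needs far fewer builds than growing the subset in steps of 100, because every half that builds cleanly is discarded in one go.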
Command that I ran:
Attachments: unknown_error.json, error_complexes.zip (unzip to get the three complex.pdb files).