Open tydingcw opened 5 months ago
Hey! I ran into the same issue with short peptide sequences for which no template hits were found. I added a check in data/parsers.py that tests whether the lists storing the different template attributes are empty. If they are, I create empty numpy arrays with the correct dimensions instead, because the error comes from np.vstack, which cannot concatenate an empty list. With this fix, I was able to run complex predictions using RF-AA even for very short peptides (even just 3 amino acids).
So, in data/parsers.py, I modified the function parse_templates_raw (lines 684 to 690) to include the following:
if len(ids) > 0:
    xyz = np.vstack(xyz).astype(np.float32)
    mask = np.vstack(mask).astype(bool)
    qmap = np.vstack(qmap).astype(np.int64)
    f0d = np.vstack(f0d).astype(np.float32)
    f1d = np.vstack(f1d).astype(np.float32)
    seq = np.hstack(seq).astype(np.int64)
else:
    xyz = np.empty((0,3)).astype(np.float32)
    mask = np.empty((0)).astype(bool)
    qmap = np.empty((0)).astype(np.int64)
    f0d = np.empty((0)).astype(np.float32)
    f1d = np.empty((0)).astype(np.float32)
    seq = np.empty((0)).astype(np.int64)
Hope this helps!
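For anyone who wants to avoid repeating the if/else for every attribute, the same guard can be sketched as a small helper. This is only an illustration: stack_or_empty is a hypothetical name that does not exist in RF-AA, and the empty shapes are simply the ones used in the snippet above, not shapes verified against the rest of the pipeline.

```python
import numpy as np


def stack_or_empty(arrays, empty_shape, dtype, stack=np.vstack):
    """Stack a list of arrays, or return an empty array if the list is empty.

    Hypothetical helper (not part of RF-AA). `empty_shape` is an assumption
    supplied by the caller for the no-template case.
    """
    if len(arrays) > 0:
        return stack(arrays).astype(dtype)
    # No template hits: np.vstack/np.hstack would raise on an empty list,
    # so build an empty array with the requested shape and dtype instead.
    return np.empty(empty_shape, dtype=dtype)


# No-hit case, using the shapes from the snippet above.
xyz = stack_or_empty([], (0, 3), np.float32)
seq = stack_or_empty([], (0,), np.int64, stack=np.hstack)
print(xyz.shape, xyz.dtype, seq.shape, seq.dtype)

# Normal case: two per-template arrays are stacked along the first axis.
hits = [np.ones((2, 3)), np.ones((4, 3))]
print(stack_or_empty(hits, (0, 3), np.float32).shape)
```

Inside parse_templates_raw each attribute would then be one call, but whether downstream code accepts these empty shapes is something I have not checked.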
@teemuronkko After modifying data/parsers.py, I found that whether I predicted the tripeptide on its own or as part of a protein-tripeptide complex, the residues of the tripeptide were not linked together; they came out as three independent amino acids. Is there any way to improve this function?
Prediction with short FASTA sequences fails with an error. I think this may be because no matching template is found.
Running PSIPRED
Running hhsearch
Error executing job with overrides: []
Traceback (most recent call last):
  File "/home/tydingcw/git_repos/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 206, in main
    runner.infer()
  File "/home/tydingcw/git_repos/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 153, in infer
    self.parse_inference_config()
  File "/home/tydingcw/git_repos/RoseTTAFold-All-Atom/rf2aa/run_inference.py", line 46, in parse_inference_config
    protein_input = generate_msa_and_load_protein(
  File "/home/tydingcw/git_repos/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 93, in generate_msa_and_load_protein
    return load_protein(str(msa_file), str(hhr_file), str(atab_file), model_runner)
  File "/home/tydingcw/git_repos/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 66, in load_protein
    xyz_t, t1d, maskt, = get_templates(
  File "/home/tydingcw/git_repos/RoseTTAFold-All-Atom/rf2aa/data/protein.py", line 30, in get_templates
    ) = parse_templates_raw(ffdb, hhr_fn=hhr_fn, atab_fn=atab_fn)
  File "/home/tydingcw/git_repos/RoseTTAFold-All-Atom/rf2aa/data/parsers.py", line 684, in parse_templates_raw
    xyz = np.vstack(xyz).astype(np.float32)
  File "/home/tydingcw/mambaforge/envs/RFAA/lib/python3.10/site-packages/numpy/core/shape_base.py", line 289, in vstack
    return _nx.concatenate(arrs, 0, dtype=dtype, casting=casting)
ValueError: need at least one array to concatenate
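The last frame of the traceback can be reproduced outside RF-AA with numpy alone, which supports the guess that the per-template lists are empty when hhsearch returns no hits. A minimal sketch, where stack_or_error is a hypothetical name used only for this demonstration:

```python
import numpy as np


def stack_or_error(arrays):
    """Return (stacked, None) on success, or (None, message) if np.vstack rejects the input."""
    try:
        return np.vstack(arrays), None
    except ValueError as exc:
        return None, str(exc)


# Stand-in for the xyz list when no template hit was appended to it.
no_hits = []
stacked, err = stack_or_error(no_hits)
print("empty list:", err)

# With at least one array in the list, the same call succeeds.
one_hit = [np.zeros((5, 3))]
stacked, err = stack_or_error(one_hit)
print("one hit:", stacked.shape, err)
```

So the failure does not depend on the peptide length itself, only on the template lists being empty when np.vstack is called.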