Closed by wt12318 1 year ago
Thanks for reporting this bug; this is definitely not intended behavior. I should have some time on the weekend to fix it.
Hi,
Thanks for the update. I used the up-to-date Docker image, but the code does not seem to have changed:
import homelette as hm
hm.__version__
##'1.4'
import inspect
with open('hm.txt', 'w', encoding='utf-8') as f:
    # the with-block closes the file, so no explicit f.close() is needed
    print(inspect.getsource(hm.alignment), file=f)
cat hm.txt | grep "limit"
## self.target = target[:14] # limit length because of hhblits
But the commit history says this was changed to self.target = target.
And the error still exists:
###fasta file:
>tttttttttttttttttttttttttttttttttttttttttt
EVQLVESGGGLVQPGGSLRLSCAASGYTFTDYYIHWVRQAPGKGLEWMAWISPHTGGTIYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARGPDDWNDGDAFDIWGQGTLVTVSSGGGSGGSGGGSDIQMTQSPSSLSASVGDRVTITCKSSQSVLYSSNNENFLAWYQQKPGKAPKLLIYWASTRESGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQYYSTPITFGQGTKVEIK
##error
Traceback (most recent call last):
  File "./get_patch_V2.py", line 729, in <module>
    t.execute_routine(
  File "/usr/local/src/homelette-1.4/homelette/organization.py", line 256, in execute_routine
    r.generate_models()
  File "/usr/local/src/homelette-1.4/homelette/routines.py", line 944, in generate_models
    self.alignment.rename_sequence(self.target, "Target")
  File "/usr/local/src/homelette-1.4/homelette/alignment.py", line 575, in rename_sequence
    self.sequences[new_name] = self.sequences.pop(old_name)
KeyError: 'tttttttttttttttttttttttttttttttttttttttttt'
It should have changed from
self.target = target
to
self.target = target[:14]
so the code in the docker container seems to be correct.
See here for the current version.
My documentation of the change has not been great, though; I will try to improve that. Basically, the issue is that hhblits, under the hood, limits the name of the query sequence to 14 characters because of its output format.
Before the fix, the alignment ended up empty when the hhblits output file was parsed: because hhblits had shortened the name, the regex built from the longer name matched nothing in the file.
Now, the target name is automatically (and silently) limited to 14 characters when an AlignmentGenerator object is created. This should guarantee that parsing the hhblits output file succeeds every time.
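The failure mode and the fix can be illustrated without homelette or hhblits. The sketch below is not homelette's actual parser: `fake_hhblits_output` and `find_query_line` are hypothetical stand-ins, and the one-line output format is invented for illustration. The only behavior taken from this thread is that the name in the output is cut to 14 characters.

```python
import re

NAME_LIMIT = 14  # from this thread: the output keeps only 14 characters of the name


def fake_hhblits_output(name: str, sequence: str) -> str:
    """Hypothetical stand-in for an output file: the name is cut to NAME_LIMIT."""
    return f"{name[:NAME_LIMIT]} {sequence}\n"


def find_query_line(output: str, name: str):
    """Hypothetical parser: look for a line that starts with the exact name."""
    pattern = re.compile(rf"^{re.escape(name)}\s+(\S+)$", flags=re.MULTILINE)
    match = pattern.search(output)
    return match.group(1) if match else None


long_name = "t" * 42
output = fake_hhblits_output(long_name, "EVQLVESGGG")

# Searching with the full name finds nothing, so the alignment would stay empty.
before_fix = find_query_line(output, long_name)
# Searching with the name cut to the same limit finds the sequence.
after_fix = find_query_line(output, long_name[:NAME_LIMIT])
print(before_fix, after_fix)
```

Truncating the name up front keeps the name used for parsing identical to the name that appears in the output, which is why the shortened name is also the one that later steps have to use.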
For me, the following code runs flawlessly in the docker container:
import homelette as hm

gen = hm.alignment.AlignmentGenerator_hhblits.from_fasta(
    "fasta_file.fa")
gen.get_suggestion(database_dir="hhsuite_dbs/")
gen.show_suggestion()
# select only a few templates to keep the output readable
gen.select_templates(['6VUN_A', '7CU5_A'])
gen.show_suggestion()
gen.alignment.print_clustal(70)
# pull PDB files
gen.get_pdbs()
# initialize task and perform modelling
task = gen.initialize_task()
task.execute_routine(
    tag='example_modeller',
    routine=hm.routines.Routine_automodel_default,
    templates=['6VUN_A'],
    template_location='./templates/')
task.models
Regarding your actual error, it seems to be raised inside Task.execute_routine. Without the code to reproduce it, it is difficult to say more, but my guess is that you initialized the Task object while still expecting the long name rather than the automatically shortened one.
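A minimal sketch of that guess, using a plain dictionary instead of homelette's alignment class. The function below only mirrors the single line shown in the traceback; the data and the rest of the setup are assumed for illustration.

```python
def rename_sequence(sequences: dict, old_name: str, new_name: str) -> None:
    """Mirror of the traceback line: pop the old key and store it under the new one."""
    sequences[new_name] = sequences.pop(old_name)


long_name = "t" * 42
short_name = long_name[:14]  # the name as stored after the automatic shortening

# The alignment only knows the shortened name.
sequences = {short_name: "EVQLVESGGG"}

# Renaming with the long name fails with the KeyError from the traceback.
try:
    rename_sequence(sequences, long_name, "Target")
    error = None
except KeyError as exc:
    error = exc.args[0]

# Renaming with the shortened name works.
rename_sequence(sequences, short_name, "Target")
print(error == long_name, list(sequences))
```

If this is what happened, passing the first 14 characters of the name as the target should avoid the error.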
Sorry, it's my fault. I still used the original target name, not the shortened one. Thanks a lot.
No worries, glad it is working now.
Hi,
It seems that the sequence name in the FASTA file cannot be longer than 14 characters.
The sequence name in the first FASTA file is 14 characters long:
The code runs normally:
But when I change the FASTA sequence name to 15 characters:
Running the same code, the cluster is empty this time and the identity of every template is 0:
template | coverage | identity
-- | -- | --
6EJG_C | 0.0 | 0.0
3WBD_B | 0.0 | 0.0
5F3J_C | 0.0 | 0.0
1H8O_B | 0.0 | 0.0
1H8N_A | 0.0 | 0.0
5KVE_L | 0.0 | 0.0
3ESU_F | 0.0 | 0.0
1KTR_L | 0.0 | 0.0
3ESV_G | 0.0 | 0.0
4F9L_C | 0.0 | 0.0
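
One way to spot this situation before running the search is to check the name lengths in the FASTA file. This is a plain-Python sketch and not part of homelette: `long_fasta_names` is a hypothetical helper, and the limit of 14 is simply the longest name that worked in the test described above.

```python
import io

NAME_LIMIT = 14  # longest sequence name observed to work in this report


def long_fasta_names(handle, limit: int = NAME_LIMIT) -> list:
    """Return the sequence names in a FASTA stream that are longer than `limit`."""
    too_long = []
    for line in handle:
        if line.startswith(">"):
            # the name is taken as the first whitespace-delimited token after '>'
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if len(name) > limit:
                too_long.append(name)
    return too_long


# Two records: a 14-character name (fine) and a 15-character name (flagged).
example = io.StringIO(
    ">" + "a" * 14 + "\nEVQLVESGGG\n" + ">" + "b" * 15 + "\nDIQMTQSPSS\n"
)
flagged = long_fasta_names(example)
print(flagged)
```

The same function accepts an open file object, for example `long_fasta_names(open("fasta_file.fa"))`.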