Closed: tkosciol closed this issue 6 years ago
A similar situation occurs for: /projects/microprot/results/domain_benchmark/pkg/03-Pfam/T0836_1-204.out
I would expect "36-199" to be a match, but the reported match is "84-199".
Hm, I am having a hard time reproducing this error. @tkosciol could you try to fill in the missing parts in https://github.com/sjanssen2/microprot/tree/fix-split so that the unit test produces the wrong results described above?
Sure thing! I will try to reproduce this error and get back to you ASAP. Likely on Friday, though. I’ve got a full day tomorrow.
(In reply to Stefan Janssen's comment of Dec 13, 2017, 19:35 +0100.)
from @tkosciol: Running on Barnacle:
/projects/microprot/results/domain_benchmark/PDB$ python ../../../microprot/scripts/split_search.py -p 95 -l 40 T0831.out ../pkg/01-PDB/T0831.fasta
match
>T0831_1-80 # 4QN1_A HPRH, Homo sapiens, 419 residues
TMEELLTSLQKKCGTECEEAHRQLVCALNGLAGIHIIKGEYALAAELYREVLRSSEEHKGKLKTDSLQRLHATHNLMELL
non_match
>T0831_81-419 HPRH, Homo sapiens, 419 residues
IARHPGIPPTLRDGRLEEEAKQLREHYMSKCNTEVAEAQQALYPVQQTIHELQRKIHSNSPWWLNVIHRAIEFTIDEELVQRVRNEITSNYKQQTGKLSMSEKFRDCRGLQFLLTTQMEELNKCQKLVREAVKNLEGPPSRNVIESATVCHLRPARLPLNCCVFCKADELFTEYESKLFSNTVKGQTAIFEEMIEDEEGLVDDRAPTTTRGLWAISETERSMKAILSFAKSHRFDVEFVDEGSTSMDLFEAWKKEYKLLHEYWMALRNRVSAVDELAMATERLRVRDPREPKPNPPVLHIIEPHEVEQNRIKLLNDKAVATSQLQKKLGQLLYLTNLEK
This is the wrong result.
I also attach the match file:
>T0831_1-80 # 4QN1_A SHPRH, Homo sapiens, 419 residues
TMEELLTSLQKKCGTECEEAHRQLVCALNGLAGIHIIKGEYALAAELYREVLRSSEEHKGKLKTDSLQRLHATHNLMELL
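To turn the output above into a unit-test check, the residue ranges can be read straight from the FASTA headers. The following is only a minimal sketch: `parse_range` and the regular expression are hypothetical helpers, not part of microprot, and they assume headers always follow the `>NAME_START-END` pattern seen in the output above.

```python
import re

# Assumed header layout, taken from the output above: >NAME_START-END [# hit] description
HEADER_RE = re.compile(r"^>(?P<name>\S+)_(?P<start>\d+)-(?P<end>\d+)")


def parse_range(header):
    """Return (name, start, end) parsed from a FASTA header line."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError("unrecognised header: %r" % header)
    return m.group("name"), int(m.group("start")), int(m.group("end"))


# Header lines copied from the reported output.
match_header = ">T0831_1-80 # 4QN1_A HPRH, Homo sapiens, 419 residues"
non_match_header = ">T0831_81-419 HPRH, Homo sapiens, 419 residues"

observed = parse_range(match_header)
expected = ("T0831", 1, 419)  # the range the reporter expects to be a match
print("observed:", observed, "expected:", expected, "ok:", observed == expected)
```

A test built this way would fail on the current output, because the observed match covers only residues 1-80 rather than the expected 1-419.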
While benchmarking domain splitting between different approaches, I noticed that the split_search script does not perform some splits correctly. Example on Barnacle:
/projects/microprot/results/domain_benchmark/pkg/01-PDB/T0831.{out,match,non_match}
HHSearch clearly hits 1 PDB which should be the hit (residues "1-419" in target). However, for an unknown reason, the script decided to match residues "1-80" (i.e. only the first line of alignment in out) and leave "81-419" as a non_match. The problem then continues in ../02-CM, where again we hit a single PDB with 100% probability, but the method only assigns match to residues "163-242" (i.e. only the second line*) and the rest is non_match.
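The symptom described here (only one wrapped alignment line being counted) would be consistent with each alignment line being treated as a separate hit instead of being merged into one range. This is only a hypothesis, not a confirmed diagnosis of split_search. The sketch below shows the merging step that would give the expected result. `merge_segments` is a hypothetical helper, and the per-line ranges are illustrative values that assume an ungapped alignment wrapped at 80 residues per line.

```python
def merge_segments(segments):
    """Merge (start, end) residue ranges that are adjacent or overlapping."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1] + 1:
            # Extends or touches the previous range: grow it in place.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical per-line query ranges for a single hit covering residues 1-419.
alignment_lines = [(1, 80), (81, 160), (161, 240), (241, 320), (321, 400), (401, 419)]

first_line_only = alignment_lines[0]          # what the reported output corresponds to
full_range = merge_segments(alignment_lines)  # what the reporter expects

print("first line only:", first_line_only)
print("merged:", full_range)
```

Under these assumptions, merging gives a single range of 1-419, whereas reading only the first line gives 1-80, which matches the reported match file.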
Parameters used:
or view the entire config on Barnacle in:
/projects/microprot/results/domain_benchmark/config.yml
The fix is not very time-sensitive, but it would be nice to have before the end of the year.