cath.superpose ssaps files optimization ?

UCLOrengoGroup / cath-tools

Protein structure comparison tools such as SSAP and SNAP

GNU General Public License v3.0

(base) thibault@XXX [XXX]/ssaps $ ls -l | grep A1A4S6 | grep B1AVH7
-rw-r--r-- 1 thibault ansatt 3080 Aug 21 10:20 A1A4S6.pdbB1AVH7.pdb.list
-rw-r--r-- 1 thibault ansatt 62 Aug 21 10:20 A1A4S6.pdbB1AVH7.pdb.scores
-rw-r--r-- 1 thibault ansatt 3080 Aug 21 15:37 B1AVH7.pdbA1A4S6.pdb.list
-rw-r--r-- 1 thibault ansatt 62 Aug 21 15:37 B1AVH7.pdbA1A4S6.pdb.scores
(base) thibault@XXX [XXX]/ssaps $ cat A1A4S6.pdbB1AVH7.pdb.scores
A1A4S6.pdb B1AVH7.pdb 108 99 85.49 97 89 15 3.34
(base) thibault@XXX [XXX]/ssaps $ cat B1AVH7.pdbA1A4S6.pdb.scores
B1AVH7.pdb A1A4S6.pdb 99 108 85.49 97 89 15 3.34

export CATH_TOOLS_PDB_PATH=$WORKDIR
pdbinfile=""
for pdb in `ls $WORKDIR/*.pdb |sort -R`
do
    pdbinfile+="--pdb-infile $pdb "
done
#echo $pdbinfile
cath-superpose --do-the-ssaps ssaps --sup-to-pdb-files-dir output $pdbinfile

Thank you for using cath-superpose and for giving us some of your feedback - much appreciated.

I'm not 100% clear on your point about things being sped up by randomising the order of the inputs. Is it that you're using the --do-the-ssaps option of cath-superpose and running several of these at the same time, so the randomisation is a way to parallelise the SSAPs that generate the alignments? In that case, it sounds like it would be valuable to you if there were an option to tell --do-the-ssaps to run n SSAP jobs in parallel. Is that correct?
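As a rough picture of what "run n SSAP jobs in parallel" could look like from the shell, the sketch below lists each unordered pair once and hands the pairs to `xargs -P`. The names `list_pairs`, `N_JOBS` and `run_pair` are all hypothetical, as is the third input `example3.pdb`. `run_pair` is only echoed as a placeholder, and the real SSAP command and its options are deliberately left out rather than guessed. It also assumes file names contain no whitespace.

```shell
# Hypothetical helper: print each unordered pair once, one "first second" per line,
# always keeping the earlier-listed input first.
list_pairs() {
    while [ "$#" -gt 1 ]; do
        first=$1
        shift
        for second in "$@"; do
            printf '%s %s\n' "$first" "$second"
        done
    done
}

N_JOBS=4  # illustrative: number of pair jobs to run at once

# `echo run_pair` is a placeholder for whatever command does the per-pair work.
list_pairs A1A4S6.pdb B1AVH7.pdb example3.pdb | xargs -P "$N_JOBS" -n 2 echo run_pair
```

Because the jobs run concurrently, the order of the output lines is not guaranteed.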

In general, I think you're right that this area feels like it could be improved. We did enough work in this area to start generating good multiple structural alignments and to build something usable but we think we could do much better on the current trade-off between quality and computation time and on figuring out which SSAPs don't need to be performed.

However, for the issue you're describing, I think we've already exploited the symmetry of only needing one alignment for each pair of structures: the code only runs and uses the SSAP for each pair in one order, with whichever structure was specified first on the command line coming first. So I suspect what's happening is that your randomisation also randomises the order in which each pair is required.
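To make the ordering point concrete, here is a minimal shell sketch. It assumes the pair files are named by concatenating the two input names in command-line order, as the `ls` listing above suggests (e.g. `A1A4S6.pdbB1AVH7.pdb.scores`). The function `expected_pair_files` is hypothetical and not part of cath-tools. It only prints the `.scores` names that a given input order would correspond to under that assumption.

```shell
# Hypothetical helper (not part of cath-tools): for inputs given in command-line
# order, print the assumed "<first><second>.scores" name for each pair.
expected_pair_files() {
    while [ "$#" -gt 1 ]; do
        first=$1
        shift
        for second in "$@"; do
            printf '%s%s.scores\n' "$first" "$second"
        done
    done
}

# The same two structures in two different orders map to two different names:
expected_pair_files A1A4S6.pdb B1AVH7.pdb
expected_pair_files B1AVH7.pdb A1A4S6.pdb
```

If that assumption holds, using a plain `sort` instead of `sort -R` in the loop would keep the input order stable between runs, so each pair would map to the same file every time, at the cost of losing the randomisation.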

Does that sound right? Does this reinforce the idea that you'd benefit from an in-built way to parallelise the --do-the-ssaps?

cath.superpose ssaps files optimization ? #76