FC speed improvements - Githubissues

GoogleCodeExporter commented 9 years ago

What steps will reproduce the problem?
1. Run an entry with an especially large total number of residues.

Below are the timings on my MacBook Pro. Tang is almost as fast.
The first timing is for entry 2k0e capped at a maximum of 7500 residues: the FC takes 19 of the total 23 minutes of processing time. This includes writing a STAR file.
The second timing is for just one model of the same entry: the FC takes 24 seconds of the total of 114.

If all entries were like 2k0e (in reality they are much smaller and thus faster), it would take 52 days just for the FC.
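
The 52-day figure can be back-solved for the number of entries it implies. A minimal sketch, assuming the estimate is simply 19 FC minutes per entry multiplied by the entry count; the count itself is not stated in this report, and `implied_entry_count` is a hypothetical helper:

```python
MINUTES_PER_DAY = 24 * 60


def implied_entry_count(total_days, minutes_per_entry):
    """Number of entries for which the FC alone would take total_days."""
    return total_days * MINUTES_PER_DAY / minutes_per_entry


# 19 minutes of FC time per 2k0e-sized entry, 52 days in total (both from the report)
entries = implied_entry_count(52, 19)
print(round(entries))  # 3941
```

So the estimate corresponds to roughly 3,900 entries, if that assumption holds.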

For the above I needed to cut the number of models from 160 to 49 in order to keep the total number of residues below 7500. It seems that in some cases we would like to have more than 7500 residues.
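
The relation between the residue cap and the number of models can be sketched with integer division. This assumes the cap is applied to whole models; `max_models` is a hypothetical helper, and the per-model residue counts below are illustrative, because the report does not state the count the pipeline actually uses:

```python
def max_models(residue_cap, residues_per_model):
    """Largest whole number of models whose residues fit within the cap."""
    return residue_cap // residues_per_model


# 148 is the SEQRES length shown in the log; it would allow 50 models
print(max_models(7500, 148))  # 50

# The 49 models reported above match 151 to 153 residues per model
print([n for n in range(145, 160) if max_models(7500, n) == 49])  # [151, 152, 153]
```

The difference from the 148 SEQRES residues may come from extra non-polymer residues counted per model, but that is a guess.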

If the noWrite option is used in the FC, the time spent is reduced from 23 minutes to less than 2 minutes. So the writing seems to be the culprit. Wim, could you try to tweak that part?

jd:Stella/~/ time $scripts_dir/processDOCR_FRED.csh 2k0e

interactiveProcessing       interactive run is fast use zero for production            0
doReadMmCif       Converts PDB mmCIF to NMR-STAR with Wattos        -> XXXX_wattos.str 1
doJoin            Joins the parsed NMR-STAR rest with coor. Wattos    -> XXXX_join.str 1
doMerge           Converts STAR to STAR with linkNmrStarData.py      -> XXXX_merge.str 1
doAssign          Changes stereo assignments with Wattos            -> XXXX_assign.str 1
doSurplus         Changes distance restraints with Wattos     ->   XXXX_nonsurplus.str 1
doViolAnal        Analyzes violation with Wattos                                       1
doCompleteness    Determines NOE completeness with Wattos                              1
doExportsForGrid  Converts DOCR/FRED to CYANA and XPLOR for Grid                       1
doOrganizeForGrid Puts the results in a directory structure for Grid                   1
doDumpInGrid      Puts the files into Grid                                             1
doCleanFiles      Removes redundant files from fs                                      0
Is there an X-server attached for possible questions:                                  0
Extra arguments FC                                                                     -raise -force -noGui
PYTHONPATH:       /Users/jd/workspace34/cing/python:/Users/jd/workspace34/cing/dist/Cython:/Users/jd/workspace34/wattos/python:/Users/jd/workspace34/recoord/python:/Users/jd/workspace34/nmrrestrntsgrid/python:/Users/jd/workspace34/ccpn/python
CLASSPATH:        /sw/share/java/antlr/antlr.jar:/sw/share/java/junit/junit.jar:/Users/jd/workspace34/wattos/build/web/WEB-INF/classes:/Users/jd/workspace34/wattos//build/test/classes:/Users/jd/workspace34/wattos/lib/ant-contrib.jar:/Users/jd/workspace34/wattos/lib/colt.jar:/Users/jd/workspace34/wattos/lib/CSVutils.jar:/Users/jd/workspace34/wattos/lib/jakarta-regexp.jar:/Users/jd/workspace34/wattos/lib/javacc.jar:/Users/jd/workspace34/wattos/lib/JFlex.jar:/Users/jd/workspace34/wattos/lib/printf_hb15.jar:/Users/jd/workspace34/wattos/lib/mysql-connector-java-5.0.3-bin.jar:/Users/jd/workspace34/wattos/lib/starlibj_with_source.jar:/Users/jd/workspace34/wattos/lib/swing-layout-1.0.jar:/Users/jd/workspace34/wattos/lib/jfreechart-1.0.1.jar:/Users/jd/workspace34/wattos/lib/jcommon-1.0.0.jar:/Users/jd/workspace34/wattos/lib/gnujaxp.jar:/Users/jd/workspace34/wattos/lib/itext-1.4.jar:/Users/jd/workspace34/wattos/lib/junit-3.8.1.jar
wattos:      aliased to java -Djava.awt.headless=true -Xmx2g Wattos.CloneWars.UserInterface -at
Doing 1 pdb entries
Doing 2k0e
  mmCIF
DEBUG: Tue Aug 19 17:23:41 CEST 2008
DEBUG: Tue Aug 19 17:24:26 CEST 2008
  join
DEBUG: Tue Aug 19 17:24:26 CEST 2008
DEBUG: Tue Aug 19 17:24:28 CEST 2008
  merge
SEQRES derived records
A   1  ALA    2  ASP    3  GLN    4  LEU    5  THR    6  GLU    7  GLU    8  GLN    9  ILE   10  ALA
A  11  GLU   12  PHE   13  LYS   14  GLU   15  ALA   16  PHE   17  SER   18  LEU   19  PHE   20  ASP
A  21  LYS   22  ASP   23  GLY   24  ASP   25  GLY   26  THR   27  ILE   28  THR   29  THR   30  LYS
A  31  GLU   32  LEU   33  GLY   34  THR   35  VAL   36  MET   37  ARG   38  SER   39  LEU   40  GLY
A  41  GLN   42  ASN   43  PRO   44  THR   45  GLU   46  ALA   47  GLU   48  LEU   49  GLN   50  ASP
A  51  MET   52  ILE   53  ASN   54  GLU   55  VAL   56  ASP   57  ALA   58  ASP   59  GLY   60  ASN
A  61  GLY   62  THR   63  ILE   64  ASP   65  PHE   66  PRO   67  GLU   68  PHE   69  LEU   70  THR
A  71  MET   72  MET   73  ALA   74  ARG   75  LYS   76  MET   77  LYS   78  ASP   79  THR   80  ASP
A  81  SER   82  GLU   83  GLU   84  GLU   85  ILE   86  ARG   87  GLU   88  ALA   89  PHE   90  ARG
A  91  VAL   92  PHE   93  ASP   94  LYS   95  ASP   96  GLY   97  ASN   98  GLY   99  TYR  100  ILE
A 101  SER  102  ALA  103  ALA  104  GLU  105  LEU  106  ARG  107  HIS  108  VAL  109  MET  110  THR
A 111  ASN  112  LEU  113  GLY  114  GLU  115  LYS  116  LEU  117  THR  118  ASP  119  GLU  120  GLU
A 121  VAL  122  ASP  123  GLU  124  MET  125  ILE  126  ARG  127  GLU  128  ALA  129  ASP  130  ILE
A 131  ASP  132  GLY  133  ASP  134  GLY  135  GLN  136  VAL  137  ASN  138  TYR  139  GLU  140  GLU
A 141  PHE  142  VAL  143  GLN  144  MET  145  MET  146  THR  147  ALA  148  LYS

Restraint ranges
Rst:    .   1   .   HA - 148   .   HA diff 147 ch.range 148
Rst:    . 181   .   MN - 184   .   MN diff   3 ch.range 152
Rst:    . 201   .   HA - 223   .   HA diff  22 ch.range 175
Rst:    . 225   .   HA - 226   .   HA diff   1 ch.range 177

Triplet matches
Restraint    SEQRS Offset
Start guessing.
DEBUG: Tue Aug 19 17:24:33 CEST 2008
DEBUG: Tue Aug 19 17:42:52 CEST 2008
DEBUG: Tue Aug 19 17:42:52 CEST 2008
DEBUG: Tue Aug 19 17:43:02 CEST 2008
  assign
DEBUG: Tue Aug 19 17:43:03 CEST 2008
DEBUG: Tue Aug 19 17:43:17 CEST 2008
  surplus
DEBUG: Tue Aug 19 17:43:23 CEST 2008
DEBUG: Tue Aug 19 17:43:41 CEST 2008
  viol
DEBUG: Tue Aug 19 17:43:41 CEST 2008
DEBUG: Tue Aug 19 17:43:48 CEST 2008
  compl
DEBUG: Tue Aug 19 17:43:48 CEST 2008
DEBUG: Tue Aug 19 17:45:24 CEST 2008
  exportsForGrid
DEBUG: Tue Aug 19 17:45:24 CEST 2008
DEBUG: Tue Aug 19 17:45:38 CEST 2008
  gridOrganize
DEBUG: Tue Aug 19 17:45:38 CEST 2008
  gridDump
DEBUG: Tue Aug 19 17:45:56 CEST 2008
DEBUG: Tue Aug 19 17:46:47 CEST 2008
  Finished 2k0e
1243.879u 115.007s 23:06.04 98.0%   0+0k 21+478io 397pf+0w
jd:Stella/~/ time $scripts_dir/processDOCR_FRED.csh 2k0e

interactiveProcessing       interactive run is fast use zero for production            1
doReadMmCif       Converts PDB mmCIF to NMR-STAR with Wattos        -> XXXX_wattos.str 1
doJoin            Joins the parsed NMR-STAR rest with coor. Wattos    -> XXXX_join.str 1
doMerge           Converts STAR to STAR with linkNmrStarData.py      -> XXXX_merge.str 1
doAssign          Changes stereo assignments with Wattos            -> XXXX_assign.str 1
doSurplus         Changes distance restraints with Wattos     ->   XXXX_nonsurplus.str 1
doViolAnal        Analyzes violation with Wattos                                       1
doCompleteness    Determines NOE completeness with Wattos                              1
doExportsForGrid  Converts DOCR/FRED to CYANA and XPLOR for Grid                       1
doOrganizeForGrid Puts the results in a directory structure for Grid                   1
doDumpInGrid      Puts the files into Grid                                             1
doCleanFiles      Removes redundant files from fs                                      0
Is there an X-server attached for possible questions:                                  0
Extra arguments FC                                                                     -raise -force -noGui
PYTHONPATH:       /Users/jd/workspace34/cing/python:/Users/jd/workspace34/cing/dist/Cython:/Users/jd/workspace34/wattos/python:/Users/jd/workspace34/recoord/python:/Users/jd/workspace34/nmrrestrntsgrid/python:/Users/jd/workspace34/ccpn/python
CLASSPATH:        /sw/share/java/antlr/antlr.jar:/sw/share/java/junit/junit.jar:/Users/jd/workspace34/wattos/build/web/WEB-INF/classes:/Users/jd/workspace34/wattos//build/test/classes:/Users/jd/workspace34/wattos/lib/ant-contrib.jar:/Users/jd/workspace34/wattos/lib/colt.jar:/Users/jd/workspace34/wattos/lib/CSVutils.jar:/Users/jd/workspace34/wattos/lib/jakarta-regexp.jar:/Users/jd/workspace34/wattos/lib/javacc.jar:/Users/jd/workspace34/wattos/lib/JFlex.jar:/Users/jd/workspace34/wattos/lib/printf_hb15.jar:/Users/jd/workspace34/wattos/lib/mysql-connector-java-5.0.3-bin.jar:/Users/jd/workspace34/wattos/lib/starlibj_with_source.jar:/Users/jd/workspace34/wattos/lib/swing-layout-1.0.jar:/Users/jd/workspace34/wattos/lib/jfreechart-1.0.1.jar:/Users/jd/workspace34/wattos/lib/jcommon-1.0.0.jar:/Users/jd/workspace34/wattos/lib/gnujaxp.jar:/Users/jd/workspace34/wattos/lib/itext-1.4.jar:/Users/jd/workspace34/wattos/lib/junit-3.8.1.jar
wattos:      aliased to java -Djava.awt.headless=true -Xmx2g Wattos.CloneWars.UserInterface -at
Doing 1 pdb entries
Doing 2k0e
  mmCIF
DEBUG: Tue Aug 19 18:12:32 CEST 2008
DEBUG: Tue Aug 19 18:13:15 CEST 2008
  join
DEBUG: Tue Aug 19 18:13:15 CEST 2008
DEBUG: Tue Aug 19 18:13:15 CEST 2008
  merge
SEQRES derived records
A   1  ALA    2  ASP    3  GLN    4  LEU    5  THR    6  GLU    7  GLU    8  GLN    9  ILE   10  ALA
A  11  GLU   12  PHE   13  LYS   14  GLU   15  ALA   16  PHE   17  SER   18  LEU   19  PHE   20  ASP
A  21  LYS   22  ASP   23  GLY   24  ASP   25  GLY   26  THR   27  ILE   28  THR   29  THR   30  LYS
A  31  GLU   32  LEU   33  GLY   34  THR   35  VAL   36  MET   37  ARG   38  SER   39  LEU   40  GLY
A  41  GLN   42  ASN   43  PRO   44  THR   45  GLU   46  ALA   47  GLU   48  LEU   49  GLN   50  ASP
A  51  MET   52  ILE   53  ASN   54  GLU   55  VAL   56  ASP   57  ALA   58  ASP   59  GLY   60  ASN
A  61  GLY   62  THR   63  ILE   64  ASP   65  PHE   66  PRO   67  GLU   68  PHE   69  LEU   70  THR
A  71  MET   72  MET   73  ALA   74  ARG   75  LYS   76  MET   77  LYS   78  ASP   79  THR   80  ASP
A  81  SER   82  GLU   83  GLU   84  GLU   85  ILE   86  ARG   87  GLU   88  ALA   89  PHE   90  ARG
A  91  VAL   92  PHE   93  ASP   94  LYS   95  ASP   96  GLY   97  ASN   98  GLY   99  TYR  100  ILE
A 101  SER  102  ALA  103  ALA  104  GLU  105  LEU  106  ARG  107  HIS  108  VAL  109  MET  110  THR
A 111  ASN  112  LEU  113  GLY  114  GLU  115  LYS  116  LEU  117  THR  118  ASP  119  GLU  120  GLU
A 121  VAL  122  ASP  123  GLU  124  MET  125  ILE  126  ARG  127  GLU  128  ALA  129  ASP  130  ILE
A 131  ASP  132  GLY  133  ASP  134  GLY  135  GLN  136  VAL  137  ASN  138  TYR  139  GLU  140  GLU
A 141  PHE  142  VAL  143  GLN  144  MET  145  MET  146  THR  147  ALA  148  LYS

Restraint ranges
Rst:    .   1   .   HA - 148   .   HA diff 147 ch.range 148
Rst:    . 181   .   MN - 184   .   MN diff   3 ch.range 152
Rst:    . 201   .   HA - 223   .   HA diff  22 ch.range 175
Rst:    . 225   .   HA - 226   .   HA diff   1 ch.range 177

Triplet matches
Restraint    SEQRS Offset
Start guessing.
DEBUG: Tue Aug 19 18:13:17 CEST 2008
DEBUG: Tue Aug 19 18:13:51 CEST 2008
DEBUG: Tue Aug 19 18:13:51 CEST 2008
DEBUG: Tue Aug 19 18:13:53 CEST 2008
  assign
DEBUG: Tue Aug 19 18:13:53 CEST 2008
DEBUG: Tue Aug 19 18:13:57 CEST 2008
  surplus
DEBUG: Tue Aug 19 18:13:58 CEST 2008
DEBUG: Tue Aug 19 18:14:06 CEST 2008
  viol
DEBUG: Tue Aug 19 18:14:06 CEST 2008
DEBUG: Tue Aug 19 18:14:09 CEST 2008
  compl
DEBUG: Tue Aug 19 18:14:09 CEST 2008
DEBUG: Tue Aug 19 18:14:21 CEST 2008
  exportsForGrid
DEBUG: Tue Aug 19 18:14:21 CEST 2008
DEBUG: Tue Aug 19 18:14:27 CEST 2008
  gridOrganize
DEBUG: Tue Aug 19 18:14:27 CEST 2008
  gridDump
DEBUG: Tue Aug 19 18:14:30 CEST 2008
DEBUG: Tue Aug 19 18:14:35 CEST 2008
  Finished 2k0e
113.178u 4.911s 2:03.46 95.6%   0+0k 1+360io 104pf+0w
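
The step durations in the two logs above can be read off the paired DEBUG timestamps. A minimal sketch, assuming the first pair of stamps after "Start guessing." brackets the slow part of the merge step (the log does not label them); `parse_stamp` and `elapsed_seconds` are hypothetical helpers, and the time zone token is dropped because both stamps share it:

```python
from datetime import datetime


def parse_stamp(line):
    """Parse a line such as 'DEBUG: Tue Aug 19 17:24:33 CEST 2008'."""
    fields = line.split()
    # fields: 'DEBUG:', weekday, month, day, time, zone, year
    text = f"{fields[2]} {fields[3]} {fields[4]} {fields[6]}"
    return datetime.strptime(text, "%b %d %H:%M:%S %Y")


def elapsed_seconds(start_line, end_line):
    """Whole seconds between two DEBUG timestamp lines."""
    delta = parse_stamp(end_line) - parse_stamp(start_line)
    return int(delta.total_seconds())


# First run (capped at 7500 residues)
slow = elapsed_seconds("DEBUG: Tue Aug 19 17:24:33 CEST 2008",
                       "DEBUG: Tue Aug 19 17:42:52 CEST 2008")
# Second run (one model)
fast = elapsed_seconds("DEBUG: Tue Aug 19 18:13:17 CEST 2008",
                       "DEBUG: Tue Aug 19 18:13:51 CEST 2008")
print(slow, fast)  # 1099 34
```

That is a little over 18 minutes for the first run against 34 seconds for the second.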

Original issue reported on code.google.com by jurge...@gmail.com on 19 Aug 2008 at 4:34

GoogleCodeExporter commented 9 years ago

Original comment by jurge...@gmail.com on 19 Aug 2008 at 4:34

Added labels: Entry-2k0e

GoogleCodeExporter commented 9 years ago

As I've mentioned before, it's the NMR-STAR export that takes the longest (this is what -noWrite turns off, but then it's not much use for you of course).

Anyway I'm pretty sure this can be sped up, but I'm first going to assign this task to Chris Penkett, see if he can come up with anything.

Original comment by wfvran...@gmail.com on 20 Aug 2008 at 3:42

GoogleCodeExporter commented 9 years ago

I'm bumping the priority up because this issue is preventing us from doing all models, and the PDB advisory board just decided that we shouldn't limit authors to depositing any certain number.

Also, it takes 4 days on 2 CPUs to process all entries with just one model. It would take too long to do all models with the current setup. Of course we could spread the work over other machines, but I'm sure that's less trivial than optimizing the code.

Original comment by jurge...@gmail.com on 2 Sep 2008 at 12:47

Added labels: Type-Task, Priority-Critical, Milestone-Release3.1
Removed labels: Type-Enhancement, Priority-Medium

GoogleCodeExporter commented 9 years ago

Original comment by jurge...@gmail.com on 16 Sep 2008 at 3:40

GoogleCodeExporter commented 9 years ago

Please ask Wim about the details.

Original comment by jurge...@gmail.com on 16 Sep 2008 at 3:41

GoogleCodeExporter commented 9 years ago

OK, spent a morning looking at this and identified an easy speedup, plus some minor ones. For entry 1ieh it speeds things up by about 30% on my laptop.

It won't be straightforward to speed up the NMR-STAR export more at this point (although I have some ideas), so let me know if this speedup is at least satisfactory.

Update info:

- RECOORD code

Just for general info then, the current linkNmrStarData.py timings for 1ieh (full entry):

- readNmrStar     ~30 secs
- linkResonances  ~ 4 secs
- Write CCPN proj ~ 5 secs
- Write NMR-STAR  ~64 secs

Used to be about 40 seconds slower.
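
As a quick check on these numbers, a minimal sketch that sums the step timings and compares them with the earlier total. It assumes the "40 seconds slower" refers to the total over all four steps, which the comment does not spell out; the variable names are illustrative:

```python
# Approximate step timings in seconds for 1ieh, as listed above
timings = {
    "readNmrStar": 30,
    "linkResonances": 4,
    "Write CCPN proj": 5,
    "Write NMR-STAR": 64,
}

total_now = sum(timings.values())
total_before = total_now + 40  # assumption: the 40 s applies to the total

write_share = timings["Write NMR-STAR"] / total_now
time_saved = 40 / total_before

print(total_now, total_before)                   # 103 143
print(f"{write_share:.0%}", f"{time_saved:.0%}")  # 62% 28%
```

Under that assumption, writing NMR-STAR still accounts for about 62% of the run, and the saving of about 28% agrees with the roughly 30% quoted.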

Original comment by wfvran...@gmail.com on 1 Oct 2008 at 12:52

GoogleCodeExporter commented 9 years ago

How long do all models of 2k0e take?

Tim S. is here now and kindly offered to help Chris P. a bit too. Perhaps the two of you could try to take it a step further still.

Original comment by jurge...@gmail.com on 1 Oct 2008 at 1:02

GoogleCodeExporter commented 9 years ago

Chris P. is not the speedup - I did all the profiling and code changes in the end. If Tim wants to spend his time looking into the NmrStarExport code, fine with me, although I suspect he probably has more relevant things to do.

Original comment by wfvran...@gmail.com on 1 Oct 2008 at 1:24

GoogleCodeExporter commented 9 years ago

And there's a 'doing' missing there of course...

Original comment by wfvran...@gmail.com on 1 Oct 2008 at 1:25

GoogleCodeExporter commented 9 years ago

OK, let's hope this one is now finished off: 2k0e (complete entry) took 22 minutes on my laptop, now 5 minutes 20 seconds.

Hope that's fast enough - a 'normal' entry like 1ieh now takes 1 minute 20 seconds to get through linkNmrStarData.

Update info:

- RECOORD code

There are still minor speedups that could be done (maybe making it 10% or so faster), but nothing else of this magnitude. What's left is mostly API.

Original comment by wfvran...@gmail.com on 1 Oct 2008 at 4:03

GoogleCodeExporter commented 9 years ago

big hug!
check it in so we can try tomorrow on tang

Original comment by jurge...@gmail.com on 1 Oct 2008 at 4:28

GoogleCodeExporter commented 9 years ago

I checked it in before sending out comment 10... should be working.

Original comment by wfvran...@gmail.com on 2 Oct 2008 at 10:12

GoogleCodeExporter commented 9 years ago

my bad
I will start a run over the weekend if it's this fast.
Thanks again Wim!

Original comment by jurge...@gmail.com on 2 Oct 2008 at 11:57

GoogleCodeExporter commented 9 years ago

Running the whole NRG setup first with:

- one model: 3'07"
- 7500 residues (49 models): 9'30"

Very nice! 

Now I'll run all models for this entry..

Original comment by jurge...@gmail.com on 3 Oct 2008 at 9:08

Changed state: Verified

GoogleCodeExporter commented 9 years ago

Wim, I see in linkNmrStarData.py revision 1.13 (head) line 124:

  # Maximum number of coordinate models to read - set to something low when testing
  numModelsToRead = 50

Are you sure you did all 160 models in 5 minutes?

Original comment by jurge...@gmail.com on 3 Oct 2008 at 10:31

Changed state: Started

GoogleCodeExporter commented 9 years ago

Ah indeed, but you must've been testing on the 50 as well before - I thought we were going to limit the number of models in any case? The speedup remains - it did take over 20 mins with the 50 models before.

Anyway, try with 9999 for numModelsToRead then - I don't think it scales linearly, so that will take a while.

Original comment by wfvran...@gmail.com on 3 Oct 2008 at 10:38

GoogleCodeExporter commented 9 years ago

Since the PDB folks are advised not to limit the number of NMR models and we would like to process all data, we won't limit it either.

Could you first try the 160 models on your laptop yourself? No hurry, perhaps overnight. I'm worried it might still run out of memory or so...

Original comment by jurge...@gmail.com on 3 Oct 2008 at 11:11

GoogleCodeExporter commented 9 years ago

Well, it depends how you look at it. The 160 models in this entry are heavily modelled, so you do not have the normal direct correspondence between restraints and coordinates. Should we include them in the first place (their stats will be crap)?

I'm not too convinced. Also, I'm not sure if this discussion is finished on the PDB side.

Original comment by wfvran...@gmail.com on 3 Oct 2008 at 11:27

GoogleCodeExporter commented 9 years ago

In any case, the current NMR-STAR input file I have only has 50 models. Do you have the one with 160 models available somewhere?

Original comment by wfvran...@gmail.com on 3 Oct 2008 at 11:38

GoogleCodeExporter commented 9 years ago

It's ready on tang at:
$ccpn_tmp_dir/data/archives/bmrb/nmrRestrGrid/2k0e/joinedCoord.str 
thanks!

Original comment by jurge...@gmail.com on 3 Oct 2008 at 12:03

GoogleCodeExporter commented 9 years ago

Whole thing with 160 models takes just over 21 minutes on my laptop, no memory issues (got 2 GB on this one). Sounds all right to me, seeing as there's only a couple of these.
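
The two laptop timings give a rough idea of how the run time scales with the number of models. A minimal sketch, assuming both timings measure the same step on the same machine and that time grows as a power of the model count; "just over 21 minutes" is taken as 21 minutes, and `scaling_exponent` is a hypothetical helper:

```python
import math


def scaling_exponent(models_a, seconds_a, models_b, seconds_b):
    """Exponent k in time ~ models**k, estimated from two measurements."""
    return math.log(seconds_b / seconds_a) / math.log(models_b / models_a)


# 50 models: 5 min 20 s; 160 models: about 21 min (both reported in this thread)
k = scaling_exponent(50, 5 * 60 + 20, 160, 21 * 60)
print(round(k, 2))  # 1.18
```

An exponent a little above 1 fits the earlier remark that the scaling is not linear, though two data points are a thin basis for it.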

Original comment by wfvran...@gmail.com on 3 Oct 2008 at 12:44

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Thanks, I'll check now..

Original comment by jurge...@gmail.com on 3 Oct 2008 at 12:49

GoogleCodeExporter commented 9 years ago

Excellent, it takes about 27 minutes on tang; very big improvement indeed.
I'll reprocess the whole set soon now.

Original comment by jurge...@gmail.com on 3 Oct 2008 at 1:31

Changed state: Verified

google-code-export / nmrrestrntsgrid

FC speed improvements #108