Original comment by jurge...@gmail.com
on 19 Aug 2008 at 4:34
As I've mentioned before, it's the NMR-STAR export that takes the longest (this is what -noWrite turns off, but then it's not much use for you, of course).
Anyway, I'm pretty sure this can be sped up, but I'm first going to assign this task to Chris Penkett and see if he can come up with anything.
Original comment by wfvran...@gmail.com
on 20 Aug 2008 at 3:42
I'm bumping the priority up because this issue is preventing us from doing all models, and the PDB advisory board just ruled that we shouldn't limit authors to depositing any particular number. Also, it takes 4 days on 2 CPUs to process all entries with just one model; it would take far too long to do all models with the current setup. Of course we could flock to other machines, but I'm sure that's less trivial than optimizing the code.
Original comment by jurge...@gmail.com
on 2 Sep 2008 at 12:47
Original comment by jurge...@gmail.com
on 16 Sep 2008 at 3:40
Please ask Wim about the details.
Original comment by jurge...@gmail.com
on 16 Sep 2008 at 3:41
OK, I spent a morning looking at this and identified an easy speedup, plus some minor ones. For entry 1ieh it speeds things up by about 30% on my laptop.
It won't be straightforward to speed up the NMR-STAR export further at this point (although I have some ideas), so let me know if this speedup is at least satisfactory.
Update info:
- RECOORD code
Just for general info then, the current linkNmrStarData.py timings for 1ieh (full entry):
- readNmrStar: ~30 secs
- linkResonances: ~4 secs
- write CCPN proj: ~5 secs
- write NMR-STAR: ~64 secs
Used to be about 40 seconds slower.
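As a side note for anyone reproducing this kind of per-stage breakdown: the numbers above can be collected with a small wall-clock wrapper around each pipeline step. This is only a minimal sketch; the stage functions below are hypothetical stand-ins, not the real linkNmrStarData.py routines.

```python
import time

def time_stage(label, func, *args, **kwargs):
    """Run one pipeline stage, print its wall-clock time, return its result."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"- {label}: ~{elapsed:.1f} secs")
    return result

# Hypothetical stand-ins for the real readNmrStar / linkResonances /
# write-CCPN-project / write-NMR-STAR stages:
stages = [
    ("readNmrStar",    lambda: sum(range(100_000))),
    ("linkResonances", lambda: sorted(range(1_000), reverse=True)),
    ("writeCcpnProj",  lambda: [i * i for i in range(1_000)]),
    ("writeNmrStar",   lambda: "".join(str(i) for i in range(1_000))),
]

for label, func in stages:
    time_stage(label, func)
```

For a finer-grained view (which call inside the export dominates), Python's built-in cProfile module gives per-function timings without any manual instrumentation.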
Original comment by wfvran...@gmail.com
on 1 Oct 2008 at 12:52
How long do all models of 2k0e take?
Tim S. is here now and has kindly offered to help Chris P. a bit too. Perhaps the two of you could try to take it a step further still.
Original comment by jurge...@gmail.com
on 1 Oct 2008 at 1:02
Chris P. is not the speedup - I did all the profiling and code changes in the end. If Tim wants to spend his time looking into the NmrStarExport code, fine with me, although I suspect he probably has more relevant things to do.
Original comment by wfvran...@gmail.com
on 1 Oct 2008 at 1:24
And there's a 'doing' missing there of course...
Original comment by wfvran...@gmail.com
on 1 Oct 2008 at 1:25
OK, let's hope this one is now finished off: 2k0e (complete entry) took 22 minutes on my laptop, now 5 minutes 20 seconds.
Hope that's fast enough - a 'normal' entry like 1ieh now takes 1 minute 20 seconds to get through linkNmrStarData.
Update info:
- RECOORD code
There are still minor speedups that could be done (maybe making it 10% or so faster), but nothing else of this magnitude. What's left is mostly API.
Original comment by wfvran...@gmail.com
on 1 Oct 2008 at 4:03
big hug!
check it in so we can try tomorrow on tang
Original comment by jurge...@gmail.com
on 1 Oct 2008 at 4:28
I checked it in before sending out comment 10... should be working.
Original comment by wfvran...@gmail.com
on 2 Oct 2008 at 10:12
my bad
I will start a run over the weekend if it's this fast.
Thanks again Wim!
Original comment by jurge...@gmail.com
on 2 Oct 2008 at 11:57
Running the whole NRG setup first:
- one model: 3'07
- 7500 residues (49 models): 9'30
Very nice!
Now I'll run all models for this entry..
Original comment by jurge...@gmail.com
on 3 Oct 2008 at 9:08
Wim, I see in linkNmrStarData.py revision 1.13 (head) line 124:
# Maximum number of coordinate models to read - set to something low when testing
numModelsToRead = 50
Are you sure you did all 160 models in 5 minutes?
Original comment by jurge...@gmail.com
on 3 Oct 2008 at 10:31
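For context, the numModelsToRead cap quoted from linkNmrStarData.py above could behave like the following minimal sketch. The variable name mirrors the quoted snippet; the helper function and the model list are hypothetical illustrations, not the real code.

```python
# Maximum number of coordinate models to read - set to something low when
# testing; the thread suggests 9999 to effectively read "all" models.
numModelsToRead = 50

def read_models(all_models, max_models=numModelsToRead):
    """Return at most max_models coordinate models, in deposition order."""
    return all_models[:max_models]

# e.g. entry 2k0e has 160 models; with the default cap only 50 are read.
models = [f"model_{i + 1}" for i in range(160)]
kept = read_models(models)
print(len(kept))  # 50
```

This also explains the confusion in the comments that follow: with the cap at 50, the "5 minutes 20 seconds" benchmark covered 50 models, not all 160.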
Ah indeed, but you must've been testing on the 50 as well before - I thought we were going to limit the number of models in any case? The speedup remains - it did take over 20 mins with the 50 models before.
Anyway, try with 9999 for numModelsToRead then - I don't think it scales linearly, so that will take a while.
Original comment by wfvran...@gmail.com
on 3 Oct 2008 at 10:38
Since the PDB folks have been advised not to limit the number of NMR models, and we would like to process all the data, we'll not limit.
Could you first try the 160 models on your laptop yourself? No hurry, perhaps overnight. I'm worried it might still run out of memory or so...
Original comment by jurge...@gmail.com
on 3 Oct 2008 at 11:11
Well, it depends how you look at it. The 160 models in this entry are heavily modelled, so you do not have the normal direct correspondence between restraints and coordinates - should we include them in the first place (their stats will be crap)? I'm not too convinced. Also, I'm not sure this discussion is finished on the PDB side.
Original comment by wfvran...@gmail.com
on 3 Oct 2008 at 11:27
In any case, the current NMR-STAR input file I have only has 50 models. Do you have the one with 160 models available somewhere?
Original comment by wfvran...@gmail.com
on 3 Oct 2008 at 11:38
It's ready on tang at:
$ccpn_tmp_dir/data/archives/bmrb/nmrRestrGrid/2k0e/joinedCoord.str
thanks!
Original comment by jurge...@gmail.com
on 3 Oct 2008 at 12:03
The whole thing with 160 models takes just over 21 minutes on my laptop, no memory issues (got 2 GB on this one). Sounds alright to me, seeing as there are only a couple of these.
Original comment by wfvran...@gmail.com
on 3 Oct 2008 at 12:44
Thanks, I'll check now..
Original comment by jurge...@gmail.com
on 3 Oct 2008 at 12:49
Excellent, it takes about 27 minutes on tang; very big improvement indeed.
I'll reprocess the whole set soon now.
Original comment by jurge...@gmail.com
on 3 Oct 2008 at 1:31
Original issue reported on code.google.com by
jurge...@gmail.com
on 19 Aug 2008 at 4:34