PCMDI / amipbcs

Code to generate boundary condition data for the AMIP experiment suite
2 stars 0 forks source link

Deal with VCS check plot memory and time issues #10

Closed durack1 closed 7 years ago

durack1 commented 7 years ago

The issues described in UV-CDAT/uvcdat#1424 will be investigated within tweaks generated to solve the reported problems following VCS changes documented in UV-CDAT/vcs#236

doutriaux1 commented 7 years ago

@durack1 @lee1043 please post your codes here. I had @lee1043 leak nothing before.

durack1 commented 7 years ago

@doutriaux1, it's probably better to keep the issue (posted code, discussion) back in the vcs repo, no? If so which issue #?

doutriaux1 commented 7 years ago

Let's start here because I suspect it's user error at this point.

durack1 commented 7 years ago

@doutriaux1 the two variants currently running (as it takes soooooooo long to plot these things, they're still running 12hrs later..) are: PCMDI/amipbcs/make_newVsOldDiffs.py and PCMDI/amipbcs/make_newVsOldDiffs-delFudge.py

Both reference data on ocean

doutriaux1 commented 7 years ago

@durack1 you had a template object created that you were not clearing, once this is fixed it seems stable. Will test again shortly when I get onsite because remotely it is painful. Might try wirth a mesa version as well just to see.

(nightly) doutriaux1@oceanonly:[Paul]:[1830]> python test_leak.py 
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightly
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 81.494 secs; Max mem: 1.497 GB PyObj#: 0113248;
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
00002 processing: 1870-02 sic     Time: 93.325 secs; Max mem: 1.911 GB PyObj#: 0113249;
00003 processing: 1870-03 sic     Time: 78.404 secs; Max mem: 1.911 GB PyObj#: 0113005;
00004 processing: 1870-04 sic     Time: 78.045 secs; Max mem: 1.911 GB PyObj#: 0113006;
00005 processing: 1870-05 sic     Time: 76.214 secs; Max mem: 1.911 GB PyObj#: 0113007;
doutriaux1 commented 7 years ago

@durack1 ok switching to mesalib I get ridiculous speed gain... I do see the memory increasing, I will investigate further

(nightly) doutriaux1@oceanonly:[Paul]:[1831]> python test_leak.py 
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightly
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 05.058 secs; Max mem: 1.535 GB PyObj#: 0113186;
00002 processing: 1870-02 sic     Time: 05.195 secs; Max mem: 2.023 GB PyObj#: 0113202;
00003 processing: 1870-03 sic     Time: 04.915 secs; Max mem: 2.026 GB PyObj#: 0113112;
00004 processing: 1870-04 sic     Time: 04.977 secs; Max mem: 2.073 GB PyObj#: 0113133;
00005 processing: 1870-05 sic     Time: 05.094 secs; Max mem: 2.122 GB PyObj#: 0113114;
00006 processing: 1870-06 sic     Time: 05.198 secs; Max mem: 2.174 GB PyObj#: 0113255;
00007 processing: 1870-07 sic     Time: 04.765 secs; Max mem: 2.222 GB PyObj#: 0113196;
00008 processing: 1870-08 sic     Time: 04.840 secs; Max mem: 2.273 GB PyObj#: 0113117;
00009 processing: 1870-09 sic     Time: 04.438 secs; Max mem: 2.318 GB PyObj#: 0113198;
durack1 commented 7 years ago

@doutriaux1 what's the object name, and the command to delete it? I'll add it in and rerun my script/log..

durack1 commented 7 years ago

@doutriaux1 I think that a trimmed down version of this script, along with a single timeslice of input data should be added to the test suite.. It seems it's a very sensitive example of timings and memory usage when turning mesalib on and off.. And any further tweaks to EzTemplate

doutriaux1 commented 7 years ago

@durack1 yes once we figured out where the 50Mb per plot come from 😉 @danlipsa I suspect something in isofills?

durack1 commented 7 years ago

@doutriaux1 would you like to revisit the "user error" https://github.com/PCMDI/amipbcs/issues/10#issuecomment-324334436 above?

doutriaux1 commented 7 years ago

@durack1 @danlipsa, yes it's something in the plotting of isofill/boxfill I will track it today or tomorrow.

durack1 commented 7 years ago

@doutriaux1 time to move discussions back into the vcs repo? I think it'll be cleaner to generate a dedicated issue and move comments there

doutriaux1 commented 7 years ago

not fully revisiting the user erro, there was some left over templates which were creating additional object which you complained about it. So that was definitely a user error. There is only one textorientatation created per plot now. So that user-error part is nor solved. Need to figure out why contours create such a mem leak now though.

durack1 commented 7 years ago

What was the object that required deletion? You didn't answer my https://github.com/PCMDI/amipbcs/issues/10#issuecomment-324389283 above

durack1 commented 7 years ago

@doutriaux1 see https://github.com/UV-CDAT/uvcdat/issues/1424#issuecomment-221677132 it may in fact have already been identified by @chaosphere2112

The tool described in https://github.com/UV-CDAT/uvcdat/issues/1424#issuecomment-224293967 may also be really useful in determining the cause and finally sorting out this issue once and for all

doutriaux1 commented 7 years ago

@durack1 rightfully pointe that using X we have no more leak!

(nightlyX) doutriaux1@oceanonly:[Paul]:[1896]> python test_leak.py 
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightlyX
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 02.280 secs; Max mem: 1.543 GB PyObj#: 0113551;
00002 processing: 1870-02 sic     Time: 02.083 secs; Max mem: 2.008 GB PyObj#: 0113254;
00003 processing: 1870-03 sic     Time: 01.329 secs; Max mem: 2.038 GB PyObj#: 0113366;
00004 processing: 1870-04 sic     Time: 01.057 secs; Max mem: 2.038 GB PyObj#: 0113437;
00005 processing: 1870-05 sic     Time: 01.046 secs; Max mem: 2.038 GB PyObj#: 0113327;
00006 processing: 1870-06 sic     Time: 01.013 secs; Max mem: 2.038 GB PyObj#: 0113328;
00007 processing: 1870-07 sic     Time: 00.985 secs; Max mem: 2.038 GB PyObj#: 0113329;
00008 processing: 1870-08 sic     Time: 01.034 secs; Max mem: 2.038 GB PyObj#: 0113330;
00009 processing: 1870-09 sic     Time: 00.990 secs; Max mem: 2.038 GB PyObj#: 0113269;
00010 processing: 1870-10 sic     Time: 00.975 secs; Max mem: 2.038 GB PyObj#: 0113270;
00011 processing: 1870-11 sic     Time: 01.035 secs; Max mem: 2.038 GB PyObj#: 0113271;
00012 processing: 1870-12 sic     Time: 01.260 secs; Max mem: 2.038 GB PyObj#: 0113272;
00013 processing: 1871-01 sic     Time: 01.047 secs; Max mem: 2.038 GB PyObj#: 0113221;
00014 processing: 1871-02 sic     Time: 01.051 secs; Max mem: 2.038 GB PyObj#: 0113111;
00015 processing: 1871-03 sic     Time: 01.010 secs; Max mem: 2.038 GB PyObj#: 0113112;
00016 processing: 1871-04 sic     Time: 00.969 secs; Max mem: 2.038 GB PyObj#: 0113113;
00017 processing: 1871-05 sic     Time: 00.976 secs; Max mem: 2.038 GB PyObj#: 0113225;
00018 processing: 1871-06 sic     Time: 01.044 secs; Max mem: 2.038 GB PyObj#: 0113115;
00019 processing: 1871-07 sic     Time: 01.200 secs; Max mem: 2.038 GB PyObj#: 0113116;
00020 processing: 1871-08 sic     Time: 01.053 secs; Max mem: 2.038 GB PyObj#: 0113117;
00021 processing: 1871-09 sic     Time: 01.556 secs; Max mem: 2.038 GB PyObj#: 0113118;
00022 processing: 1871-10 sic     Time: 00.971 secs; Max mem: 2.038 GB PyObj#: 0113119;

But is I switched to the mesa-based one I get:

(nightly) doutriaux1@oceanonly:[Paul]:[1897]> python test_leak.py 
UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightly
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 01.813 secs; Max mem: 1.494 GB PyObj#: 0113365;
00002 processing: 1870-02 sic     Time: 01.426 secs; Max mem: 1.961 GB PyObj#: 0113336;
00003 processing: 1870-03 sic     Time: 01.177 secs; Max mem: 1.961 GB PyObj#: 0113407;
00004 processing: 1870-04 sic     Time: 01.169 secs; Max mem: 1.974 GB PyObj#: 0113297;
00005 processing: 1870-05 sic     Time: 01.180 secs; Max mem: 1.987 GB PyObj#: 0113298;
00006 processing: 1870-06 sic     Time: 01.175 secs; Max mem: 2.001 GB PyObj#: 0113299;
00007 processing: 1870-07 sic     Time: 01.218 secs; Max mem: 2.014 GB PyObj#: 0113300;
00008 processing: 1870-08 sic     Time: 01.123 secs; Max mem: 2.027 GB PyObj#: 0113301;
00009 processing: 1870-09 sic     Time: 01.394 secs; Max mem: 2.040 GB PyObj#: 0113241;
00010 processing: 1870-10 sic     Time: 01.109 secs; Max mem: 2.053 GB PyObj#: 0113242;
00011 processing: 1870-11 sic     Time: 01.127 secs; Max mem: 2.066 GB PyObj#: 0113243;
00012 processing: 1870-12 sic     Time: 01.093 secs; Max mem: 2.079 GB PyObj#: 0113244;

It is EXACTLY the same python underneath. @danlipsa I think it's either a mesalib or a VTK-built-against-mesalib issue. Your thoughts?

durack1 commented 7 years ago

@doutriaux1 @danlipsa and to fill in the holes, the reason why I can't use X11 is because with the VNC session (and bg=1) I get all black/blank pngs..

durack1 commented 7 years ago

@doutriaux1 it's also interesting to note that in your example above it's not just a single object all the time, so the transition:

00001 processing: 1870-01 sic     Time: 02.280 secs; Max mem: 1.543 GB PyObj#: 0113551;

00002 processing: 1870-02 sic     Time: 02.083 secs; Max mem: 2.008 GB PyObj#: 0113254;

00003 processing: 1870-03 sic     Time: 01.329 secs; Max mem: 2.038 GB PyObj#: 0113366;

00004 processing: 1870-04 sic     Time: 01.057 secs; Max mem: 2.038 GB PyObj#: 0113437;

00005 processing: 1870-05 sic     Time: 01.046 secs; Max mem: 2.038 GB PyObj#: 0113327;
00006 processing: 1870-06 sic     Time: 01.013 secs; Max mem: 2.038 GB PyObj#: 0113328;
00007 processing: 1870-07 sic     Time: 00.985 secs; Max mem: 2.038 GB PyObj#: 0113329;
00008 processing: 1870-08 sic     Time: 01.034 secs; Max mem: 2.038 GB PyObj#: 0113330;

00009 processing: 1870-09 sic     Time: 00.990 secs; Max mem: 2.038 GB PyObj#: 0113269;
00010 processing: 1870-10 sic     Time: 00.975 secs; Max mem: 2.038 GB PyObj#: 0113270;
00011 processing: 1870-11 sic     Time: 01.035 secs; Max mem: 2.038 GB PyObj#: 0113271;
00012 processing: 1870-12 sic     Time: 01.260 secs; Max mem: 2.038 GB PyObj#: 0113272;

00013 processing: 1871-01 sic     Time: 01.047 secs; Max mem: 2.038 GB PyObj#: 0113221;

00014 processing: 1871-02 sic     Time: 01.051 secs; Max mem: 2.038 GB PyObj#: 0113111;
00015 processing: 1871-03 sic     Time: 01.010 secs; Max mem: 2.038 GB PyObj#: 0113112;
00016 processing: 1871-04 sic     Time: 00.969 secs; Max mem: 2.038 GB PyObj#: 0113113;

00017 processing: 1871-05 sic     Time: 00.976 secs; Max mem: 2.038 GB PyObj#: 0113225;

00018 processing: 1871-06 sic     Time: 01.044 secs; Max mem: 2.038 GB PyObj#: 0113115;
00019 processing: 1871-07 sic     Time: 01.200 secs; Max mem: 2.038 GB PyObj#: 0113116;
00020 processing: 1871-08 sic     Time: 01.053 secs; Max mem: 2.038 GB PyObj#: 0113117;
00021 processing: 1871-09 sic     Time: 01.556 secs; Max mem: 2.038 GB PyObj#: 0113118;
00022 processing: 1871-10 sic     Time: 00.971 secs; Max mem: 2.038 GB PyObj#: 0113119;

Some of these have a single additional python object created (the number behind the PyObj#: entry), whereas other times the numbers are much larger than 1

doutriaux1 commented 7 years ago

@durack1 some also have a great number of py obj going down. Probably gc related.

To answer your question you had forgotten to remive the tmpl created in initVCS tmpl = vcs.createtemplate()

doutriaux1 commented 7 years ago

@durack1 when plotting s1s instead of s1 and using s1s for the diff instead of s1, which is what you actually meant we get a smaller memory print and the bug takes longer to appear:

UV-CDAT version:      2.10-19-gba336f2
UV-CDAT prefix:       /usr/local/anaconda2/envs/nightly
delFudge:             False
Background graphics:  True
donotstoredisplay:    True
00001 processing: 1870-01 sic     Time: 02.521 secs; Max mem: 1.039 GB PyObj#: 0113442;
00002 processing: 1870-02 sic     Time: 02.421 secs; Max mem: 1.039 GB PyObj#: 0113443;
00003 processing: 1870-03 sic     Time: 02.236 secs; Max mem: 1.039 GB PyObj#: 0113444;
00004 processing: 1870-04 sic     Time: 02.225 secs; Max mem: 1.039 GB PyObj#: 0113445;
00005 processing: 1870-05 sic     Time: 02.227 secs; Max mem: 1.039 GB PyObj#: 0113425;
00006 processing: 1870-06 sic     Time: 02.474 secs; Max mem: 1.039 GB PyObj#: 0113391;
00007 processing: 1870-07 sic     Time: 02.645 secs; Max mem: 1.039 GB PyObj#: 0113392;
00008 processing: 1870-08 sic     Time: 02.649 secs; Max mem: 1.039 GB PyObj#: 0113393;
00009 processing: 1870-09 sic     Time: 02.634 secs; Max mem: 1.053 GB PyObj#: 0113394;
00010 processing: 1870-10 sic     Time: 02.603 secs; Max mem: 1.097 GB PyObj#: 0113395;
00011 processing: 1870-11 sic     Time: 02.626 secs; Max mem: 1.147 GB PyObj#: 0113396;
00012 processing: 1870-12 sic     Time: 02.627 secs; Max mem: 1.198 GB PyObj#: 0113397;
00013 processing: 1871-01 sic     Time: 02.645 secs; Max mem: 1.249 GB PyObj#: 0113233;
00014 processing: 1871-02 sic     Time: 02.655 secs; Max mem: 1.291 GB PyObj#: 0113234;
00015 processing: 1871-03 sic     Time: 02.662 secs; Max mem: 1.343 GB PyObj#: 0113235;
durack1 commented 7 years ago

@doutriaux1 thanks.. This script is just a dummy/demo for me at the moment, so I don't really care about the png output.. Just was hoping to highlight the timing and memory issues in a repeatable way so that you and @danlipsa could dig in and figure out the persistent solution

I think your comments on the gc are right, as you can see every 5 iterations or so the total object count drops 20 (4->5) or more, 164 (12->13) above

durack1 commented 7 years ago

@doutriaux1 the huge virtual memory footprint is also something that's concerning..

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 66302 durack1   20   0 12.3g 1.4g  72m R 99.9  0.6   1415:55 python make_newVsOldDiffs.py

12.3Gb when the script is generating a bunch of ~100Kb pngs is kinda concerning..

doutriaux1 commented 7 years ago

@durack1 I will send you the memory leak fixed script tomorrow. But yes with mesalib you get the memory leak nothing I can do about it right now. I will stop by tomorrow and try to get your png to be non black.