Open lukstafi opened 1 month ago
Using `cu_mem_get_info` gives results that are not very meaningful. For example:
┌─────────────────────────────────────────────────────────────────────────────────────────┬───────────┬───────────────┬───────┬─────────┬───────────────────────────────────────────────────┐
│Benchmarks                                                                               │Time in sec│Memory in bytes│Speedup│Mem gain │init time in sec, min loss, last loss              │
├─────────────────────────────────────────────────────────────────────────────────────────┼───────────┼───────────────┼───────┼─────────┼───────────────────────────────────────────────────┤
│seed 7, inline 0, parallel 1, batch 240, backend cc, val prec single, grad prec single   │0.229846796│187036         │5.306  │18431.763│(0.602457722 62.728876709938049 62.728876709938049)│
│seed 7, inline 0, parallel 1, batch 240, backend cc, val prec half, grad prec half       │0.681410625│93522          │1.790  │36861.950│(0.830183092 62.6259765625 62.6259765625)          │
│seed 7, inline 0, parallel 1, batch 240, backend cuda, val prec single, grad prec single │0.796467672│3447403316     │1.531  │1.000    │(3.596758795 62.728905558586121 62.728905558586121)│
│seed 7, inline 0, parallel 1, batch 240, backend cuda, val prec half, grad prec half     │1.219598776│2061500416     │1.000  │1.672    │(3.974976031 62.93798828125 62.93798828125)        │
│seed 7, inline 3, parallel 1, batch 240, backend cc, val prec single, grad prec single   │0.251448531│187036         │4.850  │18431.763│(0.511715823 62.7288755774498 62.7288755774498)    │
│seed 7, inline 3, parallel 1, batch 240, backend cc, val prec half, grad prec half       │0.63360842 │93522          │1.925  │36861.950│(0.585796587 62.30078125 62.30078125)              │
│seed 7, inline 3, parallel 1, batch 240, backend cuda, val prec single, grad prec single │0.657724256│2210398208     │1.854  │1.560    │(0.996566334 62.728905558586121 62.728905558586121)│
│seed 7, inline 3, parallel 1, batch 240, backend cuda, val prec half, grad prec half     │0.779391164│1088421888     │1.565  │3.167    │(1.305761225 62.2236328125 62.2236328125)          │
│seed 7, inline 0, parallel 3, batch 240, backend cc, val prec single, grad prec single   │0.245330525│571884         │4.971  │6028.151 │(0.808980378 62.153002977371216 62.153002977371216)│
│seed 7, inline 0, parallel 3, batch 240, backend cc, val prec half, grad prec half       │0.459211186│285954         │2.656  │12055.797│(1.063122458 62.41552734375 62.41552734375)        │
│seed 7, inline 0, parallel 3, batch 240, backend cuda, val prec single, grad prec single │0.524303261│1352663040     │2.326  │2.549    │(3.233237763 63.376171588897705 63.376171588897705)│
│seed 7, inline 0, parallel 3, batch 240, backend cuda, val prec half, grad prec half     │0.750559389│612368384      │1.625  │5.630    │(5.178235428 62.83740234375 62.83740234375)        │
│seed 7, inline 3, parallel 3, batch 240, backend cc, val prec single, grad prec single   │0.246047198│571884         │4.957  │6028.151 │(0.72678405 62.152995347976685 62.152995347976685) │
│seed 7, inline 3, parallel 3, batch 240, backend cc, val prec half, grad prec half       │0.446806293│285954         │2.730  │12055.797│(0.838345553 62.47265625 62.47265625)              │
│seed 7, inline 3, parallel 3, batch 240, backend cuda, val prec single, grad prec single │0.558565954│715128832      │2.183  │4.821    │(1.419007865 63.376166462898254 63.376166462898254)│
│seed 7, inline 3, parallel 3, batch 240, backend cuda, val prec half, grad prec half     │0.662616926│341835776      │1.841  │10.085   │(2.182560358 62.17529296875 62.17529296875)        │
│seed 7, inline 0, parallel 6, batch 240, backend cc, val prec single, grad prec single   │0.324366117│1176096        │3.760  │2931.226 │(1.099047585 61.730027139186859 61.730027139186859)│
│seed 7, inline 0, parallel 6, batch 240, backend cc, val prec half, grad prec half       │0.537282895│588072         │2.270  │5862.213 │(1.315531069 62.76953125 62.76953125)              │
│seed 7, inline 0, parallel 6, batch 240, backend cuda, val prec single, grad prec single │0.557164894│580911104      │2.189  │5.934    │(2.652769076 63.376184284687042 63.376184284687042)│
│seed 7, inline 0, parallel 6, batch 240, backend cuda, val prec half, grad prec half     │0.659206927│297795584      │1.850  │11.576   │(4.897720286 62.7421875 62.7421875)                │
│seed 7, inline 3, parallel 6, batch 240, backend cc, val prec single, grad prec single   │0.327492657│1176096        │3.724  │2931.226 │(0.945304816 61.718904912471771 61.718904912471771)│
│seed 7, inline 3, parallel 6, batch 240, backend cc, val prec half, grad prec half       │0.496853717│588072         │2.455  │5862.213 │(1.055382175 60.982421875 60.982421875)            │
│seed 7, inline 3, parallel 6, batch 240, backend cuda, val prec single, grad prec single │0.484854294│337641472      │2.515  │10.210   │(1.661079693 63.376177906990051 63.376177906990051)│
│seed 7, inline 3, parallel 6, batch 240, backend cuda, val prec half, grad prec half     │0.637598667│153092096      │1.913  │22.518   │(2.544604816 62.099609375 62.099609375)            │
│seed 7, inline 0, parallel 12, batch 240, backend cc, val prec single, grad prec single  │0.374894618│2481504        │3.253  │1389.239 │(1.55095354 61.862113118171692 61.862113118171692) │
│seed 7, inline 0, parallel 12, batch 240, backend cc, val prec half, grad prec half      │0.565150972│1240800        │2.158  │2778.371 │(1.795796058 62.04931640625 62.04931640625)        │
│seed 7, inline 0, parallel 12, batch 240, backend cuda, val prec single, grad prec single│0.579294911│276824064      │2.105  │12.453   │(2.876144217 63.376185953617096 63.376185953617096)│
│seed 7, inline 0, parallel 12, batch 240, backend cuda, val prec half, grad prec half    │0.697255179│153092096      │1.749  │22.518   │(4.924106615 62.80078125 62.80078125)              │
│seed 7, inline 3, parallel 12, batch 240, backend cc, val prec single, grad prec single  │0.363463621│2481504        │3.355  │1389.239 │(1.313944785 61.862080454826355 61.862080454826355)│
│seed 7, inline 3, parallel 12, batch 240, backend cc, val prec half, grad prec half      │0.562140134│1240800        │2.170  │2778.371 │(1.499167458 61.90234375 61.90234375)              │
│seed 7, inline 3, parallel 12, batch 240, backend cuda, val prec single, grad prec single│0.596052431│180355072      │2.046  │19.115   │(2.841286029 63.376178562641144 63.376178562641144)│
│seed 7, inline 3, parallel 12, batch 240, backend cuda, val prec half, grad prec half    │0.663990027│67108864       │1.837  │51.370   │(4.696311311 61.94580078125 61.94580078125)        │
│seed 7, inline 0, parallel 16, batch 240, backend cc, val prec single, grad prec single  │0.474191279│3423616        │2.572  │1006.948 │(1.769277872 61.757832944393158 61.757832944393158)│
│seed 7, inline 0, parallel 16, batch 240, backend cc, val prec half, grad prec half      │0.577466576│1711872        │2.112  │2013.821 │(2.305998903 61.90576171875 61.90576171875)        │
│seed 7, inline 0, parallel 16, batch 240, backend cuda, val prec single, grad prec single│0.620073868│186646528      │1.967  │18.470   │(2.423241004 63.376178324222565 63.376178324222565)│
│seed 7, inline 0, parallel 16, batch 240, backend cuda, val prec half, grad prec half    │0.764845615│88080384       │1.595  │39.139   │(4.584916593 62.71826171875 62.71826171875)        │
│seed 7, inline 3, parallel 16, batch 240, backend cc, val prec single, grad prec single  │0.412470958│3423616        │2.957  │1006.948 │(1.665891434 61.757833182811737 61.757833182811737)│
│seed 7, inline 3, parallel 16, batch 240, backend cc, val prec half, grad prec half      │0.59185156 │1711872        │2.061  │2013.821 │(1.89430643 61.8232421875 61.8232421875)           │
│seed 7, inline 3, parallel 16, batch 240, backend cuda, val prec single, grad prec single│0.617704358│109051904      │1.974  │31.613   │(2.188090388 63.376178324222565 63.376178324222565)│
│seed 7, inline 3, parallel 16, batch 240, backend cuda, val prec half, grad prec half    │0.741150168│41943040       │1.646  │82.193   │(3.408914471 61.9169921875 61.9169921875)          │
└─────────────────────────────────────────────────────────────────────────────────────────┴───────────┴───────────────┴───────┴─────────┴───────────────────────────────────────────────────┘
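One way to see why the CUDA figures are hard to interpret is to check their granularity. The sketch below is only an illustration: it copies a handful of "Memory in bytes" values from the table (the `parallel 1` and `parallel 16` rows) and tests whether each is a whole multiple of 2 MiB. The helper name `summarize` and the choice of 2 MiB as the unit are assumptions made for this example, not something the issue states. If the check holds for the CUDA rows but not the `cc` rows, that would be consistent with the CUDA numbers tracking coarse changes in free device memory rather than the tensors' actual footprint.

```python
MIB = 1024 * 1024

# "Memory in bytes" values copied from the benchmark table
# (parallel 1 rows first, then parallel 16 rows).
cuda_bytes = [3447403316, 2061500416, 2210398208, 1088421888,
              186646528, 88080384, 109051904, 41943040]
cc_bytes = [187036, 93522, 3423616, 1711872]


def summarize(values, unit=2 * MIB):
    """Return (bytes, size in MiB, is a whole multiple of `unit`) per value.

    `unit` defaults to 2 MiB, which is an assumption chosen for this
    illustration, not a documented property of the measurement.
    """
    return [(v, v / MIB, v % unit == 0) for v in values]


if __name__ == "__main__":
    for label, values in (("cuda", cuda_bytes), ("cc", cc_bytes)):
        for n_bytes, mib, aligned in summarize(values):
            print(f"{label:4} {n_bytes:>11} bytes = {mib:10.3f} MiB, "
                  f"multiple of 2 MiB: {aligned}")
```

Note also that the CUDA figures shrink as `parallel` grows while the `cc` figures grow, which is another hint that the two columns are not measuring the same thing.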