add summary table - Githubissues

whoburg commented 8 years ago

@pgkirsch, apologies for the sudden change to printing behavior.

As we've been scaling to larger models, it turns out the table printout gets unwieldy (as you know from the aircraft model) -- e.g. the printout sometimes overwhelms the terminal buffer, such that you can't scroll up.

You'll recall the early days of GPkit, when table printing was not automatic (see http://gpkit.readthedocs.io/en/v0.1.0/examples.html#a-trivial-gp for example).

We're constantly trying to balance not making user-facing changes on one hand against what makes sense long term on the other. I think it's clear that printing solution tables doesn't scale to large models.

Here are a couple questions in the spirit of having the most user-friendly design long-term:

Should we have a SolutionArray.summary() method that prints key sensitivities and top-level model variables, along the lines of panda's DataFrame.describe()?

Should small models print their table or summary automatically, or is it better to be consistent across all models?

Should we allow user-specific printing options in gpkit/env/settings?

mayork commented 8 years ago

[x] I really like the idea of the summary table

pgkirsch commented 8 years ago

@whoburg no worries, I appreciate it's a fine balance to strike. I do think that up-to-date documentation should be treated just as importantly as unit tests though, at least for official releases.

[x] For example http://gpkit.readthedocs.io/en/v0.4.1/gettingstarted.html#solving-the-model would not pass a "documentation unit test".

I agree that a summary table would be nice!

Being consistent is probably a safer bet.

I would vote against user-specific settings to keep things simple.

whoburg commented 8 years ago

In case anyone is wondering how to get the previously-printed table, you can use

print sol.table()

where sol is a gpkit.SolutionArray, i.e. the output of Model.solve() or Model.localsolve().

pgkirsch commented 7 years ago

The new title for this issue is begging for more vaguely related requests, so here it is. In addition to the option of having a summary table(/selecting which variables print in the solution table), it would also be nice if you could choose to group the solution table (e.g. by subsystem, or by scope) and to give these groups some sort of subtitles.

One quick idea would be to use another kwarg in the Variable init. That said - and this is getting very tangential - it's already pretty difficult to keep Variable declarations less than 80 characters (particularly when creating Models (capital M) and you have 8 characters of whitespace before) and multi-line variable declarations are kind of a bummer. Anyway, I'm rambling and should probably open a separate ticket for this last thing.

bqpd commented 7 years ago

@pgkirsch, I opened that separate ticket for ya :p

pgkirsch commented 7 years ago

Too kind

bqpd commented 7 years ago

proposed summary table:

Cost
----
 0.06533 [1/day] 

Prominent Free Variables
------------------------
Fuel, Cruise, Mission |                                                                                                  
           W_{finish} : 4844        [lbf]   weight at beginning of flight segment
           W_{finish} : 1.644e+04   [lbf]   weight at beginning of flight segment
          W_{fuel-fs} : 98.67       [lbf]   flight segment fuel weight           
          W_{fuel-fs} : 135.3       [lbf]   flight segment fuel weight           
            W_{start} : 4943        [lbf]   weight at beginning of flight segment
            W_{start} : 1.657e+04   [lbf]   weight at beginning of flight segment

Sensitive Constants
-------------------
Structures |                                                                                       
\rho_{cap} : 1.76     [g/cm**3]    +0.21%/1%      Density of CF cap
          ...

Insensitive Constants
---------------------
Structures |                      
   E_{cap} : 2e+07    [psi]        -1e-3%/1%      Youngs modulus of CF cap
          ...

this would print as the default string of the solutionarray, such that:

print sol
# summary table
sol
# dictionary view

"prominent free variables" would be marked in the Variable init or as an argument to .summary. If there's less than 5 free variables it shows the "free variables" table instead.

"sensitive constants" would show the top 5 above 0.1 absolute sensitivity, if any

"insensitive constants" would show the bottom 5 below 0.01 absolute sensitivity, if any

Similarly, if there's fewer than 5 constants it shows all of them, sorted by absolute sensitivity.

bqpd commented 7 years ago

Should we switch to printing model names to the right of the |?

bqpd commented 7 years ago

@whoburg @mayork @mjburton11 @pgkirsch comments on the above format?

mjburton11 commented 7 years ago

I like this a lot!

pgkirsch commented 7 years ago

What does the "+0.21%/1% " refer to?
I don't know if prominent is the right word. Maybe "selected" or "key" or something idk.
My feeling is that for sensitivities, I don't necessarily just want to know what I am most and least sensitive to, but I want to know "how sensitive am I to X?" <-- a question I get a lot. So either sensitivity should be like free variables - you select which ones manually - or it should be something you turn on/off. I often don't want to see the sensitivities, I just want to see that my model solves and what the key variable values are, and when I do want to see sensitivities I want to see all of them.
I also prefer the term fixed variables to constants, but that's just me. I think there are both constants (e.g g, rho etc.) and fixed variables (e.g. safety factor, seat width etc.), so being able to distinguish between the two in a results table might be nice.
Further distinction by subsystem could also be nice for large models. Maybe I want all my aerodynamic variables to be grouped, and all of my structural variables to be grouped. Same with sensitivities?

bqpd commented 7 years ago

sensitivities. objective increases by 0.21% for every 1% increase in this parameter
haha, yeah I'm not a huge fan. I like selected. Key seems confusing...
hmm, sure thing. selected constants. I think there should also be 3 large/small ones?
hmm, noted. we could do better with this.
great plan. that's not for the summary table, right?

pgkirsch commented 7 years ago

Ewww, I much preferred the old way. So 1.76 refers to the actual value of the constant? Seems like that table needs some headers.
You're right, key would be confusing.
Eh.. but why 3? It seems too arbitrary to me. I really feel like sensitivities should be all or none. But that's just me.
No you're right, that could be a different solution table feature altogether.

mjburton11 commented 7 years ago

I do think it is useful to see the sensitive and insensitive constants for debugging purposes. I ran into problems with the 82 code that I solved because I noticed I was very insensitive to something that I should have been sensitive to. But I also agree with @pgkirsch in that when I'm showing someone sensitivities I want to show them the ones they care about.

This may be a terrible idea but, a solution table with a debugging flag that would print out useful stuff for debugging purposes could be a solution.

whoburg commented 7 years ago

I'm skeptical of the hard cutoffs on when sensitivities are and are not interesting. I lean toward printing all or nothing, or perhaps a sorted list, or a list of the top N?

Also, I don't think a Variable kwarg is enough to specify prominent vars. What about modular models? Surely for an electric motor design, the salient variables (winding diameter, wire diameter, max torque, etc) are not ones that would be reported as salient in an aircraft design model that uses the electric motor as a subcomponent. So which electric motor vars would I give the prominent kwarg to?

My main message is that I'm worried about a bunch of specialized functions that depend on magic numbers like sensitivity cutoffs. I say keep it as simple and robust as possible. Note: pandas .describe method takes only 3 kwargs.

bqpd commented 7 years ago

proposal for two new tables (top three most and least sensitive constants), to be printed by default when table is called with no arguments

Most Sensitive Constants
------------------------
     | Aircraft/Wing
   S : 190     [ft**2]     surface area
     + 0.6831
\rho : 1       [lbf/ft**2] areal density
     + 0.4816

     | Mission/FlightSegment/AircraftP/WingAero
   e : [ 0.9       0.9       0.9       0.9      ]              Oswald efficiency
     - 0.36 (sum)

Instead of abs(sum(, can use max(abs(, if it's bigger.

Least Sensitive Constants
-------------------------
     | Mission/FlightSegment/FlightState
\rho : [ 0.74      0.74      0.74      0.74     ]  [kg/m**3]   air density
     - 1.01e-2 (sum)
 \mu : [ 1.63e-05  1.63e-05  1.63e-05  1.63e-05 ]  [N*s/m**2]  dynamic viscosity
     + 4.32e-3 (min, at index (1, 2))

Instead of min(abs(, can use sum(, if it's smaller.

Would also remove minval in the sensitivities table, so that if that's explicitly printed it shows all of them.

bqpd commented 7 years ago

Looking for ways to indicate that the rows below each variable are their sensitivities.

bqpd commented 7 years ago

@mayork, @1ozturkbe , @pgkirsch, @mjburton11, please comment on the above proposal so I don't implement it blindly :p

1ozturkbe commented 7 years ago

This is a gnarly topic... I agree with most, saying that it is generally important to have all or nothing in terms of sensitivities in the table. I also think that the least sensitive constants are 99% of the time completely useless. With respect to @mjburton11 saying that the low sensitivity constants are useful for debugging, if you were trying to debug, you would look at all of the sensitivities anyways. If it were up to me, I'd only have the 5-8 most sensitive variables, as well as the ability to see specific sensitivities that the user may have tagged.
Additionally, I like what you have done with the summing of the sensitivities of vector variables. It makes it easier to interpret.

mjburton11 commented 7 years ago

Is there a reason why we don't want the sensitivities in line with everything else.

Variable: value [units] sensitivity label

or something like that.

With larger models it becomes very difficult to parse through all the sensitivities and so I still think that printing out the low sensitivities can be useful.

I also think that the table should print out all the sensitivities. But it would be great to have a separate function that prints out the highest and/or lowest sensitivities for a sensitivity threshold that the user specifies.

mayork commented 7 years ago

printing the sensitivity in the table seems like an interesting idea. I don't like the idea of printing out only the x most sensitive or least sensitive variables, this seems like a good way to potentially miss some valuable information.

bqpd commented 7 years ago

I suggested a format for that and no-one said they liked it. Main disadvantage is it makes the whole table wider...

bqpd commented 7 years ago

@mjburton11, @mayork I encourage you to edit one of the tables above to create your desired format.

mayork commented 7 years ago

in large models when there are tons of model names tagged to variables the table is already very wide, any chance people's thoughts on this have changed?

bqpd commented 7 years ago

We've narrowed the table considerably in a recent PR (https://github.com/hoburg/gpkit/pull/965)

bqpd commented 7 years ago

@mjburton11, you actually can already print a sensitivities table with custom minimum value already! The problem is to a large extent one of discoverability...how do we suggest analysis processes to users? Examples? And what should we emphasize by default?

mjburton11 commented 7 years ago

how do you do that?

bqpd commented 7 years ago

from gpkit.solution_array import results_table
print results_table(sol["sensitivities"]["constants"], "Sensitivities", minval=1000, printunits=False)

pgkirsch commented 7 years ago

A bit late to the party, sorry. As a user, I think that I am often looking for specific sensitivities, so I still think the default should be an alphabetical-order list of fixed variables.

I don't particularly like the sample output above. It seems cluttered/unclear, and alternating one kind of data (fixed values) with another (sensitivities) seems a bit dubious in a plain text table. I really prefer separating the values from the sensitivities, but I feel like I'm the odd one out here.

Also what does the sum of sensitivities mean?

pgkirsch commented 7 years ago

I would prefer putting sensitivities on the same line as the fixed variables, rather than below. Surely that's possible in under 80 characters?

bqpd commented 7 years ago

@pgkirsch, it really depends on the description and (vector) value length, and they've been observed to be quite long. You're welcome to suggest a specific implementation using the example data!

pgkirsch commented 7 years ago

I agree that with @1ozturkbe that most of the time you don't care about insensitive fixed vars.

I also agree with @mayork that only picking the N largest sensitivities is a good way to miss useful information. I think that it isn't the absolute largest sensitivities that are always the most interesting, but the sensitivities that are larger than you expect.

bqpd commented 7 years ago

re: expectations, good point!

whoburg commented 7 years ago

The summary table would sure be useful for a talk I'm giving Tuesday morning! What I need is a large model (JHO or solar preferred, since I'm talking about those a lot) that I can copy/paste into an ipython notebook and demonstrate cool things on (e.g. printing results, showing top sensitivities, getting latex)

lochieferrier commented 7 years ago

I have a way of displaying GP results in OpenVSP that's working great now as well, so if you have OpenVSP files and GPs that you'd like to bring together, I can do that for you.

On Sat, 17 Dec 2016 at 19:44 Warren Hoburg notifications@github.com wrote:

The summary table would sure be useful for a talk I'm giving Tuesday morning! What I need is a large model (JHO or solar preferred, since I'm talking about those a lot) that I can copy/paste into an ipython notebook and demonstrate cool things on (e.g. printing results, showing top sensitivities, getting latex)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/hoburg/gpkit/issues/805#issuecomment-267796308, or mute the thread https://github.com/notifications/unsubscribe-auth/AIp8XsLsj3VoOmF-QWycjq2Osyq1ksPIks5rJIH9gaJpZM4JeW7N .

bqpd commented 7 years ago

@pgkirsch, I think I'm going to implement the summary table pretty much as above and make it the default string representation of a SolutionArray (so that it's what shows up when you do print sol, but print sol.table() will be unchanged for now).

The idea of finding answers that don't match expectations is interesting though. Do you think it'd be feasible to tag variables with "expected" sensitivities?

pgkirsch commented 7 years ago

Sounds good.

And no, I think that's asking too much from a user and doesn't add much value, in my opinion. I would hate to have to change said tag every time I make a significant change to a model.

convexengineering / gpkit

add summary table #805