**Open** · maxentile opened this issue 4 years ago
`atom_k` and `atom_eq` should be renamed to `epsilon` and `sigma` or similar.
This was just my shorthand to enable us to loop through terms like:

```python
for term in ['atom', 'angle', 'bond']:
    for param in ['eq', 'k']:
```
Yeah, I noticed that in a few places. We would need to refactor slightly so we have something like `parameter_names['atom'] = ['sigma', 'epsilon', 'charge']`, `parameter_names['torsion'] = ['periodicities', 'force_constants', 'phase_offsets']`, ..., or similar, and then say:

```python
for term in ['atom', 'angle', 'bond', 'torsion']:
    for param in parameter_names[term]:
```
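A minimal runnable sketch of that refactor follows; the per-term parameter names are illustrative, taken from the suggestion above, and the printed attribute names are just one possible use of the loop:

```python
# Sketch of the proposed refactor: map each term type to its physically
# meaningful parameter names, rather than reusing a generic ['eq', 'k']
# pair for every term. The exact name lists below are assumptions.
parameter_names = {
    'atom': ['sigma', 'epsilon', 'charge'],
    'bond': ['eq', 'k'],
    'angle': ['eq', 'k'],
    'torsion': ['periodicities', 'force_constants', 'phase_offsets'],
}

for term in ['atom', 'angle', 'bond', 'torsion']:
    for param in parameter_names[term]:
        # e.g. build attribute names like 'atom_sigma' or
        # 'torsion_phase_offsets' instead of 'atom_k' / 'atom_eq'
        print(f"{term}_{param}")
```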
Let's port some of the report-generating schemes that I implemented here.
Nice! Looks like something in this direction may be an improvement: it would separate the computation of summary statistics from the generation of formatted reports, which are currently intertwined.
A couple of minor comments:

- The `results` dictionaries have a specific structure that depends on the result type, hinting that these may be better off living inside a results class (`results.save_html()`, `multiple_results_object.save_html()`, `multiple_results_object.save_html(grid=True)`, ..., rather than `html(results_dict)`, `html_multiple_train_and_test(results)`, `html_multiple_train_and_test_2d_grid(results)`, ...).
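As a hedged sketch of that suggestion (all class and method names here are illustrative, not existing API), a results class that owns its own rendering might look like:

```python
from dataclasses import dataclass, field

@dataclass
class Result:
    """Summary statistics for a single train/test run (hypothetical)."""
    name: str
    metrics: dict = field(default_factory=dict)

    def to_html(self) -> str:
        # Render the metrics as a small HTML table.
        rows = "".join(
            f"<tr><td>{key}</td><td>{value:.3f}</td></tr>"
            for key, value in self.metrics.items()
        )
        return f"<h2>{self.name}</h2><table>{rows}</table>"

    def save_html(self, path: str) -> None:
        with open(path, "w") as file:
            file.write(self.to_html())

@dataclass
class MultipleResults:
    """A collection of Result objects rendered into one report."""
    results: list

    def save_html(self, path: str, grid: bool = False) -> None:
        # grid=True could arrange the sub-reports in a 2D layout;
        # for brevity this sketch simply concatenates them.
        html = "\n".join(result.to_html() for result in self.results)
        with open(path, "w") as file:
            file.write(html)
```

This keeps knowledge of each result type's structure next to its rendering, instead of free functions like `html(results_dict)` that must each inspect the dictionary's shape.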
The report generators in `supervised_train.py` and `supervised_param_train.py` are great! They make it much easier to browse results of the numerical experiments @yuanqing-wang has been doing.

A wishlist of things that would be good to include in future iterations of the report generator:
- The reports state `loss_fn=mse_loss`, but @yuanqing-wang mentions by Slack that this loss is measured on a normalized regression target.
- The reported score is `1 - (residual sum of squares) / (total sum of squares)`, as in `sklearn.metrics.r2_score`, but a reader might reasonably expect one of the other definitions that leads to a non-negative value.
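To illustrate why the definition matters, here is a minimal pure-Python version of the `1 - SS_res / SS_tot` formula (matching `sklearn.metrics.r2_score`); the example data are made up, chosen only to show that this definition goes negative when predictions are worse than predicting the mean:

```python
def r2_score(y_true, y_pred):
    """R^2 as 1 - (residual sum of squares) / (total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [1.0, 2.0, 3.0]
good_pred = [1.1, 2.0, 2.9]  # close to the targets: R^2 near 1
bad_pred = [3.0, 3.0, 3.0]   # worse than predicting the mean: R^2 < 0

print(r2_score(y_true, good_pred))  # → 0.99
print(r2_score(y_true, bad_pred))   # → -1.5
```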