HISKP-LQCD / hadron

R package implementing analysis tools for lattice QCD
15 stars 13 forks source link

Add predict.bootstrapfit function #297

Closed martin-ueding closed 4 years ago

martin-ueding commented 4 years ago

Closes #282.

I have refactored the prediction from a bootstrap fit into a function, which is a generic to predict (as in predict.lm). This is then used in the two plotting functions that we have.

On the way I have identified a few new edge cases in the residual plot and also fixed that.

urbach commented 4 years ago

some lamenting by check

❯ checking S3 generic/method consistency ... WARNING
  predict:
    function(object, ...)
  predict.bootstrapfit:
    function(object, x, error)

  See section ‘Generic functions and methods’ in the ‘Writing R
  Extensions’ manual.

❯ checking Rd cross-references ... WARNING
  Missing link or links in documentation object 'plotwitherror.Rd':
    ‘[base]{plot}’

  See section 'Cross-references' in the 'Writing R Extensions' manual.

❯ checking R code for possible problems ... NOTE
  plot.bootstrapfit: no visible global function definition for ‘predict’
  residual_plot.bootstrapfit: no visible global function definition for
    ‘predict’
  Undefined global functions or variables:
    predict
  Consider adding
    importFrom("stats", "predict")
  to your NAMESPACE file.
urbach commented 4 years ago

btw: why isn't that failing in the travis check?

urbach commented 4 years ago

otherwise, this PR works for me.

urbach commented 4 years ago

Best would be to be able to choose the range, without touching the actual fit range.

And, since we are on it: it would be great to be able to choose to have data points in the foreground... -- CU Mobil, www.carsten-urbach.eu

martin-ueding commented 4 years ago

I moved the data to the background today because I had a slight problem with that. How can I start a plot with just the ployline and still get the limits that I would get when plotting the data? Like plot.new(xlim = range(data$x), ylim = range(data$y)); polyline(…)?

urbach commented 4 years ago

plot(NA, ylim=..., xlim=..., xlab=...,...)

Points have been in the background for longer, haven't they?

On 3 August 2020 20:12:03 CEST, Martin Ueding notifications@github.com wrote:

I moved the data to the background today because I had a slight problem with that. How can I start a plot with just the ployline and still get the limits that I would get when plotting the data? Like plot.new(xlim = range(data$x), ylim = range(data$y)); polyline(…)?>

-- > You are receiving this because you commented.> Reply to this email directly or view it on GitHub:> https://github.com/HISKP-LQCD/hadron/pull/297#issuecomment-668167169

-- CU Mobil, www.carsten-urbach.eu

martin-ueding commented 4 years ago

The points indeed have been in the background for longer, I believe. At least the ones which were not used in the fit. Either way, now it has the ordering that we want, with the ribbon in the background for the ratio plot.

martin-ueding commented 4 years ago

For both plot.bootstrapfit and ratio_plot.bootstrapfit one can supply xlim and ylim, they are passed down.

martin-ueding commented 4 years ago

Regarding the order of points for the plot.bootstrapfit there has been a discussion with Johann. When there are very few points (chiral extrapolation) it makes sense to have the points in the foreground. Johann has correlators with lots of time slices and there it makes more sense to have the model in the front, as otherwise a small ribbon would drown behind the data points. This is something that one could make an option.

urbach commented 4 years ago

The points indeed have been in the background for longer, I believe. At least the ones which were not used in the fit. Either way, now it has the ordering that we want, with the ribbon in the background for the ratio plot.

@j-ostmeyer had a strong point for having points in the background, didn't he?

martin-ueding commented 4 years ago

Now you can have both, the current behavior is the default.

Bildschirmfoto_20200804-16:18:53-a7c-Auswahl

Bildschirmfoto_20200804-16:19:01-6da-Auswahl

j-ostmeyer commented 4 years ago

@j-ostmeyer had a strong point for having points in the background, Yes, if you have very many data points, you don't see the error band below. And if I remember correctly, at least the first implementation always produced two figures, only data and data+fit, instead of only one.

Of course, I'm happy with having both possibilities.

urbach commented 4 years ago

so, travis fails because of pdfcrop?

martin-ueding commented 4 years ago

There also is a secondary problem in there that I will have to fix:

formal argument "ylim" matched by multiple actual arguments

I know where it comes from, I just need to work around it.

martin-ueding commented 4 years ago

Now it is this:

Ratio of converged fits on samples: 99 / 99 = 1 

Table of nls.lm info values (1, 2, 3 are convergence):

  value frequency

1     1        99

> plot(fitres, log="y")

Error in plot.window(...) : Logarithmic axis must have positive limits

Calls: plot ... <Anonymous> -> plot.default -> localWindow -> plot.window
martin-ueding commented 4 years ago

Negative values are usually dropped in logarithmic plots. I explicitly need to call plot to initialize the plot before being able to call polygon. The first plot command needs to set the ylim already. At the moment I just use range to find out where the data lies. Now I would need to add another rule to determine whether a logarithmic plot has been requested. I can surely do that, but it feels like I'm re-writing code that is already in plot.

Does anyone have a better idea?

urbach commented 4 years ago

Negative values are usually dropped in logarithmic plots. I explicitly need to call plot to initialize the plot before being able to call polygon. The first plot command needs to set the ylim already. At the moment I just use range to find out where the data lies. Now I would need to add another rule to determine whether a logarithmic plot has been requested. I can surely do that, but it feels like I'm re-writing code that is already in plot.

Does anyone have a better idea?

well, we had the same issues for plotwitherror... I didn't have and still don't have a better idea...

kostrzewa commented 4 years ago

Does anyone have a better idea?

See compute_plotlims in plotutils.R, this should do it.

j-ostmeyer commented 4 years ago

There should be a sensible way to simply use plotwitherrors instead of rewriting the complete plotting routine. What was the problem again with calling plotwitherrors, then drawing the errorband, and last (if points have to be on top) calling plotwitherrors again with rep=TRUE? Performance should not be crucial here. I remember that Carsten used something like this as a very first version and it always gave two plots instead of one, but this should be possible to fix.

martin-ueding commented 4 years ago

Ah, that is indeed a solution! With rep = TRUE that should be fine. I'll try it out!

martin-ueding commented 4 years ago

That indeed seems to do the trick!

martin-ueding commented 4 years ago

This would be ready for review, then.

j-ostmeyer commented 4 years ago

Looks good to me. I suppose you also checked that travis not only runs though, but the plots really look the same as in the original version (modulo data on top)? Any other checks travis can't do automatically?

martin-ueding commented 4 years ago

I looked at the plots in RStudio, they look as one would expect. It can be possible that there is a combination of specifying xlim, ylim and log and the mask of the data in a way that it crashes.

j-ostmeyer commented 4 years ago

Well, if someone wants his log-plot with negative limits, then it's his own fault...