biometrician / abe

An R package for Augmented Backward Elimination
GNU General Public License v3.0

#39: Running time of `plot.abe()` or `print.abe()`

Closed: biometrician closed this issue 1 year ago

biometrician commented 1 year ago

The size of abe.resampling_objects is much more reasonable with the new save.out = "minimal"; this really helps a lot. However, we still have the problem that calls to plot.abe() or print.abe() can take considerable time for some abe.resampling_objects. I assume the main bottleneck is that every call works through the entire list. My suggestion is that, in addition to the list, the abe.resampling_objects store a matrix of the coefficients from each resample, i.e. a matrix with num.resamples rows and (number of variables + 1) columns, the extra column being for the intercept. Possibly a 0/1 matrix of included variables could be stored as well, if that helps computation time. This matrix can be generated once at the end of abe.resampling(). Future calls to plot.abe() or print.abe() can then simply use this preprocessed matrix. It means working through all the helper functions, but I think the workload is manageable. I talked with Gregor; if you think this is okay, he could do it.
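A minimal sketch of the proposed preprocessing, in base R. It does not use the actual internals of abe.resampling(): `resample_coefs`, `all_terms`, `coef_matrix` and `incl_matrix` are hypothetical names, and the example assumes each resample yields a named coefficient vector and that unselected terms are stored as 0.

```r
# Hypothetical stand-in for the per-resample results: each element is the
# named coefficient vector of the model selected in that resample.
resample_coefs <- list(
  c("(Intercept)" = 1.2, x1 = 0.5, x3 = -0.3),
  c("(Intercept)" = 0.9, x1 = 0.7),
  c("(Intercept)" = 1.1, x2 = 0.2, x3 = -0.4)
)
all_terms <- c("(Intercept)", "x1", "x2", "x3")

# Coefficient matrix: one row per resample, one column per term.
# Terms that were not selected in a resample are filled with 0 (assumption).
coef_matrix <- t(vapply(
  resample_coefs,
  function(b) {
    row <- setNames(numeric(length(all_terms)), all_terms)
    row[names(b)] <- b
    row
  },
  numeric(length(all_terms))
))
colnames(coef_matrix) <- all_terms

# 0/1 inclusion matrix with the same layout.
incl_matrix <- t(vapply(
  resample_coefs,
  function(b) as.integer(all_terms %in% names(b)),
  integer(length(all_terms))
))
colnames(incl_matrix) <- all_terms
```

Both matrices are built in a single pass over the list, so a later print or plot call would only need column-wise operations such as `colMeans(incl_matrix)`.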

biometrician commented 1 year ago

Hi Gregor,

Especially for print and plot, we have to improve the running time.

According to Rok, the bottleneck is summary, which prepares the abe.resampling.objects. Is it possible to speed this up? I assume it would help if this preparation were done once, directly in abe.resampling; it would then already be done when print and plot are called repeatedly. If it is done in abe.resampling, we could maybe do it inside the foreach loop, which would also help.

If saving all results in a matrix instead of the long list improves running time, which I assume it does, do we still need the list for the option save.out="minimal"? Or is it enough to store the information in the matrix only? With save.out="complete" we would then have the long list plus the matrix.

print, plot and the other helper functions have to be adapted to get the data from the matrix instead of the list.

Since this is a larger change, could you make it in a separate branch? Then we can still directly compare running time and object size, for example.
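One way such a comparison could look, as a rough base-R sketch on simulated data rather than on real abe output. All names here (`coef_list`, `coef_mat`, `freq_from_list`, `freq_from_matrix`) are hypothetical, and a coefficient of 0 is assumed to mean "not selected".

```r
set.seed(1)
n_resamples <- 200
terms <- c("(Intercept)", paste0("x", 1:5))

# Hypothetical list-style storage: one named coefficient vector per resample.
coef_list <- lapply(seq_len(n_resamples), function(i) {
  keep <- c(TRUE, runif(5) < 0.5)  # intercept always kept
  setNames(rnorm(sum(keep)), terms[keep])
})

# Matrix-style storage, built once from the list (0 = not selected).
coef_mat <- matrix(0, n_resamples, length(terms), dimnames = list(NULL, terms))
for (i in seq_len(n_resamples)) {
  coef_mat[i, names(coef_list[[i]])] <- coef_list[[i]]
}

# Inclusion frequencies computed from each representation.
freq_from_list <- function() {
  vapply(terms, function(tm) {
    mean(vapply(coef_list, function(b) tm %in% names(b), logical(1)))
  }, numeric(1))
}
freq_from_matrix <- function() colMeans(coef_mat != 0)

# Compare memory footprint and repeated-call running time.
size_list <- object.size(coef_list)
size_mat  <- object.size(coef_mat)
time_list <- system.time(for (k in 1:50) freq_from_list())
time_mat  <- system.time(for (k in 1:50) freq_from_matrix())
```

Printing `size_list`, `size_mat`, `time_list` and `time_mat` for the two branches gives a first impression; the numbers depend on the machine and on the real object structure.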

Thanks.

gregorsteiner commented 1 year ago

I changed the output of the abe.resampling function. If save.out = "minimal", only a matrix of coefficient values is returned. If save.out = "complete", the model objects are also returned. Next week, I will rewrite the summary and print functions based on this matrix. This should be considerably faster.
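To illustrate why a matrix-based summary can be fast, here is a small sketch of summary statistics computed directly from a coefficient matrix. It is not the package's actual summary code: `summarise_coefs` is a hypothetical helper, the example matrix is made up, and a coefficient of 0 is assumed to mean the term was not selected.

```r
# Hypothetical coefficient matrix: rows = resamples, columns = terms.
coef_matrix <- rbind(
  c(1.2, 0.5, 0.0, -0.3),
  c(0.9, 0.7, 0.0,  0.0),
  c(1.1, 0.0, 0.2, -0.4),
  c(1.0, 0.6, 0.0, -0.5)
)
colnames(coef_matrix) <- c("(Intercept)", "x1", "x2", "x3")

# Inclusion frequency and coefficient quantiles per term, using only
# column-wise operations on the matrix.
summarise_coefs <- function(m, probs = c(0.025, 0.5, 0.975)) {
  incl_freq <- colMeans(m != 0)                  # share of resamples with a nonzero coefficient
  qs <- t(apply(m, 2, quantile, probs = probs))  # one row per term, one column per quantile
  data.frame(term = colnames(m), incl_freq = incl_freq, qs,
             check.names = FALSE, row.names = NULL)
}

res <- summarise_coefs(coef_matrix)
```

Because everything is vectorised over columns, repeated print or plot calls would not need to revisit individual model objects.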

gregorsteiner commented 1 year ago

I completely rewrote the summary and print functions. I also made some major changes to plot.abe and pie.abe. They should all be considerably faster now. Since this is a pretty big change, could you check if all of your existing code still works with these changes?

biometrician commented 1 year ago

Thanks a lot, Gregor. I hope I have some time next week to check everything.

rokblagus commented 1 year ago

Gregor, I really appreciate the effort you are putting into this, thanks! I will run a few examples next week.