Refactor noncyclical fitting code

boost-R / gamboostLSS

Boosting models for fitting generalized additive models for location, shape and scale (GAMLSS) to potentially high dimensional data. The current relase version can be found on CRAN (https://cran.r-project.org/package=gamboostLSS).

26 stars 11 forks source link

Refactor noncyclical fitting code #31

Closed ja-thomas closed 7 years ago

ja-thomas commented 7 years ago

Start of the refactoring for the noncylical fitting to make it better readable, easier to maintain and hopefully easier to fix #26 #22 and #21

We switch back to only one iBoost function with an additional method argument

Status:

[x] Remove outer fitting
[x] Fix tests and further usage where "outer" was present
[x] Rename stuff to "cyclic" and "noncyclic" and fix documentation
[x] #27 add references to tech report
[x] #26 Speed improvements -> I changed some queries to environments to evalq statements. That might do the trick, but I still have to benchmark how much it actually improved for non-toy examples
[x] #24 Duplicate less code -> only one iBoost function.
[x] #22 ~~still no idea how to implement this...~~ Already looks pretty good, maybe done :)
[x] Cleanup: fix broken tests, examples and everything else
[x] Fix trace
[x] Add JT as author
[x] Check that CV works with mstop=0
[x] ...

ja-thomas commented 7 years ago

The main problem is that the code for the outer method is still pretty terrible because mboost just isn't made to be updated based on a different criterion (e.g. not on the L2 loss on the gradient, but on the real loss function...). I don't really see a way to make that really nice.

Maybe we should move the outer version to an experimental branch and recommend the usage of the inner version which we can release soon. Not the ideal solution but maybe the most stable.

@hofnerb what do you think?

coveralls commented 7 years ago

Coverage decreased (-0.4%) to 80.813% when pulling c865ee7caad15c035b4d52ddb40bb61ffd44f6c9 on refactor_noncyc into 0d94a467398e0964662ac7c82f2e58b64f5f1bc9 on devel.

hofnerb commented 7 years ago

Puh, tough question. In principle ok (as we prefer the inner version anyway for computational reasons; Am I right?).

However, one remaining problem would be that we cannot update the simulations for both the inner and the outer version with this solution.

Do you see any chance to include the outer version in the current code (in the near future)?
Might it be an option to change the code such that we only provide a cyclic and one non-cyclic function (i.e., the inner version) forever? This might require some changes in the paper (i.e. we would need to state that we considered the outer version but do not further pursue this version).

ja-thomas commented 7 years ago

Puh, tough question. In principle ok (as we prefer the inner version anyway for computational reasons).

However, one remaining problem would be that we cannot update the simulations for both the inner and the outer version with this solution.

This is indeed a problem.

Do you see any chance to include the outer version in the current code (in the near future)?

So in this PR the outer version is still present, but it is quite ugly/hacky.

Might it be an option to change the code such that we only provide a cyclic and one non-cyclic function (i.e., the inner version) forever? This might require some changes in the paper (i.e. we would need to state that we considered the outer version but do not further pursue this version).

I would actually prefer that solution. We can still keep the outer version in an experimental version. The changes in the paper shouldn't be too large as the stabsel benchmarks use the inner version anyways.

coveralls commented 7 years ago

Changes Unknown when pulling 5b5190d7256f57a85feddd62959648f77fee271d on refactor_noncyc into on devel.

ja-thomas commented 7 years ago

Travis passed :tada::tada::tada::tada:

ja-thomas commented 7 years ago

@hofnerb do you have any idea why appveyor is failing?

The test is running on my local machine and Travis is passing as well.

(appveyor in the master is currently failing as well, but that seems to be a different error)

hofnerb commented 7 years ago

I think (and what I've seen from the appveyor config) that appveyor checks with mboost from CRAN. Can you run the check locally with mboost/CRAN and see if you get the same errors? If so, please go ahead :)

[but please see that you fix the open issues (again) such as non-visible bindings, to wide manual pages, ...; this can also be done AFTER merging]

coveralls commented 7 years ago

Changes Unknown when pulling de4be0dec1c1e9b0fe7205ad9feb7521cb01a73b on refactor_noncyc into on devel.

hofnerb commented 7 years ago

All checks are successful now. There is one open issue left:

checking R code for possible problems ... NOTE mboostLSS_fit : iBoost: no visible global function definition for 'ngradient' mboostLSS_fit : iBoost: no visible binding for global variable 'y' mboostLSS_fit : : no visible binding for '<<-' assignment to 'combined_risk' mboostLSS_fit : : no visible binding for global variable 'combined_risk' Undefined global functions or variables: combined_risk ngradient y

I think this is a consequence of evalq() within the code. Does this really help fixing speed issues? If so, how could we get rid of the NOTE in a sensible way? Can you please investigate that, @ja-thomas?

Please

coveralls commented 7 years ago

Changes Unknown when pulling 86ebeb2a55e37c727d7443a192895bb313d15e87 on refactor_noncyc into on devel.

coveralls commented 7 years ago

Changes Unknown when pulling 86ebeb2a55e37c727d7443a192895bb313d15e87 on refactor_noncyc into on devel.

ja-thomas commented 7 years ago

I think this is a consequence of evalq() within the code. Does this really help fixing speed issues? If so, how could we get rid of the NOTE in a sensible way? Can you please investigate that, @ja-thomas?

Hm, in my benchmarks evalq seemed slightly faster. The biggest advantage in my point of view that the code is much more readable with evalq instead of accessing the environments by hand all the time.

EDIT: A call to global_variables() in AAA.R helps to resolves almost all Notes (see: http://stackoverflow.com/questions/9439256/how-can-i-handle-r-cmd-check-no-visible-binding-for-global-variable-notes-when) One note is still the global assignment for combined_risk

coveralls commented 7 years ago

Changes Unknown when pulling 4c2d159f4ecfbddba851eda3e85a3ab67d42e1f0 on refactor_noncyc into on devel.

coveralls commented 7 years ago

Changes Unknown when pulling 6309410fc4538f1a4bac3e15c673185bc9c78d06 on refactor_noncyc into on devel.