Compute time is currently a significant bottleneck for this project. The issue arises mainly from the following factors:
The HAL algorithm -> replace with SAL or simply XGBoost.
The number of cross-validation iterations (up to 20) -> Early stopping (no CV) with only one algorithm may be enough. However, we may also want to keep the linear model in the Super Learner for robustness.
I have also observed that the Julia multinomial regression becomes quite slow as the number of targets increases. Since fitting p(T|W) typically scales only linearly with the number of SNPs, this may not be a priority to investigate.
Associated current ideas:
[ ] Determine whether XGBoost actually comes with a suitable convergence rate.
[ ] If it does not: implement SAL and include it in the Super Learner instead of HAL.
[ ] Given the current state, this should be enough for this project. If the process is still too slow: consider early stopping only, or limit the number of cross-validation folds.