Bug in the aggregation of standard errors from repeated cross-fitting

I think there is a bug in the aggregation of standard errors from repeated cross-fitting.

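The aggregation formula stated in Chernozhukov et al. (2018) is $\hat{\sigma} = \sqrt{\mathrm{Median}\big((\hat{\sigma}_m^2 + (\tilde{\theta}_{0,m} - \tilde{\theta}_0)^2)_{m \in [M]}\big)}$, where $\tilde{\theta}_0 = \mathrm{Median}\big((\tilde{\theta}_{0,m})_{m \in [M]}\big)$.
The same formula is also stated in our user guide: https://docs.doubleml.org/stable/guide/resampling.html#repeated-cross-fitting-with-k-folds-and-m-repetition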
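A minimal numpy sketch of this median aggregation, assuming the repetition-wise estimates $\tilde{\theta}_{0,m}$ and the unscaled $\hat{\sigma}_m$ are available as arrays. The function name `aggregate_sigma` is hypothetical and not part of the DoubleML API in either language.

```python
import numpy as np


def aggregate_sigma(theta_m, sigma_m):
    """Median aggregation over M repetitions, using the unscaled sigma_hat_m.

    Hypothetical helper for illustration only.
    """
    theta_m = np.asarray(theta_m, dtype=float)
    sigma_m = np.asarray(sigma_m, dtype=float)
    # Aggregated point estimate: median over the M repetitions.
    theta_agg = np.median(theta_m)
    # The squared deviation of each repetition's estimate from the median is
    # added to that repetition's variance before taking the median.
    sigma_agg = np.sqrt(np.median(sigma_m**2 + (theta_m - theta_agg) ** 2))
    return theta_agg, sigma_agg


theta_agg, sigma_agg = aggregate_sigma([1.0, 1.2, 0.9], [2.0, 2.0, 2.0])
print(theta_agg)  # 1.0
```

Here the variance terms are 4.0, 4.04 and 4.01, so the aggregated $\hat{\sigma}$ is $\sqrt{4.01}$ rather than 2.0: the disagreement between repetitions inflates the standard error.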
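For the implementation, it is important to point out that the attribute se is not equal to $\hat{\sigma}$ but to the scaled asymptotic standard error, i.e., $\widehat{\mathrm{se}} = \hat{\sigma} / \sqrt{N}$.
The same applies to the standard errors of the individual repetitions stored in _all_se, i.e., $\widehat{\mathrm{se}}_m = \hat{\sigma}_m / \sqrt{N}$.
Therefore, the correct formula for aggregating the asymptotic / scaled standard errors is $\widehat{\mathrm{se}} = \sqrt{\mathrm{Median}\Big(\big(\widehat{\mathrm{se}}_m^2 + \frac{(\tilde{\theta}_{0,m} - \tilde{\theta}_0)^2}{N}\big)_{m \in [M]}\Big)}$.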
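The factor $1/N$ in front of the squared deviation follows from dividing the unscaled formula by $N$ inside the median, which is allowed because dividing by a positive constant does not change which element is the median. A minimal numpy sketch of the scaled aggregation, with a consistency check against aggregating $\hat{\sigma}_m$ first and scaling afterwards. The function name `aggregate_se` and the numbers are hypothetical and only serve as an illustration.

```python
import numpy as np


def aggregate_se(theta_m, se_m, n_obs):
    """Aggregate scaled standard errors se_m = sigma_hat_m / sqrt(N).

    Hypothetical helper for illustration only.
    """
    theta_m = np.asarray(theta_m, dtype=float)
    se_m = np.asarray(se_m, dtype=float)
    theta_agg = np.median(theta_m)
    # The squared deviation lives on the scale of sigma_hat^2, so it is
    # divided by N to put it on the scale of se_m^2.
    se_agg = np.sqrt(np.median(se_m**2 + (theta_m - theta_agg) ** 2 / n_obs))
    return theta_agg, se_agg


# Consistency check: aggregate the unscaled sigma_hat_m, then scale by
# 1/sqrt(N), and compare with aggregating the scaled se_m directly.
n_obs = 500
theta_m = np.array([0.48, 0.50, 0.53])
sigma_m = np.array([2.1, 2.0, 2.2])
sigma_agg = np.sqrt(np.median(sigma_m**2 + (theta_m - np.median(theta_m)) ** 2))
theta_agg, se_agg = aggregate_se(theta_m, sigma_m / np.sqrt(n_obs), n_obs)
print(np.isclose(se_agg, sigma_agg / np.sqrt(n_obs)))  # True
```

Plugging the scaled $\widehat{\mathrm{se}}_m$ into the unscaled formula without the $1/N$ factor would instead overweight the between-repetition spread by a factor of $N$.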

We don't seem to have unit tests that are sensitive to the bug fix in the aggregation formula. I will add such tests in my ongoing major update of the unit test framework.
In our R vs. Python package tests, we so far didn't have a test case with repeated cross-fitting, and therefore the difference between the implementations didn't become visible. I added such a test case in https://github.com/DoubleML/doubleml-py-vs-r/pull/4. As the tests are now sensitive to the aggregation formula, they also fail in that PR; this will be resolved once the R package receives its bug fix.
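Why a test without repeated cross-fitting cannot catch the bug: with a single repetition, $\tilde{\theta}_{0,1} = \tilde{\theta}_0$, so the deviation term is zero and its scaling is irrelevant. A pytest-style sketch of both situations follows; the function names `se_correct` and `se_without_n` are hypothetical and not taken from either package's test suite.

```python
import numpy as np


def se_correct(theta_m, se_m, n_obs):
    """Scaled aggregation including the 1/N factor (hypothetical helper)."""
    theta_m = np.asarray(theta_m, dtype=float)
    se_m = np.asarray(se_m, dtype=float)
    dev2 = (theta_m - np.median(theta_m)) ** 2
    return np.sqrt(np.median(se_m**2 + dev2 / n_obs))


def se_without_n(theta_m, se_m):
    """Same aggregation but missing the 1/N factor (hypothetical helper)."""
    theta_m = np.asarray(theta_m, dtype=float)
    se_m = np.asarray(se_m, dtype=float)
    dev2 = (theta_m - np.median(theta_m)) ** 2
    return np.sqrt(np.median(se_m**2 + dev2))


def test_single_repetition_is_insensitive():
    # n_rep = 1: the deviation term vanishes, so both formulas agree.
    assert np.isclose(se_correct([0.5], [0.1], 500), se_without_n([0.5], [0.1]))


def test_repeated_cross_fitting_is_sensitive():
    # n_rep = 3 with differing estimates: the two formulas disagree.
    theta_m = [0.48, 0.50, 0.53]
    se_m = [0.1, 0.1, 0.1]
    assert not np.isclose(se_correct(theta_m, se_m, 500), se_without_n(theta_m, se_m))
```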

The user guide (https://docs.doubleml.org/stable/guide/resampling.html#repeated-cross-fitting-with-k-folds-and-m-repetition) is already quite precise in this regard (see the screenshot below). However, I would change one small thing: the attribute _all_se does not store the unscaled standard errors sigma_hat_m but the scaled / asymptotic standard errors sigma_hat_m / sqrt(N). We should adapt the user guide accordingly.

DoubleML / doubleml-for-r