sx-liu closed this 4 months ago.
Could you provide more detail about the difference between `num_samples` and `recursion_depth`? I guess they correspond to $r$ and $t$ in https://arxiv.org/pdf/1703.04730.pdf, right? If so, maybe we can rename `num_samples` -> `num_repeat` and describe it as "Specifies the number of repetitions of the procedure to average over."
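For reference, here is the estimator from that paper as we read it. The paper assumes the Hessian has been scaled so that the series converges. Each of $r$ independent runs performs $t$ recursion steps, and the runs are then averaged:

$$
\tilde{H}_0^{-1} v = v, \qquad
\tilde{H}_j^{-1} v = v + \left(I - \nabla_\theta^2 L(z_{s_j}, \theta)\right) \tilde{H}_{j-1}^{-1} v, \quad j = 1, \dots, t,
$$

$$
H^{-1} v \approx \frac{1}{r} \sum_{k=1}^{r} \tilde{H}_{t,(k)}^{-1} v,
$$

Here $z_{s_j}$ is a uniformly sampled training point and $\tilde{H}_{t,(k)}^{-1} v$ is the output of the $k$-th run. Under this reading, `recursion_depth` plays the role of $t$ and `num_samples` plays the role of $r$.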
Minor issue:
Otherwise LGTM.
I see. Thanks for the advice!
@TheaperDeng So how should we deal with hyperparameters, i.e. the damping and scaling factors?
I think for now you can mark these two as TODO and fix them to the most common default values.
@TheaperDeng Another minor concern: if I understand correctly, the previous implementations compute the ihvp as $v \cdot H^{-1}$, which means the vector multiplies from the left?
Background
Building on the previously implemented `hvp` functions, this PR completes the LiSSA algorithm for ihvp calculation. Compared with CG, the LiSSA algorithm requires fewer hvp calculations and is more suitable for large datasets.
Algorithm description
The LiSSA algorithm approximates the ihvp by averaging multiple samples. Each sample is estimated recursively, based on a truncated Taylor expansion of the inverse Hessian.
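Below is a minimal NumPy sketch of the recursion. It assumes one common damped and scaled formulation; the function name `lissa_ihvp`, the `hvp_i` callback, the `damping` and `scaling` parameters, and this exact update are illustrative assumptions, not the PR's code.

Each repeat starts from $h_0 = v$. It then applies $h_j = v + \left(I - (H_{i_j} + \lambda I)/s\right) h_{j-1}$ for `recursion_depth` steps, where $i_j$ is a randomly sampled data point, and divides the result by $s$. Finally, the `num_repeat` estimates are averaged.

```python
import numpy as np


def lissa_ihvp(hvp_i, n, v, num_repeat=10, recursion_depth=100,
               damping=0.0, scaling=1.0, seed=0):
    """Estimate (H + damping * I)^{-1} v, where H = mean_i H_i.

    hvp_i(i, u) returns H_i @ u for data point i (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    estimates = []
    for _ in range(num_repeat):              # r independent repeats
        h = v.copy()                         # h_0 = v
        for _ in range(recursion_depth):     # t recursion steps
            i = rng.integers(n)              # sample one data point
            # h_j = v + (I - (H_i + damping * I) / scaling) h_{j-1}
            h = v + h - (hvp_i(i, h) + damping * h) / scaling
        estimates.append(h / scaling)        # undo the scaling
    return np.mean(estimates, axis=0)        # average over repeats


# Toy usage: per-sample Hessians whose mean is diag(1, 2, 3).
base = np.diag([1.0, 2.0, 3.0])
M = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
H_list = [base + e * M for e in (-0.2, -0.1, 0.0, 0.1, 0.2)]
v = np.ones(3)
est = lissa_ihvp(lambda i, u: H_list[i] @ u, len(H_list), v,
                 num_repeat=200, recursion_depth=100,
                 damping=0.01, scaling=4.0)
```

Dividing by $s$ keeps the spectral norm of $I - (H + \lambda I)/s$ below 1, so the Taylor series converges, provided $H + \lambda I$ is positive definite and $s$ is at least its largest eigenvalue. The damping $\lambda$ shifts the target to $(H + \lambda I)^{-1} v$, which is a common workaround when $H$ is not positive definite.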
API Design
The API design basically follows the existing implementations of `hvp` and `ihvp_cg`. As with those, there are two versions: one for a fixed `x` and one for a non-fixed `x`.
The first input of the LiSSA functions is the target function, such as $L(\cdot, \cdot)$. The input list should have the form $[(z_0, \theta), \dots, (z_n, \theta)]$.
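The sketch below illustrates this API shape only; it is not the PR's actual signatures. The names `ihvp_lissa`, `ihvp_at_x_lissa`, and `_hvp_fd` are hypothetical. The nested finite-difference hvp is a stand-in for the autodiff `hvp` functions the PR builds on. We also assume that the "fixed `x`" variant simply binds `input_list` up front and returns a function of `v`. Here `func(z, theta)` plays the role of $L(\cdot, \cdot)$.

```python
import numpy as np


def _hvp_fd(func, z, theta, u, eps=1e-4):
    """Hessian of func(z, .) at theta applied to u, via nested central
    differences (stand-in for autodiff; hypothetical helper)."""
    def grad(t):
        g = np.zeros_like(t)
        for k in range(t.size):
            e = np.zeros_like(t)
            e[k] = eps
            g[k] = (func(z, t + e) - func(z, t - e)) / (2 * eps)
        return g
    return (grad(theta + eps * u) - grad(theta - eps * u)) / (2 * eps)


def ihvp_lissa(func, num_repeat=10, recursion_depth=100,
               damping=0.0, scaling=1.0, seed=0):
    """Non-fixed version: returns ihvp(input_list, v), where
    input_list = [(z_0, theta), ..., (z_n, theta)] and the target is
    (H + damping * I)^{-1} v with H = mean_i Hessian of func(z_i, theta)."""
    def ihvp(input_list, v):
        rng = np.random.default_rng(seed)
        estimates = []
        for _ in range(num_repeat):
            h = v.copy()
            for _ in range(recursion_depth):
                # sample one (z_i, theta) pair from the input list
                z, theta = input_list[rng.integers(len(input_list))]
                hv = _hvp_fd(func, z, theta, h)
                h = v + h - (hv + damping * h) / scaling
            estimates.append(h / scaling)
        return np.mean(estimates, axis=0)
    return ihvp


def ihvp_at_x_lissa(func, input_list, **kwargs):
    """Fixed version: binds input_list once, returns a function of v only."""
    ihvp = ihvp_lissa(func, **kwargs)
    return lambda v: ihvp(input_list, v)


def loss(z, theta):
    # toy L(z, theta); its Hessian w.r.t. theta is outer(z, z) + I
    return 0.5 * np.dot(z, theta) ** 2 + 0.5 * np.dot(theta, theta)


theta = np.array([0.3, -0.2])
zs = [np.array([0.5, 0.0]), np.array([0.0, 0.5]), np.array([0.5, 0.5])]
input_list = [(z, theta) for z in zs]
v = np.ones(2)
est = ihvp_lissa(loss, num_repeat=200, recursion_depth=30,
                 scaling=2.0)(input_list, v)
```

Returning a closure keeps the hyperparameters (`num_repeat`, `recursion_depth`, `damping`, `scaling`) separate from the data. This loosely mirrors how the first input is a function and the inputs come later.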
Demonstration