Open paraynaud opened 1 year ago
`norm` may benefit from it too.
@dpo, Should I implement this before the submission of the article?
I started the implementation. For the first case, here is the gain:
```julia
using LinearAlgebra, BenchmarkTools
using StatsBase: sample
using PartitionedVectors

N = 1500
n = 20000
nie = 15
element_variables = map(i -> sample(1:n, nie, replace = false), 1:N)
pv = PartitionedVector(element_variables; n)
pv_vec = PartitionedVector(element_variables; n, simulate_vector = true)
pv_vec .= 1
dot(pv, pv_vec; warn = false)
@benchmark dot(pv, pv_vec; warn = false)

# main branch
# BenchmarkTools.Trial: 3389 samples with 1 evaluation.
# Range (min … max): 873.900 μs … 3.272 ms ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 1.371 ms ┊ GC (median): 0.00%
# Time (mean ± σ): 1.458 ms ± 357.329 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
# Memory estimate: 16 bytes, allocs estimate: 1.

# pr-partitioned-dot PR
# BenchmarkTools.Trial: 10000 samples with 1 evaluation.
# Range (min … max): 34.300 μs … 330.100 μs ┊ GC (min … max): 0.00% … 0.00%
# Time (median): 37.500 μs ┊ GC (median): 0.00%
# Time (mean ± σ): 45.281 μs ± 18.760 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
# Memory estimate: 48 bytes, allocs estimate: 2.
```
It will be used in `trunk` to evaluate the quadratic model $m_k$ at $s$ (`dot(g, s)`).
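As a concrete sketch of that use, here is the quadratic model of a trust-region method evaluated with `dot`. The names are illustrative, not `trunk`'s actual API, and plain `Vector`s stand in for `PartitionedVector`s:

```julia
using LinearAlgebra

# Sketch (assumed names): the trust-region quadratic model
# mₖ(s) = fₖ + gₖᵀ s + ½ sᵀ Bₖ s, where dot(g, s) is the call sped up above.
# Bs stands for the product Bₖ * s, computed elsewhere by the solver.
quadratic_model(f, g, Bs, s) = f + dot(g, s) + 0.5 * dot(s, Bs)

f = 3.0
g = [1.0, -2.0]
s = [0.5, 0.5]
Bs = [1.0, 0.0]               # Bₖ * s for Bₖ = diagm([2.0, 0.0])
quadratic_model(f, g, Bs, s)  # 3.0 - 0.5 + 0.25 = 2.75
```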
At some point, we will have to devise a heuristic based on the partitioned structure to know which method to use.
In the case where the size of every element is close to `n`, the new `dot` performs `N` dot products of size `n`.
If any element has size close to n, we should not use the PSS at all.
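One way to phrase such a heuristic: the partitioned dot costs roughly the sum of the element sizes, while a dense dot costs `n`. A minimal sketch, assuming this cost model (the function name and threshold are hypothetical, not part of the package):

```julia
# Hypothetical heuristic: prefer the partitioned dot only when the total
# element work Σᵢ nᵢ does not exceed the cost n of a dense dot product.
function use_partitioned_dot(element_variables, n; threshold = 1.0)
    partitioned_cost = sum(length, element_variables)  # ≈ Σᵢ nᵢ
    return partitioned_cost <= threshold * n
end

# Small, disjoint elements: the partitioned dot touches fewer entries.
use_partitioned_dot([[1, 2], [3, 4]], 100)                   # true
# Two large overlapping elements: Σᵢ nᵢ = 150 > n, fall back to dense.
use_partitioned_dot([collect(1:99), collect(50:100)], 100)   # false
```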
Do you mean the user should not apply a `PartiallySeparableNLPModel`, or that the `PartiallySeparableNLPModel` should automatically consider only one function (e.g. the objective function) gathering every element function? In the second case, we fall back on a standard `QuasiNewtonNLPModel`.
I think the choice is not obvious, because limited-memory partitioned quasi-Newton operators remain of interest when the size of the element functions is close to `n`.
We've been talking for a long time about a heuristic to determine which elements should be merged; I think that's one of the next features I should implement. I could adapt the merge depending on the partitioned model considered.
But if there are at least 2 elements and one involves almost all n variables, it will almost surely be more expensive to manage the PSS than to ignore it. It's worth benchmarking a few examples to be sure.
Merging is a somewhat separate topic.
Specialize the dot product when `pv1.simulate_vector != pv2.simulate_vector`. Suppose $w$ is a Vector represented by a PartitionedVector (storing $U_i w$) and $$v = \sum_{i=1}^N U_i^\top \hat{v}_i,$$ then
$$ v^\top w = \left(\sum_{i=1}^N U_i^\top \hat{v}_i\right)^\top w = \sum_{i=1}^N \hat{v}_i^\top U_i w, $$
which accumulates element scalar products.
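This accumulation can be sketched with plain arrays standing in for the PartitionedVector types ($U_i w$ is just `w[indices]`; the function name is illustrative):

```julia
using LinearAlgebra

# Sketch: v is stored by elements v̂ᵢ with v = Σᵢ Uᵢᵀ v̂ᵢ, while w is a plain
# Vector.  The dot accumulates v̂ᵢᵀ (Uᵢ w) without ever assembling v.
function mixed_partitioned_dot(element_variables, vhat, w)
    acc = 0.0
    for (indices, vhat_i) in zip(element_variables, vhat)
        acc += dot(vhat_i, view(w, indices))  # v̂ᵢᵀ Uᵢ w, no allocation of v
    end
    return acc
end

element_variables = [[1, 2], [2, 3]]
vhat = [[1.0, 2.0], [3.0, 4.0]]  # element components of v
w = [1.0, 1.0, 1.0]
# Dense check: v = U₁ᵀ v̂₁ + U₂ᵀ v̂₂ = [1, 5, 4], so vᵀ w = 10
mixed_partitioned_dot(element_variables, vhat, w)  # 10.0
```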
If both have `simulate_vector=true`, we should avoid building the Vector. We should add a flag field indicating whether the Vector is built or not. If both have `simulate_vector=false`, we can try something more complex:
$$ \left(\sum_{i=1}^N U_i^\top \hat{v}_i\right)^\top \left(\sum_{j=1}^N U_j^\top \hat{w}_j\right) = \left(\sum_{i=1}^N \hat{v}_i^\top U_i\right) \left(\sum_{j=1}^N U_j^\top \hat{w}_j\right) = \sum_{i=1}^N \sum_{j=1}^N \hat{v}_i^\top U_i U_j^\top \hat{w}_j $$
Even if there are $N^2$ terms, most of them should vanish because $U_i U_j^\top = 0$ when elements $i$ and $j$ share no variables. After identifying the remaining pairs, we will have to access the values of their element components efficiently.
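A naive sketch of this pairwise accumulation, again with plain arrays instead of the package types and a hypothetical function name, to make the $U_i U_j^\top$ structure concrete (a real implementation would precompute the overlapping pairs instead of intersecting index lists on the fly):

```julia
# Sketch: both vectors stored by elements.  Uᵢ Uⱼᵀ selects the variables
# shared by elements i and j, so only overlapping pairs contribute.
function pairwise_partitioned_dot(element_variables, vhat, what)
    N = length(element_variables)
    acc = 0.0
    for i in 1:N, j in 1:N
        shared = intersect(element_variables[i], element_variables[j])
        isempty(shared) && continue  # Uᵢ Uⱼᵀ = 0: skip the pair
        for k in shared
            pi = findfirst(==(k), element_variables[i])  # position of k in element i
            pj = findfirst(==(k), element_variables[j])  # position of k in element j
            acc += vhat[i][pi] * what[j][pj]             # one term of v̂ᵢᵀ Uᵢ Uⱼᵀ ŵⱼ
        end
    end
    return acc
end

element_variables = [[1, 2], [2, 3]]
vhat = [[1.0, 2.0], [3.0, 4.0]]
what = [[5.0, 6.0], [7.0, 8.0]]
# Dense check: v = [1, 5, 4], w = [5, 13, 8], so vᵀ w = 102
pairwise_partitioned_dot(element_variables, vhat, what)  # 102.0
```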