JuliaImageRecon / RegularizedLeastSquares.jl

MIT License

Add GPU support based on (exchangeable) structs for solver state #81

Closed nHackel closed 2 months ago

nHackel commented 5 months ago

This PR adds GPU support by changing the solver structs to the following pattern:

mutable struct Solver{matT, ...}
  A::matT
  # Other "static" fields
  state::AbstractSolverState{<:Solver}
end

mutable struct SolverState{T, tempT} <: AbstractSolverState{Solver}
  x::tempT
  rho::T
  # ...
  iteration::Int64
end

The trick here is to not fully constrain the state type in the solver struct. This allows us to swap the state based on the measurements b given to the solve/init! functions.

In this PR we use this to detect whether the array type of our preallocated variables matches that of b. If not, we rebuild the variables with similar, store the new state, and initialize as usual. This requires no dependency on CUDA.
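A minimal sketch of that check (the helper name `prepare_state` is hypothetical, not the package API): `similar(b, ...)` allocates on whatever backend b lives on (CPU array, CuArray, ...), so the state can follow b without this package ever importing CUDA.

```julia
# Hypothetical helper: return a buffer with the layout of x but living on
# the same array backend as the measurement vector b.
function prepare_state(x::AbstractVector, b::AbstractVector)
    xnew = similar(b, eltype(x), size(x))  # allocated on b's backend
    # If the preallocated buffer already has the right array type, reuse it;
    # otherwise hand back the freshly allocated one.
    return typeof(x) === typeof(xnew) ? x : xnew
end
```

With two plain CPU vectors the existing buffer is reused; only when b is e.g. a GPU array would a new buffer be allocated.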

First tests have shown that the actual iteration method is as type-stable as before. I am seeing small performance regressions; however, those appear to be in the init! and solve methods and seem to stem from my setup having to handle both the "old" and "new" variants.

ToDos:

Future options for this setup:

We could also use different states per solver to encode variants of an algorithm; randomized Kaczmarz or FISTA with restart might get distinct state types. The danger here is adding a lot of maintenance cost if the number of variants explodes.
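The idea above can be sketched with toy types (all names here are hypothetical): the solver struct stays the same, and the state type alone selects the iteration behavior via dispatch.

```julia
# Toy sketch: encode an algorithm variant purely in the state type.
abstract type ToyState end

struct PlainState <: ToyState
    x::Vector{Float64}
end

struct RestartState <: ToyState
    x::Vector{Float64}
    xold::Vector{Float64}  # extra buffer only the restarting variant needs
end

# The iteration method dispatches on the state, not on the solver.
variant(::PlainState)   = "plain iteration"
variant(::RestartState) = "iteration with restart check"
```

A solver holding a `ToyState` field would then run whichever variant its state encodes, without a second solver struct per variant.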

We can use the states to implement "multiframe" measurements where b is a matrix. Then we need to allocate a state holding a matrix of preallocated variables (if a solver supports this directly; otherwise we can loop over each measurement as a fallback). For this I would also introduce a trait distinguishing single- and multiframe solvers.
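Such a trait could look like the following sketch (trait and function names are hypothetical): multiframe-capable solvers receive the full matrix, everyone else falls back to a column-by-column loop.

```julia
# Hypothetical Holy-trait for frame support.
abstract type FrameSupport end
struct SingleFrame <: FrameSupport end
struct MultiFrame  <: FrameSupport end

frame_support(::Any) = SingleFrame()  # conservative default

# Entry point: dispatch on the trait of the solver.
solve_frames(solver, b::AbstractMatrix) = solve_frames(frame_support(solver), solver, b)
# Multiframe solvers handle the matrix directly.
solve_frames(::MultiFrame, solver, b) = solve_one(solver, b)
# Fallback: solve each measurement (column of b) separately and stack the results.
solve_frames(::SingleFrame, solver, b) =
    reduce(hcat, (solve_one(solver, col) for col in eachcol(b)))

# Toy solver for illustration: "solving" is just scaling by α.
struct ToySolver
    α::Float64
end
solve_one(s::ToySolver, b) = s.α .* b
```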

We can still use package extensions to write specialized versions of the iteration functions, which are then conditionally loaded. We could also wrap common array operations of our solvers into our own functions and then specialize those for combinations of CUDA, sparse, and dense arrays, as we did for Kaczmarz.
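A sketch of that wrapping pattern, with a row-dot kernel as the example (the function name is illustrative, inspired by the Kaczmarz case): the package ships only a generic fallback, and an extension can later add methods for CuArray or sparse types without the package depending on them.

```julia
# Generic fallback for a kernel the solvers route through "our own" function.
dot_with_row(A::AbstractMatrix, x::AbstractVector, k::Integer) =
    sum(A[k, j] * x[j] for j in axes(A, 2))

# A CUDA or sparse package extension could then add, conditionally loaded:
# dot_with_row(A::CuMatrix, x::CuVector, k::Integer) = ...
# dot_with_row(A::SparseMatrixCSC, x::AbstractVector, k::Integer) = ...
```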

nHackel commented 5 months ago

This will close #54 and #64