Open gottacatchenall opened 3 years ago
Merging #132 (f2b8627) into main (b3b0f32) will increase coverage by
0.95%
. The diff coverage is55.04%
.
@@ Coverage Diff @@
## main #132 +/- ##
==========================================
+ Coverage 54.13% 55.08% +0.95%
==========================================
Files 35 36 +1
Lines 1101 1120 +19
==========================================
+ Hits 596 617 +21
+ Misses 505 503 -2
Flag | Coverage Δ | |
---|---|---|
unittests | 55.08% <55.04%> (+0.95%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
src/datasets/earthenv/topography.jl | 0.00% <0.00%> (ø) |
|
src/datasets/layernames.jl | 0.00% <0.00%> (ø) |
|
src/datasets/worldclim/elevation.jl | 0.00% <0.00%> (ø) |
|
src/integrations/DataFrames.jl | 0.00% <0.00%> (ø) |
|
src/operations/mask.jl | 0.00% <0.00%> (ø) |
|
src/operations/sliding.jl | 0.00% <0.00%> (ø) |
|
src/pseudoabsences/radius.jl | 0.00% <0.00%> (ø) |
|
src/pseudoabsences/randomselection.jl | 0.00% <0.00%> (ø) |
|
src/pseudoabsences/surfacerangeenvelope.jl | 0.00% <0.00%> (ø) |
|
src/recipes/recipes.jl | 12.90% <12.90%> (-0.17%) |
:arrow_down: |
... and 27 more |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 2e5dc79...f2b8627. Read the comment docs.
Can I be a nuisance? I'd like it a lot if we could do something like
W = fit(PCA, stack_of_layers)
smaller_stack_of_layers = transform(W, stack_of_layers)
In some cases, we don't want to reduce the dimensionality as much as we would like to e.g. whiten the data. I'd drop that in a file loaded when MultivariateStats
is loaded, using Requires
.
This makes me realize that we may want to have a new type, like SimpleSDMStack
, to store layers that pass the _layers_are_compatible
test. This is one we can use in mosaic
(even though mosaic works when the overlap is incomplete), but would definitely work here. If we want to be smart about it, we can also have a 3d matrix and an array of names, and from this we extract the layers as required. I can suggest something in another PR.
overloading fit
and transform
should be easy enough
@tpoisot mostly there, current code works for PCA
and PPCA
but throws errors for KernelPCA
and Whitening
, although those could be a product of using randomly generated matrices without correlation in the current tests.
also given the function call structure it requires computing the input matrix in both fit
and transform
, might start to cause performance concerns for big/many layers
do you want to merge this into a different branch to implement SDMStacks
?
I don't think this needs to have stacks yet - but we can merge, and add the stacks later?
just pushed docs and a type assert for PCA
/PPCA
for now, do you merge into a dev
branch before merging to main and tagging? @tpoisot
So I don't think a type assert is required - Whitening fails sometimes because it requires a specific matrix type.
I'll review and merge/tag, there's no dev branch.
removed the type assert
Can you add a documentation update?
there's a _make_input
method that doesn't look defined
looks like i dropped that when shifting to the common_keys = reduce(∩, keys.(layers))
call but didn't add it to fit
, should work now
Any reason why the plots here https://docs.ecojulia.org/SimpleSDMLayers.jl/previews/PR132/examples/pca/ have values in 0-1?
right now transform
returns map(f->rescale(f, (0,1)), newlayers)
. primarily did this to emphasize output dimensions are unitless but MVStats doesn't do this so might make sense to remove it
also fit(Whitening, layers)
is still failing with non-posdef matrix on the same input data as the doc example
Yeah let's not do the rescale, because we might want to backproject at some point
also
fit(Whitening, layers)
is still failing with non-posdef matrix on the same input data as the doc example
this probably "just" need a little explanation that sometimes Whitening
giveth, and sometimes it taketh away
fixed the whitening example (i think), needed a slightly different overload of transform
actuatlly i might be overcomplicating it a bit
The implementation of transform
could be one function but runs into ambiguity with
transform(f::Whitening, x::AbstractVecOrMat{T} where T) in MultivariateStats
transform(proj::T, layers::AbstractVecOrMat{U}, kwargs...) where {T, U<:SimpleSDMLayer} in SimpleSDMLayers
now implemented via a single function call, the only irritant being in the case that if SimpleSDMLayers
and MultivariateStats
are both in the namespace, it requires explicit usage of SDM.transform
/MV.transform
now implemented via a single function call, the only irritant being in the case that if
SimpleSDMLayers
andMultivariateStats
are both in the namespace, it requires explicit usage ofSDM.transform
/MV.transform
Wait, there's a transform
in SDM?
not at the moment, but implementing a transform
method within the SDM
namespace (only in the case MVStats is also loaded) is a way to avoid the method ambiguity issue above (which in the long term could probably be solved by a PR within MVStats.jl)
edit: no need to export if it is dependent on MVStats being loaded, it just requires SDMs.transform()
which isn't ideal
essentially this is a problem with MVStats.jl
dispatch though
In the long term I think the julianesque solution is for MVStats.jl to define all its (fit
, transform
) methods on different models (pca, ppca, whitening) around an a single method dispatched on a single abstract type, as opposed to their current definitions which are fit(::PCA, matrix) = ...
, fit(::Whitening, matrix) = ...
, etc.
So can't we temporarily solve this with generated code?
macros are probably the feature of julia i understand the least, but if that's a neat way to resolve the transform
function call ambiguity i'm all ears
Let me try
OK so I don't think it needed the code generation - I'll check the example page before releasing
yeah it runs into the same dispatch ambiguity without defining a transform
in the SDM namespace
there's also an inconsistency in how transform
is defined across different mv transforms in MVStats.jl
which should probably get raised as an issue there
seems there is an MVStats.jl issue about exactly this issue, not sure how active MVStats.jl development is
Mhhhh. Running test doesn't show a method error on my side, are you sure the test suite is complete?
i think it doesn't fail because i use newlayers
for both plots in the docs src. but clearly doesn't work because the plot under the whitening example in the docs is the same as the above plot and should only be 2 maps not 4
running locally gives me the same dispatch ambiguity
and i should 100% add a test of Whitening
, the issue is how to get a predefined input that it doesn't randomly fail on w/ pos-def matrix problem
Visited this again, still can't find a simple way around this ambiguity
MethodError: transform(::MultivariateStats.Whitening{Float64}, ::Vector{SimpleSDMResponse{Float64}}) is ambiguous. Candidates:
transform(f::MultivariateStats.Whitening, x::AbstractVecOrMat) in MultivariateStats at /home/michael/.julia/packages/MultivariateStats/HTpHt/src/whiten.jl:38
transform(proj::P, layers::AbstractVecOrMat{U}; kwargs...) where {U<:SimpleSDMLayer, P<:Union{MultivariateStats.PCA, MultivariateStats.Whitening}} in SimpleSDMLayers at /home/michael/code/SimpleSDMLayers.jl/src/integrations/MultivariateStats.jl:26
Possible fix, define
transform(::P, ::AbstractVecOrMat{U}) where {U<:SimpleSDMLayer, P<:MultivariateStats.Whitening}
The possible fix from the error message is implemented but doesn't actually fix the problem
What the pull request does
Implements methods for taking a set of SDMLayers and applying multivariate transformations from
JuliaStats/MultivariateStats.jl
to do a variety of things: reduce the dimensionality of a data set (PCA, PPCA, KernelPCA), make layers without correlation (Whitening), or the other methods mvstats provides Type of changePlease indicate the relevant option(s)
Checklist
.zenodo.json
Project.toml
fieldversion
has been updatedv0.0.x
series release, a bug fix, or a performance improvement