EvoArt / PERMANOVA.jl

MIT License

Eltype `Union{Missing, T}` gets treated as categorical? Or something... #4

Closed: kescobo closed this 2 years ago

kescobo commented 2 years ago

In the following example, `x` and `xm` hold identical values. The only difference is that `x` is a `Vector{Float64}` and `xm` is a `Vector{Union{Missing, Float64}}`. I remember Makie used to have a similar problem, where a vector with the union eltype wouldn't get plotted as continuous.

```julia
julia> x = rand(100);

julia> xm = Union{Missing, Float64}[x...];

julia> df = DataFrame(x = x, xm = xm);

julia> y = rand(100, 5);

julia> permanova(df, y, BrayCurtis, @formula(1~x))

         | Df | SumOfSqs |  R²   |   F   |   P
-------------------------------------------------
       x |  1 |    0.014 | 0.002 | 0.211 | 0.897
Residual | 98 |    6.664 | 0.998 |       |
   Total | 99 |    6.678 |     1 |       |

julia> permanova(df, y, BrayCurtis, @formula(1~xm))

         | Df |  SumOfSqs  |  R²   |   F    |   P
----------------------------------------------------
      xm | 99 |      6.678 | 1.000 | -0.000 | 0.995
Residual |  0 | -1.561e-17 | 0.000 |        |
   Total | 99 |      6.678 |     1 |        |
```

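The degrees of freedom in the second table are what you would get if `xm` were treated as a factor with one level per distinct value. Assuming the standard one-way PERMANOVA pseudo-F, with $N = 100$ rows and $a$ groups (here $a = 100$, since the 100 random draws are in practice all distinct), the arithmetic works out as:

$$
\mathrm{df}_{\text{term}} = a - 1 = 99, \qquad
\mathrm{df}_{\text{res}} = N - a = 0, \qquad
\mathrm{df}_{\text{total}} = N - 1 = 99
$$

$$
R^2 = \frac{SS_{\text{term}}}{SS_{\text{total}}} = \frac{6.678}{6.678} = 1, \qquad
F = \frac{SS_{\text{term}} / (a - 1)}{SS_{\text{res}} / (N - a)} = \frac{6.678 / 99}{(-1.561 \times 10^{-17}) / 0}
$$

The residual sum of squares is floating-point round-off with zero degrees of freedom, so $F$ is meaningless here. In IEEE arithmetic a tiny negative number divided by 0 is $-\infty$, which would make $F$ come out as $-0.0$; that is consistent with the printed `-0.000`, though this is inferred from the table rather than from the package source. The first table, by contrast, has $\mathrm{df}_{\text{term}} = 1$ and $\mathrm{df}_{\text{res}} = 98$, as expected for a single continuous predictor.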
EvoArt commented 2 years ago

Thanks for that. I've patched it in #5, but to be honest I need to sit down and think properly about the different data types people will be using.
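The diff for #5 isn't reproduced in this thread. As a hedged illustration of the kind of eltype check involved, the sketch below assumes that "continuous vs categorical" is decided by testing whether a column's element type is a subtype of `Number`. Under that assumption, `Union{Missing, Float64}` fails the test even though every value is a `Float64`, and stripping `Missing` with `Base.nonmissingtype` (Julia 1.3 or later) fixes it. The helper names `looks_continuous_naive` and `looks_continuous` are hypothetical, not PERMANOVA.jl or StatsModels.jl functions.

```julia
# Hypothetical helpers illustrating an eltype-based continuous/categorical check.
x  = rand(100)
xm = Union{Missing, Float64}[x...]   # same values, wider eltype

# Checking the raw eltype misses numeric columns that allow `missing`.
looks_continuous_naive(col::AbstractVector) = eltype(col) <: Number

# Removing `Missing` from the eltype first treats both columns alike.
looks_continuous(col::AbstractVector) = nonmissingtype(eltype(col)) <: Number

println(looks_continuous_naive(x))     # true
println(looks_continuous_naive(xm))    # false: Union{Missing, Float64} is not <: Number
println(looks_continuous(xm))          # true
println(looks_continuous(["a", "b"]))  # false: strings stay categorical
```

The same idea generalises to other element types people might pass, such as `Union{Missing, Int}`, since `nonmissingtype` only removes `Missing` and leaves the rest of the union intact.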

```julia
julia> x = rand(100);

julia> xm = Union{Missing, Float64}[x...];

julia> df = DataFrame(x = x, xm = xm);

julia> y = rand(100, 5);

julia> permanova(df, y, BrayCurtis, @formula(1~x))

         | Df | SumOfSqs |  R²   |   F   |   P
-------------------------------------------------
       x |  1 |    0.091 | 0.013 | 1.273 | 0.313
Residual | 98 |    7.028 | 0.987 |       |
   Total | 99 |    7.119 |     1 |       |

julia> permanova(df, y, BrayCurtis, @formula(1~xm))

         | Df | SumOfSqs |  R²   |   F   |   P
-------------------------------------------------
      xm |  1 |    0.091 | 0.013 | 1.273 | 0.282
Residual | 98 |    7.028 | 0.987 |       |
   Total | 99 |    7.119 |     1 |       |

julia> df.xm[2] = missing
missing

julia> permanova(df, y, BrayCurtis, @formula(1~xm))
┌ Warning: 1 data row(s) dropped due to missing values.
└ @ PERMANOVA C:\Users\arn203\.julia\dev\PerMANOVA\src\perm2.jl:90

         | Df | SumOfSqs |  R²   |   F   |   P
-------------------------------------------------
      xm |  1 |    0.092 | 0.013 | 1.275 | 0.300
Residual | 97 |    7.000 | 0.987 |       |
   Total | 98 |    7.092 |     1 |       |
```

kescobo commented 2 years ago

Nice! It's also nice that it handles the case where there are actual `missing` values; I've been filtering those out manually.
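For anyone still on a version without the fix, here is a minimal sketch of filtering by hand. It assumes, as in the examples above, that row `i` of the response matrix `y` corresponds to row `i` of the predictor, so both must be filtered with the same mask; dropping rows from the table alone would misalign them. It uses only Base Julia; the variable names are illustrative.

```julia
# Manual filtering of missing predictor values, keeping `y` aligned.
xm = Union{Missing, Float64}[rand(100)...]
xm[2] = missing
y  = rand(100, 5)

keep = .!ismissing.(xm)                  # true for rows with a usable value
println("dropping ", count(!, keep), " row(s)")

# Narrow the eltype back to Float64 so the column is not mistaken for a factor.
x_clean = convert(Vector{nonmissingtype(eltype(xm))}, xm[keep])
y_clean = y[keep, :]                     # same rows removed from the response

# With a DataFrame, `df[keep, :]` selects the matching rows; the eltype of that
# column still allows `missing` until it is converted as above.
```

The row counts match the patched output above: one dropped row leaves 99 observations, hence `Total | 98` and `Residual | 97`.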