Closed DilumAluthge closed 6 years ago
I suppose collecting over skipmissing
is slightly better than splatting, but I'm not sure.
ok = .!(ismissing.(x[target]) .| ismissing.(x[score]))
t = BitArray(x[target][ok])
s = collect(skipmissing(x[score][ok]))
TNT(s[t], s[.!t])
Also, I don't thing you should use .*
for boolean types, even though it might boil down to .&
.
It is a pain that you can't infer the type of the array without the Union{Missing, ...}
part. At least, I can't. There is coalesce.()
but I suppose that is similar in performance.
Here’s a thought: we know that the type of x[score]
is Vector{Union{T, Missing}}
where T <: Real.
So the type of s
(once we’ve removed the missing
s) is Vector{T}
where T <: Real
. So, if we want, we could convert(Vector{Cfloat}, s
. And then we’d know the type of s
in advance because we’d be forcing it via convert
.
We could use Float32
or Float64
instead of Cfloat
, but that might be less portable across platforms.
I thought about it, and I think the collect(skipmissing())
might not be that inefficient. It would be nice if we would have had access to the Array{T}
part of the Array{Union{T, Missing}}
---apparently, this is how it is represented in memory. Now the generator loops over the Array{UInt8}
part, and generates every element of the Array{T}
part which is then collected.
If we just carry out the loop ourselves, we could probably save one level of array-building. We might as well just build the tar
and nontar
scores in that one loop...
For now I've just implemented the code above.
There’s a function Missings.T that gives you the T from a Union{T, Missing}: https://github.com/JuliaData/Missings.jl/blob/master/src/Missings.jl
But that would mean we’d have to add Missings.jl as a dependency.
It’s worth thinking about.
In the mean time, could you tag a new release with attobot?
On Wed, Sep 5, 2018 at 10:32 David van Leeuwen notifications@github.com wrote:
I thought about it, and I think the collect(skipmissing()) might not be that inefficient. It would be nice if we would have had access to the Array{T} part of the Array{Union{T, Missing}}---apparently, this is how it is represented in memory https://julialang.org/blog/2018/06/missing. Now the generator loops over the Array{UInt8} part, and generates every element of the Array{T} part which is then collected.
If we just carry out the loop ourselves, we could probably save one level of array-building. We might as well just build the tar and nontar scores in that one loop...
For now I've just implemented the code above.
— You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub https://github.com/davidavdav/ROCAnalysis.jl/pull/9#issuecomment-418752889, or mute the thread https://github.com/notifications/unsubscribe-auth/AFXArZcsNWSedd5IBTrUoE2aknMliQ3Lks5uX-BygaJpZM4WZ--L .
Here’s the other option: we can do Missings.disallowmissing(s[ok]). I think the disallowmissing function will give us the correct type.
On Wed, Sep 5, 2018 at 13:20 Dilum Aluthge notifications@github.com wrote:
There’s a function Missings.T that gives you the T from a Union{T, Missing}: https://github.com/JuliaData/Missings.jl/blob/master/src/Missings.jl
But that would mean we’d have to add Missings.jl as a dependency.
It’s worth thinking about.
In the mean time, could you tag a new release with attobot?
On Wed, Sep 5, 2018 at 10:32 David van Leeuwen notifications@github.com wrote:
I thought about it, and I think the collect(skipmissing()) might not be that inefficient. It would be nice if we would have had access to the Array{T} part of the Array{Union{T, Missing}}---apparently, this is how it is represented in memory <https://julialang.org/blog/2018/06/missing . Now the generator loops over the Array{UInt8} part, and generates every element of the Array{T} part which is then collected.
If we just carry out the loop ourselves, we could probably save one level of array-building. We might as well just build the tar and nontar scores in that one loop...
For now I've just implemented the code above.
— You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub < https://github.com/davidavdav/ROCAnalysis.jl/pull/9#issuecomment-418752889 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AFXArZcsNWSedd5IBTrUoE2aknMliQ3Lks5uX-BygaJpZM4WZ--L
.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/davidavdav/ROCAnalysis.jl/pull/9#issuecomment-418811395, or mute the thread https://github.com/notifications/unsubscribe-auth/AFXAra83ys0B_IUJiSLauJlWaC00KS29ks5uYAf5gaJpZM4WZ--L .
Thanks for the links, this inspired me for the function I needed:
found(::Type{Union{T, Missing}}) where T = T
With this we could do a convert(Array{found(scores)}, scores)
, but
of course, I don't know if the convert is any more efficient.
On Wed, Sep 5, 2018 at 7:23 PM Dilum Aluthge notifications@github.com wrote:
Here’s the other option: we can do Missings.disallowmissing(s[ok]). I think the disallowmissing function will give us the correct type.
On Wed, Sep 5, 2018 at 13:20 Dilum Aluthge notifications@github.com wrote:
There’s a function Missings.T that gives you the T from a Union{T, Missing}: https://github.com/JuliaData/Missings.jl/blob/master/src/Missings.jl
But that would mean we’d have to add Missings.jl as a dependency.
It’s worth thinking about.
In the mean time, could you tag a new release with attobot?
On Wed, Sep 5, 2018 at 10:32 David van Leeuwen notifications@github.com wrote:
I thought about it, and I think the collect(skipmissing()) might not be that inefficient. It would be nice if we would have had access to the Array{T} part of the Array{Union{T, Missing}}---apparently, this is how it is represented in memory <https://julialang.org/blog/2018/06/missing . Now the generator loops over the Array{UInt8} part, and generates every element of the Array{T} part which is then collected.
If we just carry out the loop ourselves, we could probably save one level of array-building. We might as well just build the tar and nontar scores in that one loop...
For now I've just implemented the code above.
— You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub < https://github.com/davidavdav/ROCAnalysis.jl/pull/9#issuecomment-418752889 , or mute the thread < https://github.com/notifications/unsubscribe-auth/AFXArZcsNWSedd5IBTrUoE2aknMliQ3Lks5uX-BygaJpZM4WZ--L
.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/davidavdav/ROCAnalysis.jl/pull/9#issuecomment-418811395, or mute the thread https://github.com/notifications/unsubscribe-auth/AFXAra83ys0B_IUJiSLauJlWaC00KS29ks5uYAf5gaJpZM4WZ--L .
— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub, or mute the thread.
-- David van Leeuwen
This PR:
You will notice that in
src/dataframes.jl
, I use splatting on lines 18 and 19. This was the only way that I could get the correct eltypes fort_complete
ands_complete
. If I usedcollect(...)
, then the eltype still ends up asUnion{T, Missing}
. However, when I use splatting, I get the correct eltypeT
. This is important because the structTNT
expects to be given arguments of typeVector{T}
whereT <: Real
.cc: @davidavdav @diegozea