cursorinsight / FeatureScreeningDemo.jl

A Julia package demonstrating the use of FeatureScreening.jl.
https://www.cursorinsight.com
MIT License
0 stars 0 forks source link

Code review of 8b9b8c0 #1

Closed dhanak closed 2 years ago

dhanak commented 3 years ago

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Utilities.jl#L20

The underscore prefix is not very fortunate (it already means something in the conventions). Consider renaming to date_format or something similar. (Same with _print_json and _parse_json below.)

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Utilities.jl#L46-L48

Unless performance is really an issue, I would stick with the NamedTuple(keys .=> values) notation instead of extending the constructor with an unconventional method. If performance is an issue, your version is much faster.

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Utilities.jl#L64

Odd naming choice, perhaps jumbled? __to_be_exposed, perhaps? Also, why two underscores?

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Utilities.jl#L129

Again, why the double underscore? Even though there is no outright convention for it, the de-facto convention seems to be a single underscore for private members.

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Utilities.jl#L189-L192

More succinct with broadcast:

return append!.(acc, split(idxs.(group); kwargs...))

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Benchmarking.jl#L32-L34

Just sayin', this is a misuse of uuid5. The first argument should be the namespace that shouldn't be arbitrary, the second is the entity you wish to generate the UUID from. See https://docs.python.org/3/library/uuid.html#uuid.uuid5 and also https://stackoverflow.com/questions/10867405/generating-v5-uuid-what-is-name-and-namespace. This code works, but it is not how UUID5 was intended to be used.

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Benchmarking.jl#L78-L79

Just a note. I also encountered the problem of converting a struct into a NamedTuple before being able to serialize, and I came up with this function:

"""
    namedtuple(value)::NamedTuple

Convert a composite `value` to a named tuple.
"""
@generated function namedtuple(x)
    if x <: AbstractDict || x <: NamedTuple
        return :(NamedTuple(x))
    elseif isstructtype(x)
        fields = [:($fn = getfield(x, $(QuoteNode(fn))))
                  for fn in fieldnames(x)]
        return Expr(:tuple, fields...)
    else
        error("Expecting AbstractDict or struct type!")
    end
end

With this, the above line could be written like this (albeit it will not use the generated getter functions in this case):

print_json(path, namedtuple(measurement))

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Benchmarking.jl#L106-L108

I would probably consider using Base.@kwdef here because of the large number of arguments.

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Benchmarking.jl#L157

There is no need to splat size(configs), the constructor of Array also accepts tuples.

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Benchmarking.jl#L165-L166

Simpler (and less explicit on the indexing):

@showprogress "Benchmark $(description)" for (i, config) in pairs(configs)

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Visualization.jl#L40

This is just the following, right?

y = y = vec(features(feature_set[idxs, :]))

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/Visualization.jl#L32-L36

Just a note. This got me thinking, it has to be simpler than that. First, I slightly generalized the group_by function with a mapping function argument as follows (also note the removal of the unnecessarily tight type restriction on itr):

using Base: return_types

function group_by(by::Function, f::Function, itr)::OrderedDict
    T = only(return_types(f, (eltype(itr),)))
    return foldl(itr; init = OrderedDict{Any, Vector{T}}()) do acc, x
        push!(get!(acc, by(x), T[]), f(x))
        return acc
    end
end

function group_by(by::Function, itr)::OrderedDict
    return group_by(by, identity, itr)
end

function group_by(itr)
    return group_by(identity, itr)
end

With this adjustment, the above foldl call can be replaced with the following, very simple and straightforward call:

idxs = group_by(last, first, enumerate(labels(feature_set)))

Neat, huh?

https://github.com/cursorinsight/FeatureScreeningDemo.jl/blob/8b9b8c018de5a103143c2350fdb57334c57f6920/src/CommandLine.jl#L91

Less loquacious equivalent:

using Base.Iterators: peel

(cmd, args) = peel(arguments)

I like this Command structure and interface, btw! Neat!

tortelio commented 2 years ago

Named tuple example

  # this is working
  x = (; b = 123)
  (; a = 1, x.b) == (a = 1, b = 123)
  # this could
  (; a = 1, first(1:10)) == (a = 1, first = 1)
tortelio commented 2 years ago

@dhanak Are we ready with with these?

dhanak commented 2 years ago

Certainly seems so.