TidierOrg / TidierData.jl

Tidier data transformations in Julia, modeled after the dplyr/tidyr R packages.
MIT License

clean_names does not produce the expected names for some columns #25

Closed (opened by durraniu 1 year ago)

durraniu commented 1 year ago

Three column names in my DataFrame are Vehicle_ID, Frame_ID, and Lane_ID. When I use @clean_names, the other columns are formatted just as R's janitor::clean_names() would format them, but these three come out as vehicle_i_d, frame_i_d, and lane_i_d instead of vehicle_id, frame_id, and lane_id. The data is read from a Parquet file. I used the following code:

```julia
using ParquetFiles, DataFrames, Tidier

# Read the Parquet file into a DataFrame
df = DataFrame(load("data/df_raw.parquet"))

# Convert all column names to snake_case
df = @chain df begin
    @clean_names
end
```

```
julia> first(df)
DataFrameRow
 Row │ vehicle_i_d  frame_i_d  total_frames  global_time    local_x  local_y  global_x   globa ⋯
     │ Int64        Int64      Int64         Int64          Float64  Float64  Float64    Float ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────
   1 │           1         12           884  1113433136100   16.884  52.7478  6.04284e6  2.133 ⋯
                                                                              11 columns omitted
```
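The thread never says how polish_names splits a name into words. One plausible mechanism, offered here purely as an assumption and not taken from Cleaner.jl's source, is a rule that inserts an underscore before every uppercase letter that is not already preceded by an underscore. The hypothetical `naive_snake` function below is a minimal sketch of that rule. It reproduces the reported vehicle_i_d while still turning names like Total_Frames into total_frames, which matches the symptom described above.

```julia
# Hypothetical sketch, NOT Cleaner.jl's implementation: a naive snake_case
# converter that treats every uppercase letter as the start of a new word.
function naive_snake(name::AbstractString)
    out = IOBuffer()
    prev = '_'                      # pretend the name starts right after a separator
    for c in name
        # Insert "_" before an uppercase letter unless a separator precedes it.
        if isuppercase(c) && prev != '_'
            write(out, '_')
        end
        write(out, lowercase(c))
        prev = c
    end
    return String(take!(out))
end

for col in ("Vehicle_ID", "Frame_ID", "Lane_ID", "Total_Frames")
    println(col, " => ", naive_snake(col))
end
# Vehicle_ID => vehicle_i_d
# Frame_ID => frame_i_d
# Lane_ID => lane_i_d
# Total_Frames => total_frames
```

Under this rule the "I" and "D" of an acronym are treated as two separate words, which is exactly the kind of split the report shows.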
kdpsingh commented 1 year ago

Thanks for catching this. We are wrapping the polish_names() function from the Cleaner.jl package, so I would suggest filing an issue there: https://github.com/TheRoniOne/Cleaner.jl/tree/master

We could certainly re-implement this from scratch or modify that code, but we would first like to check whether they are interested in fixing it. Thanks.
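If the fix had to be re-implemented from scratch, one common approach (an assumption here, not what Cleaner.jl or TidierData actually does) is to start a new word at an uppercase letter only when it follows a lowercase letter or a digit, or when it is the last capital of a run that is followed by a lowercase letter. The hypothetical `acronym_snake` function below sketches that rule: consecutive capitals such as ID stay together, while a name like HTTPServer still splits into http_server.

```julia
# Hypothetical sketch, NOT Cleaner.jl's implementation: snake_case conversion
# that keeps runs of capitals (acronyms such as "ID") together as one word.
function acronym_snake(name::AbstractString)
    chars = collect(name)
    out = IOBuffer()
    for (i, c) in enumerate(chars)
        if isuppercase(c) && i > 1
            prev = chars[i-1]
            nxt = i < length(chars) ? chars[i+1] : ' '
            # Word boundary: lowercase/digit followed by a capital ("vehicleID"),
            # or the last capital of an acronym followed by lowercase ("HTTPServer").
            if islowercase(prev) || isdigit(prev) || (isuppercase(prev) && islowercase(nxt))
                write(out, '_')
            end
        end
        # Keep letters and digits (lowercased); turn anything else into "_".
        write(out, (isletter(c) || isdigit(c)) ? lowercase(c) : '_')
    end
    s = replace(String(take!(out)), r"_+" => "_")   # collapse repeated separators
    return String(strip(s, '_'))                     # drop leading/trailing "_"
end

for col in ("Vehicle_ID", "Frame_ID", "Lane_ID", "Total_Frames", "HTTPServer")
    println(col, " => ", acronym_snake(col))
end
# Vehicle_ID => vehicle_id
# Frame_ID => frame_id
# Lane_ID => lane_id
# Total_Frames => total_frames
# HTTPServer => http_server
```

Under the same assumptions, a function like this could serve as a stopgap with DataFrames.jl's `rename(f, df)` form, e.g. `rename(acronym_snake, df)`, until an updated Cleaner.jl is available.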

durraniu commented 1 year ago

Thank you.

kdpsingh commented 1 year ago

I asked on the Cleaner.jl repository. We may even be able to help fix this, but we want to make sure the fix lives in Cleaner.jl if possible.

kdpsingh commented 1 year ago

Per the author, the Cleaner.jl issue is fixed! A patch release is now available on the registry.

I just need to update the Cleaner version requirement in TidierData for this to take effect. Will get that done shortly.

I'll leave this issue open until that's done on our end.

kdpsingh commented 1 year ago

I realized there is a dependency mismatch between DataFrames.jl v1.5+ (which TidierData.jl depends on) and the latest version of Cleaner.jl.

I need to wait for this issue to be resolved (https://github.com/TheRoniOne/Cleaner.jl/issues/6) before I can update TidierData.jl to take advantage of this.

kdpsingh commented 1 year ago

This is fixed in Cleaner.jl v1.0.3, which is now on the registry. Closing the issue.

To see the fix take effect, update the Cleaner package (and make sure TidierData is also up to date). I'm leaving TidierData compatible with older versions of Cleaner to give users flexibility.
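The update step above can be checked from the Julia REPL. The sketch below is a hypothetical helper, not part of TidierData or Cleaner.jl: the names `cleaner_has_fix` and `installed_version` are made up for illustration, and the lookup relies on Pkg's `Pkg.dependencies()`. It reports whether the active environment's Cleaner.jl is at least v1.0.3, the release this thread names as containing the fix. The actual update call is left in a comment because it needs network access and modifies the environment.

```julia
using Pkg

# Cleaner.jl release that, per this thread, contains the acronym fix.
const CLEANER_FIXED = v"1.0.3"

# Hypothetical helper: true if Cleaner.jl version `v` includes the fix.
cleaner_has_fix(v::VersionNumber) = v >= CLEANER_FIXED

# Hypothetical helper: version of package `name` in the active environment's
# dependency graph, or `nothing` if it is not present (or has no version).
function installed_version(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == name && return info.version
    end
    return nothing
end

# To pull in the fix (requires network access), run:
#     Pkg.update(["Cleaner", "TidierData"])
v = installed_version("Cleaner")
if v === nothing
    println("Cleaner.jl is not in the active environment")
elseif cleaner_has_fix(v)
    println("Cleaner.jl $v includes the fix")
else
    println("Cleaner.jl $v predates the fix; run Pkg.update([\"Cleaner\", \"TidierData\"])")
end
```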