TidierOrg / Tidier.jl

Meta-package for data analysis in Julia, modeled after the R tidyverse.
MIT License

Need docs on type conversion #48

Closed kdpsingh closed 1 year ago

kdpsingh commented 1 year ago

Per this tweet (https://twitter.com/prigithj/status/1634109778757820418), it would be good for us to have some examples in the docs of how to convert types (e.g., string to number and vice versa) using @mutate. We need to consider composite types and, relatedly, how missing values are handled during type conversion.

zhezhaozz commented 1 year ago

I don't think Julia supports string-to-number conversion (see this). We need to achieve this using string parsing. I will think about an example and comment on this issue.

kdpsingh commented 1 year ago

It does, you just have to use parse() as opposed to convert(). I think it just works differently than in R, so some thin wrappers around as_character(), as_numeric(), and as_integer() may be helpful.
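For readers coming from R, the distinction can be seen with a few calls from base Julia. This is only a minimal sketch using Base functions; the variable names are illustrative.

```julia
# parse() reads a number out of a string.
x = parse(Int, "42")          # 42
y = parse(Float64, "3.5")     # 3.5

# convert() has no String -> Int method, so it throws a MethodError.
convert_failed = try
    convert(Int, "42")
    false
catch err
    err isa MethodError
end

# tryparse() returns `nothing` instead of throwing when the string is not numeric.
z = tryparse(Int, "hello")    # nothing

# Going the other way, string() turns a number into a String.
s = string(42)                # "42"
```

Because tryparse() signals failure with `nothing` rather than an exception, it is a natural building block for wrappers that should not error on bad input.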

danfulop commented 1 year ago

I'm interested in pitching in, but am a total newb to Julia. Caveat: I may not have time until Q2 begins in ~1.5 weeks.

kdpsingh commented 1 year ago

Thanks @danfulop. We would love your help! No worries regarding time.

Two quick questions that may help orient you to the best way to contribute:

  1. Are you experienced with either R or Python?
  2. Have you submitted a pull request to GitHub (or know how that works)?

Don't worry if the answer is no to question 2. Just want to support you through the process as you consider working on this.

danfulop commented 1 year ago

@kdpsingh The answers to those are yes and yes. I'm just new to Julia.

kdpsingh commented 1 year ago

Perfect. Let's aim to create 3 functions for now:

These functions should work on a single value provided to them. Don't try to make them work on vectors -- Tidier.jl will auto-vectorize them when they are called.

Take a look at the Julia language manual on strings and on type conversion/promotion.

When we test them, they should be able to take any of these 3 types and convert to any of the others using the above functions.

Good luck on this, and keep us posted!

alonsoC1s commented 1 year ago

@kdpsingh I think I can take a crack at this. Where would these as_... functions fit in the source files?

kdpsingh commented 1 year ago

The new functions can go in a new type_conversion.jl file. Reference it in the includes at the top of the main Tidier.jl file.

@danfulop, is it okay if @alonsoC1s works on this? I know you mentioned you would have time in a couple of weeks, but if you've already started, would be good to know. Want to ensure there is no duplicate work if possible (as there is lots to go around!).

I may have forgotten to mention in my earlier message, but all 3 of the functions should return missing values if they encounter a missing value -- they shouldn't produce an error.
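One way to get that behavior, sketched here under the assumption that a dedicated method for `Missing` is acceptable, is to let dispatch handle the missing case. The name `to_string_sketch` is hypothetical and is not the function that ships in Tidier.jl.

```julia
# Hypothetical helper for illustration; not the Tidier.jl implementation.
to_string_sketch(::Missing) = missing   # missing in, missing out
to_string_sketch(x) = string(x)         # everything else goes through string()

plain = string(missing)                 # "missing" (a String, not a missing value)
kept = to_string_sketch(missing)        # missing
converted = to_string_sketch(1.5)       # "1.5"
```

The comparison with plain string() shows why the extra method matters: without it, a missing value would silently turn into the text "missing".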

danfulop commented 1 year ago

@kdpsingh @alonsoC1s Yes, that's totally fine. I wouldn't have time until next week.

kdpsingh commented 1 year ago

@alonsoC1s feel free to take a look! Will "assign" this issue to you on GitHub but no pressure 😀

alonsoC1s commented 1 year ago

Just so we're all on the same page, the final implementation should work like:

@chain movies begin
    @mutate(Budget = as_integer(Budget))
end

Am I missing something?

kdpsingh commented 1 year ago

Exactly. The only thing to know is that when you run it inside of Tidier.jl, as_integer() will get automatically vectorized into as_integer.(), so you only need to get the function to work on a single scalar value and not on a vector.

In other words, focus on getting this to work:

as_integer(2.0) and as_integer("2")

and NOT as_integer([1.0, 2.0]).

Let me know if that doesn't make sense. Thank you again.
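The scalar-versus-vector point can be tried outside of Tidier.jl with Julia's dot syntax. In this rough sketch, `scalar_to_int` is a hypothetical stand-in for a scalar-only function such as as_integer(); it does not handle missing values or bad input.

```julia
# Hypothetical scalar-only helper; it knows nothing about vectors.
scalar_to_int(x::AbstractString) = parse(Int, x)
scalar_to_int(x::Real) = Int(x)

# Dot syntax broadcasts the scalar function over each element.
from_floats = scalar_to_int.([1.0, 2.0])     # [1, 2]
from_strings = scalar_to_int.(["1", "2"])    # [1, 2]
```

Since broadcasting takes care of the element-wise application, the function itself never needs a vector method.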

kdpsingh commented 1 year ago

and if someone enters as_integer("hello"), it should return a missing and not an error.
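A minimal sketch of that fallback, assuming tryparse() from base Julia is used for the string case; `int_or_missing` is a hypothetical name and the version merged into Tidier.jl may differ.

```julia
# Hypothetical sketch; not the Tidier.jl implementation.
function int_or_missing(x::AbstractString)
    parsed = tryparse(Int, x)    # `nothing` when x is not an integer literal
    return parsed === nothing ? missing : parsed
end

ok = int_or_missing("2")         # 2
bad = int_or_missing("hello")    # missing
```

Note that in this sketch a string such as "2.0" also comes back as missing, because tryparse(Int, "2.0") returns `nothing`; whether that should instead yield 2 is a design decision for the real function.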

alonsoC1s commented 1 year ago

@kdpsingh I just made a new PR. Let me know your thoughts. I took the liberty of adding some tests.

kdpsingh commented 1 year ago

Awesome thank you! 🙏

I will review and take it from here. I'll let you know if I have questions.

kdpsingh commented 1 year ago

The new as_float(), as_integer(), and as_string() functions are now implemented in #83. They are documented in docstrings; we will consider adding a dedicated documentation page for these functions in the future.