b-rodrigues / fp_in_R_discussion

FP in R Discussion #1

Open armcn opened 2 years ago

armcn commented 2 years ago

Hi everyone,

This can be about general FP topics in R but we can start by continuing the maybe discussion.

I have a few use-cases for maybe:

  1. Data IO
  2. Handling "impure" data
  3. Functions with undefined behaviour

Examples:

Data IO

io <- function(.f) {
  \(...) 
    tryCatch(
      just(.f(...)),
      warning = \(w) nothing(),
      error = \(e) nothing()
    )
}

safe_read_csv <- io(read_csv)

Handling "impure" data

safe_sum <- function(...) {
  vec <- c(...)
  is_empty <- length(vec) == 0L
  any_na <- any(is.na(vec))

  if (is_empty || any_na)
    nothing()

  else
    just(sum(...))
}

Functions with undefined behaviour

head <- function(a) {
  if (length(a) == 0L) 
    nothing()

  else 
    just(a[[1]])
}
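
The three examples above depend on just() and nothing() being available. To try them without installing anything, here is a minimal sketch of stand-in constructors. The list representation, the class name `maybe_sketch`, and the helpers `is_just`, `is_nothing` and `safe_head` are assumptions for illustration only, not how the maybe package actually implements these.

```r
# Hypothetical stand-ins for just() and nothing(); the real package may
# represent maybe values differently.
just <- function(a) {
  structure(list(type = "just", content = a), class = "maybe_sketch")
}

nothing <- function() {
  structure(list(type = "nothing"), class = "maybe_sketch")
}

# Hypothetical predicates for inspecting a maybe value
is_just <- function(m) identical(m$type, "just")
is_nothing <- function(m) identical(m$type, "nothing")

# Same logic as the head example above, renamed to avoid masking utils::head
safe_head <- function(a) {
  if (length(a) == 0L) nothing() else just(a[[1]])
}

first <- safe_head(1:3)        # a Just wrapping the first element
empty <- safe_head(integer())  # a Nothing, because there is no first element
```

Renaming the function to safe_head is only a suggestion: defining your own head at the top level masks utils::head for the rest of the session.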
armcn commented 2 years ago

Added to the maybe function. I think generating functions which produce maybes like this could be useful.

> maybe(mean)(1:10)
Just
[1] 5.5
> maybe(mean)("1")
Nothing
> maybe(mean, assert = is.integer)(1:10)
Nothing
> maybe(sqrt)(1)
Just
[1] 1
> maybe(sqrt)(-1)
Nothing
maybe <- function(.f,
                  allow_warning = FALSE,
                  allow_empty_vector = FALSE,
                  allow_empty_dataframe = FALSE,
                  assert_result = \(a) TRUE) {
  \(...) {
    on_warning <-
      \(w)
        if (allow_warning)
          .f(...)

        else
          nothing()

    on_error <-
      \(e) nothing()

    is_empty_vector <-
      \(a) length(a) == 0L

    is_empty_dataframe <-
      \(a) is.data.frame(a) && nrow(a) == 0L

    is_disallowed_empty_vector <-
      \(a)
        !allow_empty_vector &&
        is_empty_vector(a)

    is_disallowed_empty_dataframe <-
      \(a)
        !allow_empty_dataframe &&
        is_empty_dataframe(a)

    is_undefined <-
      \(a)
        is.null(a) ||
        is.na(a) ||
        is.nan(a) ||
        is.infinite(a)

    eval_f <-
      \(...) {
        result <-
          .f(...)

        assertion_failed <-
          \(a) !isTRUE(assert_result(a))

        if (is_undefined(result) ||
            is_disallowed_empty_vector(result) ||
            is_disallowed_empty_dataframe(result) ||
            assertion_failed(result))
          nothing()

        else
          just(result)
      }

    tryCatch(
      eval_f(...),
      error = on_error,
      warning = on_warning
    )
  }
}
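
One thing to watch in the definition above: is_undefined chains its checks with `||`, and since R 4.3.0 `||` raises an error when an operand is longer than one. A data frame or multi-element vector result would therefore raise inside eval_f and come back as Nothing. Below is a sketch of a length-safe variant. The name `is_undefined_safe` is hypothetical, and the choice to treat only NULL and scalar NA/NaN/Inf as undefined is an assumption, not the package's behaviour.

```r
# Sketch of a length-safe check, assuming only NULL and *scalar* atomic
# NA / NaN / Inf results count as undefined.
is_undefined_safe <- function(a) {
  if (is.null(a)) {
    return(TRUE)
  }

  # Longer vectors, lists and data frames are never treated as undefined here,
  # so the || chain below only ever sees length-one operands.
  if (is.atomic(a) && length(a) == 1L) {
    return(is.na(a) || is.nan(a) || is.infinite(a))
  }

  FALSE
}

is_undefined_safe(NaN)     # scalar NaN counts as undefined
is_undefined_safe(mtcars)  # a data frame does not
```

Whether a vector that merely contains an NA should count as undefined is a separate design decision; this sketch says no.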
b-rodrigues commented 2 years ago

Cool, I’ve just tried it by running code from your repo: https://github.com/armcn/maybe/blob/main/R/maybe.R

Creating values also works nicely

maybe(identity)(4)

How could something like this work:

mtcars %>%
  safe_filter(am == 1) %>%
  safe_select(am, cyl) %>%
  from_maybe(default = 1)

Would that use the functions you showed on Twitter, and_then() and maybe_map?

b-rodrigues commented 2 years ago

Also, regarding @Kupac's comment here https://twitter.com/kupac/status/1482988381185449985?s=20

Do you mean safely() could provide something like maybe()? Something like this wouldn’t work:

safe_select <- safely(select, otherwise = NULL)
safe_filter <- safely(filter, otherwise = NULL)
mtcars %>%
 safe_select(am, cyl) %>%
 safe_filter(am == 1)  

because safe_filter() can’t handle the list returned by safe_select(). So with safely() we would only need to provide some way for the functions to handle lists (or provide a pipe that unwraps lists). Something like this (inspired by the bind function from your blog post):

`%>=%` <- function(ma, f) {
  if(!is.null(ma$error)) {
    return(ma$error) 
  } else {

    func <- deparse(substitute(f)) 
    cmd <- paste0("ma$result %>% ", func) 
    res <- eval(parse(text=cmd)) 

  }
  res
}

mtcars %>% 
  safe_select(am, cyl) %>=% #notice the pipe here
  safe_filter(am == 1)  

Or we could use something like and_then() so as to avoid introducing a new pipe, which I think users wouldn’t like.
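
As a rough sketch of that second option, a bind for the list(result, error) shape can be an ordinary function, which also avoids building and parsing a command string. Everything here is hypothetical: `safely_sketch` is a base-R stand-in for purrr::safely() so the block runs without dependencies, and `and_then_safely`, `safe_subset` and `safe_nrow` are made-up names.

```r
# Hypothetical base-R stand-in for purrr::safely(): returns list(result, error)
safely_sketch <- function(.f) {
  function(...) {
    tryCatch(
      list(result = .f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

# Hypothetical bind: pass the failure along untouched, otherwise unwrap the
# result and hand it to the next safe function.
and_then_safely <- function(ma, .f, ...) {
  if (!is.null(ma$error)) ma else .f(ma$result, ...)
}

safe_subset <- safely_sketch(function(df, cols) df[, cols, drop = FALSE])
safe_nrow <- safely_sketch(nrow)

# Succeeds at every step
ok <- mtcars |>
  safe_subset(c("am", "cyl")) |>
  and_then_safely(safe_nrow)

# Fails at the first step; the second step is skipped
bad <- mtcars |>
  safe_subset("no_such_column") |>
  and_then_safely(safe_nrow)
```

Unlike the pipe above, this keeps the whole list(result, error) on failure rather than returning the bare error, so later steps can still inspect it.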

armcn commented 2 years ago

This:

mtcars %>%
  safe_filter(am == 1) %>%
  safe_select(am, cyl) %>%
  from_maybe(default = 1)

could be either of these:

mtcars |> 
  maybe(filter)(am == 1) |> 
  and_then(maybe(select), am, cyl) |> 
  from_maybe(default = 1)

safe_filter <- maybe(filter)
safe_select <- maybe(select)
mtcars |> 
  safe_filter(am == 1) |> 
  and_then(safe_select, am, cyl) |> 
  from_maybe(default = 1)

You will need to pull changes from the maybe repo for these examples to work
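
For anyone who wants to see the mechanics without pulling the repo, here is a base-R sketch of what and_then and from_maybe do in a chain like the ones above. The list representation and the helpers `safe_filter_rows` and `safe_select_cols` are assumptions made for the example; they are not the package's code and do not use dplyr's non-standard evaluation.

```r
# Hypothetical minimal maybe values
just <- function(a) list(type = "just", content = a)
nothing <- function() list(type = "nothing")

# Sketch of bind: apply .f to the wrapped value, or pass Nothing through
and_then <- function(.m, .f, ...) {
  if (identical(.m$type, "just")) .f(.m$content, ...) else .m
}

# Sketch of unwrapping with a fallback
from_maybe <- function(.m, default) {
  if (identical(.m$type, "just")) .m$content else default
}

# Hypothetical safe helpers working on plain data frames
safe_filter_rows <- function(df, keep) {
  out <- df[keep, , drop = FALSE]
  if (nrow(out) == 0L) nothing() else just(out)
}

safe_select_cols <- function(df, cols) {
  if (all(cols %in% names(df))) just(df[, cols, drop = FALSE]) else nothing()
}

res <- mtcars |>
  safe_filter_rows(mtcars$am == 1) |>
  and_then(safe_select_cols, c("am", "cyl")) |>
  from_maybe(default = 1)

fallback <- mtcars |>
  safe_filter_rows(mtcars$am == 1) |>
  and_then(safe_select_cols, "no_such_column") |>
  from_maybe(default = 1)
```

The first step takes a plain data frame and returns a maybe, so it is called directly; only the later steps need and_then, because they receive a maybe.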

armcn commented 2 years ago

safely could be used. I think a good approach is to design it to be used with tidyverse tools but also provide solutions. So safely works, but creating a custom maybe function works more smoothly with maybe values.

armcn commented 2 years ago

A new pipe does make things more elegant, and it could be provided as an option, but I think providing functions like and_then that work with the pipes everyone knows might be more intuitive. Also a limitation of a new pipe is it has to be imported if people use it in packages.

armcn commented 2 years ago

I think the assert parameter could be useful. For example, I often have the situation where I read in data, assert characteristics of it, then do something if it fails. So a contrived example:

read_csv("covid_cases.csv") |> 
  maybe(process_data, assert = is_expected_data)() |> 
  and_then(maybe(calc_total_cases), country == "Canada") |> 
  from_maybe(default = "No Data Available")
Kupac commented 2 years ago

I really like your maybe function, it covers many possible use cases. I thought that an empty vector or an empty data frame is a very common use case, and you already included it, along with a lot of other possibilities.

is_undefined <- \(a) is.null(a) || is.na(a) || is.nan(a) || is.infinite(a)

This made me think: atomic vectors are actually already Maybe types, right? I mean NA_character_ is not a character string, but the equivalent of a Nothing value. The same goes for NA_integer_, and NaN as well. So numeric vectors have two ways to express nothing!

Inf is a bit trickier. It's true that it's not a number per se, but it should be considered in rankings and some functions (median, quartile, etc). So that shouldn't be converted to nothing. Maybe a something ? :) So numeric vectors are actually complex sum types: NA | NaN | -Inf | Inf | Num

And in fact, there may be another monad hidden here: if vectors are anything like lists in Haskell (contain values of the same type), then that's a monad too. If that's the case, then wrapping a character vector in maybe would result in the following type signature: Maybe [Maybe chr]

Not sure this is useful at all though :) Just brain(storming|farting) here.

Do you think it would make sense to write a bind function that can handle vectors containing NA-s and NaN-s natively, as if it was a [Maybe a] (which it is)?

Kupac commented 2 years ago

For a single value, it would look like this:

`%>=%` <- function(x, f) {
    if (is.na(x) || is.nan(x)) {
        return(NA) 
    } else {

        func <- deparse(substitute(f)) 
        cmd <- paste0("x %>% ", func) 
        res <- eval(parse(text=cmd)) 

    }
    res
}
> "asd" %>=% paste("exists!")
[1] "asd exists!"

> NA %>=% paste("exists!")
[1] NA

Compare this last one to the most annoying things about the paste function:

> NA %>% paste("exists!")
[1] "NA exists!"
Kupac commented 2 years ago

OK, here's the "vectorised" version (untested):

# List_of_maybes bind
`%L>=%` <- function(x, f) {
    nans  <- which(is.nan(x))
    justs <- which(!(is.na(x) | is.nan(x)))

    func <- deparse(substitute(f)) 
    cmd <- paste0("x[justs] %>% ", func) 
    res1 <- eval(parse(text=cmd)) 

    res <- rep(NA, length(x))
    res[nans] <- NaN
    res[justs] <- res1

    return(res)
}

For chr vectors it works:

> inp <- c("platypus", "walking palm", NA)
> inp %>=% paste("exists!")
[1] "platypus exists!"     "walking palm exists!" "NA exists!"

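The problem is that the single-value %>=% pipe only checks the first value (and since R 4.3.0, `||` with a condition longer than one is an error rather than a silent first-element check). The vectorised pipe is more holistic.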

> inp %L>=% paste("exists!")
[1] "platypus exists!"     "walking palm exists!" NA    
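
The same idea can be written as a plain function that takes the function to apply as an argument, which avoids deparse() and eval(parse()). This is only a sketch under assumptions: `na_bind` is a hypothetical name, it treats NA and NaN alike (both come back as NA), and it assumes the applied function returns one value per input element.

```r
# Sketch of an NA-aware bind for atomic vectors.
# Assumes .f is vectorised and returns one output element per input element.
na_bind <- function(x, .f, ...) {
  present <- !is.na(x)            # is.na() is TRUE for both NA and NaN
  out <- .f(x[present], ...)

  # Build an all-NA vector of the output's type, then fill in the results
  res <- rep(out[NA_integer_], length(x))
  res[present] <- out
  res
}

inp <- c("platypus", "walking palm", NA)
labelled <- na_bind(inp, paste, "exists!")  # the NA stays NA instead of "NA exists!"
roots <- na_bind(c(1, NA, 4), sqrt)         # works for numeric input too
```

Keeping NaN distinct from NA, as the %L>=% version does, would need an extra index like the nans vector above.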
Kupac commented 2 years ago

It may not look like it, but I do dislike these extra pipe symbols (especially that we have to surround them with % signs). Super annoying to type, and remember all the different pipes. I think this is how the default R pipe should work, then we wouldn't have to deal with NA-s in every function, hehe.

Kupac commented 2 years ago

Also, regarding @Kupac's comment here https://twitter.com/kupac/status/1482988381185449985?s=20

Do you mean safely() could provide something like maybe()? Something like this wouldn’t work:

Yes, this is exactly what I meant. That safely is almost like maybe, but it's lacking the pipe (which you kindly implemented). But indeed, the and_then solution looks cool too. I'll have to check it out, don't know what it is.

armcn commented 2 years ago

@Kupac interesting points, going through them now. Is your main FP background Haskell? I only have a few days of Haskell experience and a few months of Elm, the rest R and JavaScript. While I like the way maybes are implemented in Haskell and Elm (ex. being able to have Maybe (List (Maybe Int))), I think for maybes to have broad adoption in the R community they would need to be simplified. So in my mind they wouldn't replace the value of NAs and Infs (like for statistical calculations), but they would replace assertions, errors, empty vectors, NULL returns, etc.

armcn commented 2 years ago

Also, regarding @Kupac's comment here https://twitter.com/kupac/status/1482988381185449985?s=20 Do you mean safely() could provide something like maybe()? Something like this wouldn’t work:

Yes, this is exactly what I meant. That safely is almost like maybe, but it's lacking the pipe (which you kindly implemented). But indeed, the and_then solution looks cool too. I'll have to check it out, don't know what it is.

and_then is just an alias for bind copied from Elm. It seems to capture what bind does better to make it easier for people to learn. https://package.elm-lang.org/packages/elm/core/latest/Maybe

safe_mean <- function(a) {
  if (length(a) == 0L) nothing() else just(mean(a))
}

safe_sqrt <- function(a) {
  # sqrt() is undefined for every negative number, not just -1
  if (a < 0) nothing() else just(sqrt(a))
}

seq(1, 10) |> safe_mean() |> and_then(safe_sqrt) |> with_default(0)
#> [1] 2.345208
armcn commented 2 years ago

I really like your maybe function, it covers many possible use cases. I thought that an empty vector or an empty data frame is a very common use case, and you already included it, along with a lot of other possibilities.

is_undefined <- \(a) is.null(a) || is.na(a) || is.nan(a) || is.infinite(a)

This made me think: atomic vectors are actually already Maybe types, right? I mean NA_character_ is not a character string, but the equivalent of a Nothing value. The same goes for NA_integer_, and NaN as well. So numeric vectors have two ways to express nothing!

Inf is a bit trickier. It's true that it's not a number per se, but it should be considered in rankings and some functions (median, quartile, etc). So that shouldn't be converted to nothing. Maybe a something ? :) So numeric vectors are actually complex sum types: NA | NaN | -Inf | Inf | Num

And in fact, there may be another monad hidden here: if vectors are anything like lists in Haskell (contain values of the same type), then that's a monad too. If that's the case, then wrapping a character vector in maybe would result in the following type signature: Maybe [Maybe chr]

Not sure this is useful at all though :) Just brain(storming|farting) here.

Do you think it would make sense to write a bind function that can handle vectors containing NA-s and NaN-s natively, as if it was a [Maybe a] (which it is)?

I think a vector bind function could be useful but not sure if we should mix maybes and NAs in the same package. Goes to show there is lots of room for FP experimentation in the R space.

armcn commented 2 years ago

It may not look like it, but I do dislike these extra pipe symbols (especially that we have to surround them with % signs). Super annoying to type, and remember all the different pipes. I think this is how the default R pipe should work, then we wouldn't have to deal with NA-s in every function, hehe.

I like the elegance of the custom pipes, but I think history shows that custom pipes in packages have a hard time catching on in the R world. It can get confusing for people without an FP background.

Kupac commented 2 years ago

Is your main FP background Haskell?

I consider R a functional language, so I'd say this is my main FP background. But it's true that I used it in an imperative way before I learned some Haskell. Since then, there's no way back.

While I like the way maybes are implemented in Haskell and Elm (ex. being able to have Maybe (List (Maybe Int))), I think for maybes to have broad adoption in the R community they would need to be simplified. So in my mind they wouldn't replace the value of NAs and Infs (like for statistical calculations), but they would replace assertions, errors, empty vectors, NULL returns, etc.

When we use R, we are always working in a multi-monadic context: IO, list and maybe are switched on by default, we just don't realise this, because we never leave this environment. It does make sense to wrap these in further monads, as you say (errors, missing-y values, etc).

It can be argued whether it makes sense to mix these two levels of Maybe-s together in a single package. But anyway, I understood that this is a generic discussion on FP in R, and I do think we can benefit from thinking of NA-s as maybe-s.

For example, we can create a bind function for them, which is useful, but they don't need mreturn and unwrap functions, because we stay in the low level maybe monad, so it's seamless.

armcn commented 2 years ago

@Kupac that makes sense. If you want to start experimenting with that in a repo I'd be interested to see what you come up with

armcn commented 2 years ago

In regards to other possible FP packages, a general utilities package as an extension to purrr would be useful. I think https://ramdajs.com is the best inspiration I've seen

b-rodrigues commented 2 years ago

I think at this point it would be important to define our objectives. If we want to extend what is already available in R (natively or via purrr) we need to list use cases (@armcn already provided some) and explain why the current tools are not able to solve the problem well. Then we could more easily explain to potential users why it's useful to consider learning a new package which will introduce some new abstract ideas (i.e. the entry cost is high). Good news: @armcn already has a lot of code available 😊 I'm planning to go through it over the weekend and start writing some examples of these use cases, first 100% with just R+purrr and then with what @armcn developed.

b-rodrigues commented 2 years ago

I think that purrr’s developers did a great job of introducing complex FP ideas to the masses, by showcasing real use cases that can more easily be solved using purrr instead of base R (or a non-FP approach). If we want to add more complexity to the pile, then it needs to be justified; this is why I’ll try to come up with some examples (basically extending what @Kupac did in his nice blog post which kickstarted all this discussion 😄 ).

I think one of the most powerful use cases would be within a shiny app, as handling missingness etc. would avoid having the app crash. Monads could also be used to provide easy logging to developers. I think there’s a nice opportunity here.
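
To make the logging idea a bit more concrete, here is a tiny writer-style sketch in base R. All names (`logged`, `bind_logged`) and the list(result, log) shape are assumptions chosen for illustration; this is not taken from any existing package.

```r
# Hypothetical decorator: wrap .f so it returns its result plus a log entry
logged <- function(.f, name) {
  function(x, ...) {
    list(result = .f(x, ...), log = paste("ran", name))
  }
}

# Hypothetical bind: run the next logged function on the current result and
# append its log entry to the entries collected so far.
bind_logged <- function(lv, .f, ...) {
  out <- .f(lv$result, ...)
  list(result = out$result, log = c(lv$log, out$log))
}

start <- list(result = 16, log = character())

res <- start |>
  bind_logged(logged(sqrt, "sqrt")) |>
  bind_logged(logged(log2, "log2"))

res$result  # the final value after both steps
res$log     # one entry per step, in order
```

The functions being chained never touch the log themselves; the bind does the bookkeeping, and that is what makes the logging "easy" for the developer.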

armcn commented 2 years ago

I think one of the most powerful use cases would be within a shiny app, as handling missingness etc. would avoid having the app crash. Monads could also be used to provide easy logging to developers. I think there’s a nice opportunity here.

I agree with this. Most of the R code I write is for production use as opposed to analysis scripts, so that's where I see the need for these more complex FP ideas: Shiny apps, plumber APIs, and deployed pipelines. I doubt people would have a need for maybes in an interactive context.

armcn commented 2 years ago

@b-rodrigues agree with examples being the best way to see what works. Feel free to do a PR to the maybe README with comparisons of base, purrr, and maybe versions of solutions

armcn commented 2 years ago

I've been trying to implement a functor map in R and it is surprisingly difficult. If we pretend that atomic vectors are a generic container like a list, then this works. If you can figure out a simpler way that still passes these tests let me know.

https://github.com/armcn/fp/blob/main/R/vectors.R https://github.com/armcn/fp/blob/main/tests/testthat/test-map.R

map <- curry(\(.f, xs) {
  if (not_fp_functor(xs))
    stop("map only works on atomic vectors or bare lists")

  else if (rlang::is_empty(xs))
    xs

  else if (rlang::is_bare_list(xs))
    lapply(xs, .f)

  else
    map_atomic(.f, xs)
})

map_atomic <- \(.f, xs) {
  new_xs <-
    vctrs::vec_init(.f(xs[1L]), n = length(xs))

  for (i in seq_along(xs)) {
    new_x <-
      .f(xs[i])

    if (is.atomic(new_x))
      new_xs[i] <-
        new_x

    else
      stop("The mapper function must return atomic scalars")
  }

  new_xs
}
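
For comparison, here is a base-R sketch of the same idea without curry, rlang or vctrs. It is not claimed to pass the test suite below: `map_sketch` is a hypothetical name, and because it rebuilds atomic results with unlist(), it will not preserve classes such as factors or dates the way vctrs::vec_init can.

```r
# Base-R sketch of a functor-style map over bare lists and atomic vectors.
map_sketch <- function(.f, xs) {
  if (is.data.frame(xs) || !(is.atomic(xs) || is.list(xs))) {
    stop("map_sketch only works on atomic vectors or bare lists")
  }

  if (length(xs) == 0L) {
    return(xs)
  }

  if (is.list(xs)) {
    return(lapply(xs, .f))
  }

  # Atomic input: apply .f to each length-one slice
  out <- lapply(seq_along(xs), function(i) .f(xs[i]))

  is_scalar <- vapply(
    out,
    function(o) is.atomic(o) && length(o) == 1L,
    logical(1)
  )

  if (!all(is_scalar)) {
    stop("The mapper function must return atomic scalars")
  }

  unlist(out)
}

incremented <- map_sketch(function(b) b + 1L, 1:3)
unchanged <- map_sketch(identity, c(a = 1.5, b = 2.5))
```

The trade-off is fewer dependencies in exchange for weaker type preservation on classed vectors.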

Tests:

test_that("map fails with a data frame or tibble", {
  for_all(
    a = any_tibble(),
    property = \(a) {
      map(identity, a) |> expect_error()
      map(identity, as.data.frame(a)) |> expect_error()
    }
  )
})

test_that("map fails if the functor is atomic and the mapper returns a list", {
  for_all(
    a = any_atomic(),
    property = \(a) map(\(b) list(b), a) |> expect_error()
  )
})

test_that("map doesn't fail with any vector", {
  for_all(
    a = any_vector(len = c(0L, 10L)),
    property = \(a) map(identity, a) |> expect_silent()
  )
})

test_that("map preserves identity", {
  for_all(
    a = any_vector(len = c(0L, 10L)),
    property = \(a) map(identity, a) |> expect_identical(a)
  )
})

test_that("map is composable", {
  for_all(
    a = any_vector(len = c(0L, 10L)),
    property = \(a)
      map(length, map(unique, a)) |>
        expect_identical(map(compose(length, unique), a))
  )
})

test_that("map applies a function to each element", {
  for_all(
    a = integer_(),
    property = \(a) map(\(b) b + 1L, a) |> expect_identical(a + 1L)
  )
})

test_that("map is a curried function", {
  for_all(
    a = any_vector(len = c(0L, 10L)),
    property = \(a)
      map(identity)(a) |> expect_identical(map(identity, a))
  )
})
b-rodrigues commented 2 years ago

Regarding my previous message, for use cases, I have been playing around with @armcn's {maybe} package for a blog post, and wanted to provide a thorough comparison to purrr::safely(). But I must admit that the following construct is quite nice as well:

library(magrittr) # to load %$%
library(dplyr)

safely_group_by <- purrr::safely(group_by)
safely_select <- purrr::safely(select)
safely_summarise <- purrr::safely(summarise)

starwars %>%
  safely_group_by(species, sex) %$%  # notice the %$%
  safely_select(result, height, mass) %$%
  safely_summarise(result,
                   height = mean(height, na.rm = TRUE),
                   mass = mean(mass, na.rm = TRUE)) %$%
  result

If any function fails, the pipe returns NULL; if not, it returns the result. Now I think that the ensure parameter that maybe() provides is really a very nice added value, because (as you show in the README of {maybe}) if you filter on a level that does not exist, instead of getting an empty Just value you get Nothing. So this is the direction I will go. What do you think @armcn ? I want to be fair and certain that I don’t miss valuable functionalities of your package :)

The other thing that I was playing around with, was trying to get {maybe} and {loud} to play together. The idea would be to have (what I call) a loud value (which is a list of $result and $log) in which the $result value is a Just or Nothing. But as I’ve soon discovered, different monads don’t necessarily play well together:

library(loud)
l_group_by <- loudly(group_by)
l_select <- loudly(select)
l_summarise <- loudly(summarise)

just(starwars) %>%
  loud_value() %>=% # here, I have a `loud` object whose `$result` is `Just starwars`
  fmap(l_group_by, species, sex)  # now here, I have a `Just loud group_by(starwars, species, sex)`

I’ve tried several things and then thought about it and read some docs; it feels like the only solution would be to change how my %>=% operator works internally; in the case where the provided object is of type maybe then fmap should be used internally to evaluate the wrapped function. What do you think?

armcn commented 2 years ago

Going over the documentation for promises: it is designed very well and is probably the most popular monadic package in R. https://rstudio.github.io/promises/articles/overview.html Based on its API, I'll consider whether maybe should try to match it more closely.