SPI-Birds / pipelines

Pipelines for generating a standard data format for bird data
2 stars 6 forks source link

Update old pipelines #163

Closed StefanVriend closed 2 years ago

StefanVriend commented 3 years ago

Update pipelines created in v1.0 of the standard format to v1.1.

Notes/decisions:

  1. When uncertainty is provided as a column:
    • Expected values are passed on to <x>_observed and uncertainty is translated into <x>_min and <x>_max.
StefanVriend commented 3 years ago

Hi, @LiamDBailey, @zuzulaz, @ChloeRN, would you please want to check whether above notes are in line with what we decided during today's dev meeting?

ChloeRN commented 3 years ago

Hi @StefanVriend. This seems to pretty much summarise what we talked about, so I think this is good. I would suggest to add a note at the end of Point 2 (of the uncertainty bullet) that indicates that the last statement (numbers without "+" in a dataset containing "+" are accurate/without error) should be verified with the data owner for each dataset, and possibly for years within the dataset.

StefanVriend commented 3 years ago

I just had an issue with selecting the directory when testing a format_XXX() function. In the related (and closed) issue #79, @LiamDBailey suggested to use rstudioapi::selectDirectory(), but as far as I have seen there is no pipeline that does so. Shall I implement this as well while I'm updating old pipelines?

ChloeRN commented 3 years ago

@StefanVriend was that a problem for ANY format_XXX() or a specific one? I do think we do actually have a "custom" choose.directory function that uses different approaches depending on the OS. Or do I remember that wrong @LiamDBailey ?

StefanVriend commented 3 years ago

Issue #79 indeed mentions our custom function choose_directory(). As this is used in most (or all) of the pipelines, the problem I had could happen with any pipeline. It's because the first if statement (exists('utils::choose.dir')) returns FALSE, and the second function (tcltk::tk_choose.dir()) works irregularly in my case. As Liam describes in #79, rstudioapi::selectDirectory() is OS-agnostic, so it might be a more robust way of choosing directories in our pipelines (and other scripts).

LiamDBailey commented 3 years ago

So we can't use rstudioapi::selectDirectory() because it will fail when running the tests/checks in the build window of RStudio. What is the particular issue you're encountering? My first troubleshoot is to minimise the RStudio window because the directory finder sometimes appears behind this window for some reason 😕

StefanVriend commented 3 years ago

Okay! So most of the cases it's indeed an issue of the window not appearing, and then your troubleshoot works, but in some cases the window does not appear at all. :/ I think this only happens for tcltk::tk_choose.dir(), The regular choose.dir() seems to work fine on my Windows laptop, but exists('utils::choose.dir') (in our custom function) returns FALSE.

exists('choose.dir') returns TRUE, so maybe we only have to change that for a fix? @ChloeRN could you verify that neither of the two logical statements works on Mac?

Our custom function:

choose_directory <- function() {
  if (exists('utils::choose.dir')) {
    utils::choose.dir()
  } else {
    tcltk::tk_choose.dir()
  }
}
LiamDBailey commented 3 years ago

@StefanVriend you're right, having exists(utils::choose.dir') is a bug and should just be exists('choose.dir')

StefanVriend commented 3 years ago

Alright. Fixed in the master branch ✔️

LiamDBailey commented 3 years ago

:bug: Good emoji use @StefanVriend

LiamDBailey commented 3 years ago

@StefanVriend I've merged the master into this so we have the new testing framework. When we update the pipelines to 1.1 we'll need to update the tests too because tests for Sex or LayDate will fail with new col names.