BAAQMD / tbltools

Tools for working with tabular data (some specific to inventoryverse)
Other
0 stars 0 forks source link

Fix all `no visible global function definition for` NOTEs in R CMD CHECK #21

Open arilamstein opened 3 years ago

arilamstein commented 3 years ago

We should aim to have all the functions in tbltools produce the same results on different machines, regardless of what packages they have installed or loaded.

Currently when you run R CMD CHECK you get a lot of NOTEs like this:

   extract_tree: no visible global function definition for ‘data_frame’

My understanding is that this message says "There is no global function for data_frame. So I will have to look". This is an issue because the same function can be defined by multiple packages. And if that is the case, I believe that R will simply choose the first one that it encounters. This means that the function can potentially give different results depending on which packages are loaded first.

This type of bug can be very hard to track down. Quite literally, the function might (harmlessly) be using a different version of the function that you intend right now. But in the next version of the package it could be modified, or removed (in which case the function uses a different package's version of the function).

I addresses some of these issues with my recent PR. But I will need to pair program with @dholstius in order to do the rest. (For example, some of the functions are multiply defined in data.table and dplyr. I've never used data.table myself, so I can't confidently say which version he intended to use in the code).

The solution to each of these NOTES is:

  1. decide which package's function you want to use
  2. Add the following roxygen2 annotation to the top of the function #' @importFrom <package> <function>

For reference, in the case of the data_frame example above, here is what I see when I type ??data_frame:

Screen Shot 2021-04-09 at 9 35 27 AM

It appears that data_frame is defined in at least 3 packages (tibble, vctrs and rcmdcheck). We should define which version we want R to use in exact_tree.

arilamstein commented 3 years ago

Here is the current list of all of these NOTEs:

extract_lookup_table: no visible global function definition for ‘last’
  extract_tree : graft: no visible global function definition for
    ‘data_frame’
  extract_tree: no visible global function definition for ‘data_frame’
  filter_categories: no visible global function definition for ‘first’
  filter_facilities: no visible global function definition for ‘first’
  pivot_table: no visible global function definition for ‘first’
  print_tbl: no visible global function definition for ‘humanize’
  print_tbl: no visible global function definition for ‘replace_which’
  print_tbl: no visible global function definition for ‘total_each’
  print_tbl: no visible global function definition for ‘one_of’
  read_xls: no visible global function definition for ‘regex’
  scale_at: no visible global function definition for ‘enquo’
  select_first: no visible global function definition for ‘all_of’
  select_first: no visible global function definition for ‘everything’
  select_last: no visible global function definition for ‘all_of’
  select_last: no visible global function definition for ‘everything’
  wildcard_join: no visible global function definition for ‘one_of’
  with_hierarchy: no visible global function definition for ‘num_range’
  with_hierarchy: no visible global function definition for ‘na.omit’
  write_csv: multiple local function definitions for ‘format_dbl’ with
    different formal arguments
  Undefined global functions or variables:
    . .name .wt all_of cat_id data_frame ems_unit enquo everything fac_id
    first format_nonbreaking humanize label last median na.omit node
    num_range one_of parent pol_abbr pol_id regex replace_which
    total_each year
  Consider adding
    importFrom("stats", "median", "na.omit")
  to your NAMESPACE file.
dholstius commented 3 years ago

Aha. data_frame() should be replaced by tibble() or as_tibble(). It used to be tibble::data_frame() but I think that's defunct.

arilamstein commented 3 years ago

Aha. data_frame() should be replaced by tibble() or as_tibble(). It used to be tibble::data_frame() but I think that's defunct.

Thanks for clarifying that. I'll make that change now.

I think that this is the sort of like a "hidden cost" of being on the cutting edge of the tidyverse's toolset. It sometimes takes them a few years to settle on an implementation.

arilamstein commented 3 years ago

@dholstius There are 3 ambiguous calls to first. Both the dplyr and data.table packages have a function called first.

My hunch is that you want dplyr::first. But I have never used the data.table myself, and in any case, I thought I should check with you before writing anything. Can you confirm which package's first you want to use?

Update: the same is true to last.

Update: there is something similar for na.omit(). It is called from function with_hierarchy. And there are two packages that have this function defined: data.table and stats. Can you please specify which version you want called?

Update: can you please confirm that in function read_xls you mean base::regex when you write regex? When I type ??regex I get dozens of hits.

arilamstein commented 3 years ago

@dholstius The calls to one_of are ambiguous. Both tidyselect and rex have a function called one_of.

From the context I am 99% sure that you want tidyselect::one_of, and am going to submit that change in a PR. (The calls are in tidyverse code. And in one of the cases you actualy wrote tidyselect::one_of).

But I wanted to draw your attention to it in case I am misreading your intent.

arilamstein commented 3 years ago

Update. Here is a more concise list of the NOTEs where it is ambiguous to me which package you intended to the function call to choose:

extract_lookup_table: no visible global function definition for ‘last’ filter_categories: no visible global function definition for ‘first’ filter_facilities: no visible global function definition for ‘first’ pivot_table: no visible global function definition for ‘first’ read_xls: no visible global function definition for ‘regex’ with_hierarchy: no visible global function definition for ‘na.omit’

These are the ones not addressed in PR #34

dholstius commented 3 years ago

Assuming that I mean "the tidyverse one" whenever there is a conflict is 99% safe. If it was actually binding to a different one I probably wouldn't be aware (which would be good to know of course).