BAAQMD / tbltools

Tools for working with tabular data (some specific to inventoryverse)
Other
0 stars 0 forks source link

R CMD CHECK has several issues #12

Open arilamstein opened 3 years ago

arilamstein commented 3 years ago

When I run R CMD CHECK on the head I get

1 error x | 5 warnings x | 4 notes x

It would be nice if we could fix some (if not all) of these issues.

arilamstein commented 3 years ago

This doesn't really due it justice though. There seem to be hundreds of errrors/warnings/notes that just get rolled up by the system to this summary. E.g.

N  checking R code for possible problems (7.9s)
   bind_inventories: no visible global function definition for ‘%>%’
   bind_inventories: no visible global function definition for ‘map’
   bind_inventories: no visible global function definition for ‘all_same’
   bind_inventories: no visible global function definition for ‘bind_rows’
   bind_inventories: no visible global function definition for ‘mutate’
   bind_inventories: no visible global function definition for ‘na_if’
   bind_inventories: no visible binding for global variable ‘.name’
   bind_inventories: no visible global function definition for ‘%not_in%’
   bind_inventories: no visible global function definition for ‘:=’
   bind_inventories: no visible global function definition for ‘coalesce’
   bind_inventories: no visible binding for global variable ‘.’
   bind_inventories: no visible global function definition for ‘mutate_at’
   bind_inventories: no visible global function definition for ‘vars’
   bind_inventories: no visible binding for global variable ‘fct_inorder’
   ensure_distinct: no visible global function definition for ‘filter’
   ensure_distinct: no visible binding for global variable ‘n’
   exact_join: no visible global function definition for ‘str_csv’
   exact_join: no visible global function definition for ‘anti_join’
   extract_lookup_table: no visible global function definition for ‘last’
   extract_lookup_table: no visible global function definition for
     ‘filter_at’
   extract_lookup_table: no visible global function definition for ‘vars’
   extract_lookup_table: no visible global function definition for
     ‘all_vars’
   extract_lookup_table: no visible binding for global variable ‘.’
   extract_lookup_table: no visible global function definition for
     ‘filter’
   extract_lookup_table: no visible binding for global variable ‘n’
   extract_tree : uid: no visible binding for global variable ‘md5’
   extract_tree : graft: no visible global function definition for
     ‘filter’
   extract_tree : graft: no visible global function definition for ‘%>%’
   extract_tree : graft: no visible global function definition for
     ‘data_frame’
   extract_tree : graft: no visible global function definition for
     ‘replace_na’
   extract_tree : graft: no visible global function definition for
     ‘mutate’
   extract_tree : graft: no visible binding for global variable ‘parent’
   extract_tree : graft: no visible binding for global variable ‘label’
   extract_tree : graft: no visible global function definition for
     ‘bind_rows’
   extract_tree: no visible global function definition for ‘%>%’
   extract_tree: no visible global function definition for ‘select_’
   extract_tree: no visible global function definition for ‘data_frame’
   extract_tree: no visible global function definition for ‘md5’
   extract_tree: no visible global function definition for ‘filter’
   extract_tree: no visible binding for global variable ‘label’
   extract_tree: no visible binding for global variable ‘.’
   extract_tree: no visible global function definition for ‘rename’
   extract_tree: no visible global function definition for ‘mutate_at’
   extract_tree: no visible global function definition for ‘vars’
   extract_tree: no visible binding for global variable ‘node’
   extract_tree: no visible binding for global variable ‘parent’
   extract_tree: no visible global function definition for ‘funs’
   filter_and_label: no visible global function definition for
     ‘select_vars’
   filter_and_label: no visible global function definition for
     ‘unpack_list’
   filter_and_label: no visible global function definition for ‘decode’
   filter_categories: no visible global function definition for ‘%not_in%’
   filter_categories: no visible global function definition for ‘first’
   filter_categories: no visible binding for global variable ‘cat_id’
   filter_categories: no visible global function definition for ‘filter’
   filter_categories: no visible global function definition for ‘mutate’
   filter_categories: no visible global function definition for ‘:=’
   filter_categories: no visible global function definition for ‘decode’
   filter_facilities: no visible global function definition for ‘%not_in%’
   filter_facilities: no visible global function definition for ‘first’
   filter_facilities: no visible binding for global variable ‘fac_id’
   filter_facilities: no visible global function definition for ‘filter’
   filter_facilities: no visible global function definition for ‘mutate’
   filter_facilities: no visible global function definition for ‘:=’
   filter_facilities: no visible global function definition for ‘decode’
   filter_pollutants: no visible global function definition for
     ‘unpack_list’
   filter_pollutants: no visible global function definition for ‘%not_in%’
   filter_pollutants: no visible global function definition for ‘filter’
   filter_pollutants: no visible binding for global variable ‘pol_id’
   filter_pollutants: no visible global function definition for ‘mutate’
   filter_pollutants: no visible global function definition for ‘:=’
   filter_pollutants: no visible global function definition for ‘decode’
   filter_pollutants: no visible binding for global variable ‘pol_abbr’
   filter_sources: no visible global function definition for ‘map’
   filter_sources: no visible global function definition for ‘inner_join’
   filter_sources: no visible global function definition for ‘bind_rows’
   filter_years: no visible global function definition for ‘filter’
   filter_years: no visible binding for global variable ‘year’
   import_hierarchy: no visible global function definition for ‘%>%’
   import_hierarchy: no visible global function definition for ‘select_if’
   pivot_table: no visible binding for global variable ‘ems_unit’
   pivot_table: no visible global function definition for ‘first’
   pivot_table: no visible global function definition for ‘every_nth’
   pivot_table : as_js_list: no visible global function definition for
     ‘%>%’
   pivot_table : as_js_list: no visible global function definition for
     ‘map_chr’
   pivot_table : as_js_list: no visible binding for global variable ‘.’
   pivot_table: no visible global function definition for ‘imap’
   pivot_table: no visible binding for global variable ‘rpivotTable’
   pivot_table: no visible global function definition for ‘saveWidget’
   print_tbl: no visible global function definition for ‘%>%’
   print_tbl: no visible binding for global variable ‘.’
   print_tbl: no visible global function definition for ‘int’
   print_tbl: no visible global function definition for ‘str_replace_all’
   print_tbl: no visible global function definition for ‘humanize’
   print_tbl: no visible global function definition for ‘replace_which’
   print_tbl: no visible global function definition for ‘select_’
   print_tbl: no visible global function definition for ‘total_each’
   print_tbl: no visible global function definition for ‘bind_rows’
   print_tbl: no visible global function definition for ‘mutate_at’
   print_tbl: no visible global function definition for ‘vars’
   print_tbl: no visible global function definition for ‘one_of’
   print_tbl: no visible global function definition for ‘funs’
   print_tbl: no visible binding for global variable ‘format_nonbreaking’
   print_tbl: no visible global function definition for ‘kable’
   read_csv: no visible global function definition for ‘read.csv’
   read_xls: no visible global function definition for ‘%>%’
   read_xls: no visible global function definition for ‘str_detect’
   read_xls: no visible global function definition for ‘regex’
   read_xls: no visible global function definition for ‘md5’
   rolling_join: no visible global function definition for ‘as.tbl’
   scale_at: no visible global function definition for ‘enquo’
   scale_at: no visible global function definition for ‘mutate’
   scale_at: no visible global function definition for ‘:=’
   scale_at: no visible global function definition for ‘mutate_at’
   select_first: no visible global function definition for ‘all_of’
   select_first: no visible global function definition for ‘everything’
   select_last: no visible global function definition for ‘all_of’
   select_last: no visible global function definition for ‘everything’
   show_duplicates: no visible global function definition for ‘group_by’
   show_duplicates: no visible global function definition for ‘filter’
   show_duplicates: no visible global function definition for ‘tally’
   show_duplicates: no visible binding for global variable ‘n’
   show_duplicates: no visible global function definition for ‘semi_join’
   sort_by_: no visible binding for global variable ‘median’
   sort_by_: no visible global function definition for ‘group_by_’
   sort_by_: no visible global function definition for ‘mutate’
   sort_by_: no visible global function definition for ‘desc’
   sort_by_: no visible binding for global variable ‘.wt’
   update.data.frame: no visible global function definition for ‘reduce’
   update.data.frame: no visible global function definition for
     ‘expr_text’
   update.data.frame: no visible global function definition for ‘mutate’
   update.data.frame: no visible global function definition for
     ‘bind_rows’
   validate_hierarchy: no visible global function definition for ‘str_csv’
   view: no visible global function definition for ‘View’
   wildcard_join: no visible global function definition for ‘%>%’
   wildcard_join: no visible global function definition for
     ‘str_replace_all’
   wildcard_join: no visible global function definition for ‘fixed’
   wildcard_join: no visible global function definition for ‘bind_cols’
   wildcard_join: no visible global function definition for ‘one_of’
   with_hierarchy: no visible global function definition for ‘str_replace’
   with_hierarchy: no visible global function definition for ‘num_range’
   with_hierarchy: no visible global function definition for ‘%>%’
   with_hierarchy: no visible global function definition for ‘mutate_at’
   with_hierarchy: no visible global function definition for ‘vars’
   with_hierarchy: no visible global function definition for ‘arrange_at’
   with_hierarchy: no visible global function definition for ‘md5’
   with_hierarchy: no visible global function definition for ‘drop_vars’
   with_hierarchy: no visible global function definition for ‘anti_join’
   with_hierarchy: no visible global function definition for ‘na.omit’
   with_hierarchy: no visible global function definition for
     ‘pack_integers’
   with_hierarchy: no visible global function definition for ‘left_join’
   with_hierarchy: no visible global function definition for ‘inner_join’
   with_hierarchy: no visible binding for global variable ‘.’
   write_csv : format_chr: no visible global function definition for
     ‘str_printable’
   write_csv : format_dbl: no visible global function definition for
     ‘str_trim’
   write_csv : format_dbl: no visible global function definition for
     ‘if_else’
   write_csv : format_dbl: no visible global function definition for
     ‘str_detect’
   write_csv: no visible global function definition for ‘%>%’
   write_csv: no visible global function definition for ‘mutate_if’
   write_csv: no visible binding for global variable ‘is_bare_double’
   write_csv: no visible global function definition for ‘write.csv’
   write_csv: no visible global function definition for ‘timestamp’
   write_csv: multiple local function definitions for ‘format_dbl’ with
     different formal arguments
   Undefined global functions or variables:
     %>% %not_in% . .name .wt := View all_of all_same all_vars anti_join
     arrange_at as.tbl bind_cols bind_rows cat_id coalesce data_frame
     decode desc drop_vars ems_unit enquo every_nth everything expr_text
     fac_id fct_inorder filter filter_at first fixed format_nonbreaking
     funs group_by group_by_ humanize if_else imap inner_join int
     is_bare_double kable label last left_join map map_chr md5 median
     mutate mutate_at mutate_if n na.omit na_if node num_range one_of
     pack_integers parent pol_abbr pol_id read.csv reduce regex rename
     replace_na replace_which rpivotTable saveWidget select_ select_if
     select_vars semi_join str_csv str_detect str_printable str_replace
     str_replace_all str_trim tally timestamp total_each unpack_list vars
     write.csv year
   Consider adding
     importFrom("stats", "filter", "median", "na.omit")
     importFrom("utils", "View", "read.csv", "timestamp", "write.csv")
   to your NAMESPACE file.

The particular issue "no visible global function definition" is somewhat controversial as to whether it is a "real" issue. But a main reason to fix it is that:

  1. A key issue of R packages is that we need to minimize any clash between any version of R, any other installed packages a user might have, and our package.
  2. Having a cluttered output of R CMD CHECK makes it harder to spot any "real" or "legitimate" issues.

Because of this, I recommend we make a good faith effort to try and get as close to 0 as possible for this and all other packages.

dholstius commented 3 years ago

I find the "no visible global function definition" to be unhelpful >99% of the time.

I feel like one time it helped me identify a bug. But I can't recall the details.

Here's the approach I'm thinking:

  1. Flood the zone by liberally putting packages in Depends: then
  2. Gradually move those packages to Imports:

... in effect trading one class of complaints for another (the latter being too-liberal use of Depends:.

dholstius commented 3 years ago

I'm also in favor of:

  1. Muting less-helpful complaints from check() by adding "stub" docstrings like the following:
#' @param verbose logical
#' @param x FIXME

... quickly, so that more consequential output from check() is easier to focus on. As you say, cluttered output is less valuable!

Then maybe even creating a separate branch docstrings for work on those. Should be easy to merge. Thoughts @arilamstein ?

dholstius commented 3 years ago

The overall goal is really to migrate this from CRAPL to something more "production-ready". It's time!

arilamstein commented 3 years ago

I find the "no visible global function definition" to be unhelpful >99% of the time.

I feel like one time it helped me identify a bug. But I can't recall the details.

Here's the approach I'm thinking:

  1. Flood the zone by liberally putting packages in Depends: then
  2. Gradually move those packages to Imports:

... in effect trading one class of complaints for another (the latter being too-liberal use of Depends:.

So I've now spent some time on this and have different thoughts.

My original complaint about the "no visible global function" was (by memory) related to a very specific NOTE ("no visible global variable") that R CMD CHECK started throwing when using ggplot2 (link). If you search for "@hadley" on that page you'll see a cute message from Hadley when he changed his position on the NOTE.

These "no visible global function" NOTEs here I think are actually quite serious. They indicate that R basically does not know which function you are referring to. There is ambiguity. This means that the fact that the code works at all is, I believe, due in some sense to luck. More technically, the "luck" is that (a) there is only one function with that name loaded or (b) that multiple packages are loaded that define that function name, and due to the order in which they are loaded, R is finding the "right" function first. In this case, if someone else on another machine uses our code, they might get a different result.

In most cases if I do ??function-name then then there is only 1 hit, so I can solve the problem easily. In some cases, though, there are multiple packages with the same name loaded (e.g. dplyr and plyr). In those cases I think that it's safe to guess that you meant the dplyr version. In other cases, though (e.g. data_frame) you are using a function that I have never used myself, and which could refer to several different packages. We'll have to talk about those.

I've found a few cases where the function was using a package that was not listed in the DESCRIPTION file (rlang and kable), so I'm adding that in. And a few where the function was listed as "supersceded". (I don't think I found any cases where you were using a deprecated function). I mention these cases specifically because it makes the package vulnerable when we eventually update the version of R and the R Packages on the server. By being explicit about which packages and functions we need, and making sure we pass R CMD CHECK now, we can then rerun CHECK after the update as a kind of smoke test. If we just put everything in Depends, and ignore these NOTEs, we won't be able to do that. Here we'll know that the functions we depend on are still there, still take the same number of parameters, and so on.

This isn't particularly glamorous work, but I do think that it's important, and that it will make the code easier to share with others (we can be more confident that they will get the same results we do) and maintain (we will be less susceptible to code breakages when R and other packages we use update).

Update: actually, this package does use at least one deprecated function: select_vars from dplyr. Deprecated means that at some point it will be completely removed from the package. If we address these "no visible global function" NOTEs by specifying we mean the select_vars function from the dplyr package now, and run CHECK again after updating dplyr, we will be notified as soon as the function is actually removed..

arilamstein commented 3 years ago

I am going to take a different approach to this issue. I think that "passing R CMD CHECK" for this package is likely too big a project for a single PR. So I will create different issues for subsets of this issue that I think are more manageable milestones.