OHDSI / CohortDiagnostics

An R package for performing various cohort diagnostics.
https://ohdsi.github.io/CohortDiagnostics
40 stars 45 forks source link

R check reveals unresolved code issues #1019

Open schuemie opened 1 year ago

schuemie commented 1 year ago

R check (in develop) currently shows these code issues. I would resolve all of them. Most are probably nonsense, but if you don't fix those you won't see the ones that aren't:

* checking R code for possible problems ... NOTE
appendNewRows: no visible binding for global variable ‘primaryKey’
appendNewRows: no visible binding for global variable ‘columnName’
checkAndFixDuplicateRows: no visible binding for global variable
  ‘primaryKey’
checkAndFixDuplicateRows: no visible binding for global variable
  ‘columnName’
checkFixColumnNames: no visible binding for global variable ‘optional’
checkFixColumnNames: no visible binding for global variable
  ‘columnName’
combineConceptSetsFromCohorts: no visible binding for global variable
  ‘cohortId’
combineConceptSetsFromCohorts: no visible binding for global variable
  ‘cohortName’
combineConceptSetsFromCohorts: no visible binding for global variable
runConceptSetDiagnostics: no visible binding for global variable
  ‘conceptId’
runConceptSetDiagnostics: no visible binding for global variable
  ‘sourceConceptId’
runConceptSetDiagnostics: no visible binding for global variable
  ‘conceptCount’
runConceptSetDiagnostics: no visible binding for global variable
  ‘conceptSubjects’
runConceptSetDiagnostics : getBreakdownIndexEvents: no visible binding
  for global variable ‘domain’
runConceptSetDiagnostics : getBreakdownIndexEvents: no visible binding
  for global variable ‘cohortId’
runConceptSetDiagnostics : getBreakdownIndexEvents: no visible binding
  for global variable ‘conceptSetId’
runConceptSetDiagnostics : getBreakdownIndexEvents: no visible binding
  for global variable ‘uniqueConceptSetId’
runConceptSetDiagnostics : getBreakdownIndexEvents: no visible binding
  for global variable ‘conceptCount’
runConceptSetDiagnostics: no visible binding for global variable
  ‘codesetId’
uploadResults : uploadTable: no visible binding for global variable
  ‘columnName’
uploadResults : uploadTable : uploadChunk: no visible binding for
  global variable ‘emptyIsNa’
uploadResults : uploadTable : uploadChunk: no visible binding for
  global variable ‘dataType’
uploadResults : uploadTable : uploadChunk: no visible binding for
  global variable ‘columnName’
writeToCsv.tbl_Andromeda: no visible binding for global variable
  ‘cohortId’
writeToCsv.tbl_Andromeda : processChunk: no visible binding for global
  variable ‘cohort_id’
Undefined global functions or variables:
  ageGroup averageValue calendarInterval checksum codesetId cohort_id
  cohortDefinitionId cohortEntries cohortId cohortName cohortSubjects
  columnName comparatorChecksum comparatorCohortId comparatorId
  conceptCount conceptId conceptSetExpression conceptSetId
  conceptSubjects covariateId dataType domain emptyIsNa endDay
  installed.packages isRequired isVocabularyTable optional p
  periodBegin periodEnd primaryKey referent_concept_id sd seriesType
  sourceConceptId standardDeviation startDay sumValue tableName
  targetChecksum targetCohortId timeId uniqueConceptSetId
Consider adding
  importFrom("stats", "sd")
  importFrom("utils", "installed.packages")
to your NAMESPACE file.
azimov commented 1 year ago

So this is a bit of an annoying one. The tidyselect package started throwing warnings for using .data$ in dplyr calls, removing these now creates this problem. It is possible to overcome this by adding a statement like cohortId <- NULL at the top of each function, so it looks like there is a local binding for the variable but, to me, this makes the code less readable.

schuemie commented 1 year ago

I think you should just use quotes? So

data %>%
  select(.data$field)

becomes

data %>%
  select("field")

Which shouldn't trigger R check warnings.

azimov commented 1 year ago

So this works for inside select statements, but not inside filter, mutate etc - there the field is interpreted as a string

schuemie commented 9 months ago

So for dplyr select and rename statements you should use quotes, and all other dplyr statements need .data$ to avoid R check notes.

There are also other notes and warnings in R check that can quite easily be resolved.