adokter / bioRad

R package for analysis and visualisation of biological signals in weather radar data
http://adokter.github.io/bioRad
Other
29 stars 16 forks source link

Stick to one naming convention #51

Closed peterdesmet closed 6 years ago

peterdesmet commented 6 years ago

While listing the functions in #50, I noticed there are a lot of name conventions used (lowerCamelCase, period.separated, underscore_separated) for those functions. Not sure if we want to tackle this now (but better now than later), but the it would be good to stick to 1. ๐Ÿ˜„ The link suggests underscore_separated.

adokter commented 6 years ago

@peterdesmet fully agree with that

peterdesmet commented 6 years ago

Any preference? Mine is underscore. I guess this will break scripts using bioRad functions (once people update the package)?

adokter commented 6 years ago

Yes I also prefer underscore, and reserve the dot for method dispatching only.

Most of the dots are indeed method functions, except for the is.XXX, read.XXX and as.data.frame.vpts functions (although the latter is actually a method of the vpts class). I did that because of convention in the R language (e.g. there are many as.data.frame class functions, so R hasn't really stuck to reserving the dot for only method dispatch itself).

Hesitant to change functions in the namespace indeed, as it will break things. Functions outside namespace can be changed.

We could do a push at some point to also change namespace functions, but I prefer to wait until there is a clearer list on what should be changed structure-wise

adokter commented 6 years ago

Check use of quantity and param.

quantity and param are both used as function arguments. param is also one of the object classes

param as a function argument is currently used to refer to radar observables only (dbz, vrad, etc), i.e. those that can be contained in the param class.

Quantity can refer to more profile quantities (like height of the altitude layer), and is thus more generic.

plot.ppi has param as function argument plot.vp has quantity as function argument which is not very intuitive

It seems more consistent to restrict param only to the class object, and have all function arguments be 'quantity'

adokter commented 6 years ago

Overview of R naming conventions https://journal.r-project.org/archive/2012-2/RJournal_2012-2_Baaaath.pdf

adokter commented 6 years ago

We can rename functions while keeping the old name working as well, as follows for a function oldName(arg1_old,arg2_old that we want to replace with newName(arg1_new,arg2_new) (using roxygen):

#' @name oldName-deprecated
#' @rdname bioRad-deprecated
#' @section \code{oldName}:
#' For \code{oldName}, use \code{\link{newName}}.
#' @export
oldName=function(arg1_old, arg2_old){
  .Deprecated("newName")
  newName(arg1_new=arg1_old, arg2_new=arg2_old)
}

This will generate a bioRad-deprecated help page listing all the deprecated functions, and give users a warning to switch to the new function name when they use the old name.

Calls to the old function will still work as well, so nothing gets broken (as long as newName function behaves exactly the same as oldName function, i.e. only naming is changed).

adokter commented 6 years ago
old name new name comments
as.data.frame.vpts br_as_data_frame also keep as.data.frame.vpts as alias
as.data.frame.vpts keep keep for consistency with many other as.data.frame.xxx R functions
basemap br_download_basemap
beamheight br_beam_height
beamwidth br_beam_width
bind br_bind
c.vp keep deprecate?
checkDocker br_check_docker
cmt br_cmt
composite br_composite
day br_day make this function take over function night as well?
dbz2eta br_convert_dbz_to_eta
dim.ppi keep
dim.scan keep
download_vp br_repo_download
elangle br_fetch_elevation
eta2dbz br_convert_eta_to_dbz
fetch br_fetch_quantity
getscan br_fetch_scan
h5ODIMobject br_fetch_ODIM_object_type
is.param keep
is.ppi keep
is.pvol keep
is.pvolfile is.pvol_file
is.scan keep
is.vp keep
is.vpfile keep
is.vplist keep deprecate
is.vpts keep
map br_map
mt br_mt also add function br_rt
mtr br_mtr also add function br_rtr
night keep deprecate
plot.ppi tbd should these be br_plot now, for consistency with br_map?
plot.vivp tbd should these be br_plot now, for consistency with br_map?
plot.vp tbd should these be br_plot now, for consistency with br_map?
plot.vpts tbd should these be br_plot now, for consistency with br_map?
ppi br_ppi
rcs br_rcs
rcs<- br_rcs<-
read.pvol br_read_pvol
readvp br_read_vp
readvp.list br_read_vp_list
readvp.table br_read_vp_text
regularize br_regularize_time_grid
retrieve_vp_paths br_repo_list_downloads
rsl2odim br_convert_RSL_to_ODIM
sd_vvp br_velocity_sd
sd_vvp<- br_velocity_sd
suntime br_sunrise split function in two
suntime br_sunset split function in two
updateDocker br_update_docker
vintegrate br_integrate_height to be added later: br_integrate_time
vol2bird br_calculate_vp ditch vol2bird name???
vpts keep deprecate
adokter commented 6 years ago

@peterdesmet @stijnvanhoey what do you think of the names above? I'm still a bit undecided on the "br_" prefix, but such a convention is handy with auto-completion in RStudio. Please improve where you can!

adokter commented 6 years ago

@plieper @CeciliaNilsson709 if you want to chime in on bioRad's function namings as well, please do

peterdesmet commented 6 years ago

Good that deprecated names can be listed. Will have a look at suggested names. If we go for a prefix, we might as well go for the more recognizable and not that much longer biorad_, no? But not sure a prefix is necesssary, @stijnvanhoey?

adokter commented 6 years ago

We also have to decide on a coherent naming convention of the function arguments, which may in fact be a bigger task than the naming of the functions

peterdesmet commented 6 years ago

@adokter, I finished my review of the function names and ran them by @stijnvanhoey. See this spreadsheet for the proposal.

General guidelines in naming functions:

  1. Stick to S3 convention and have polymorphistic functions (like plot) that work for multiple objects
  2. Don't use br_ or biorad_ prefix
  3. Where logical, name functions so these can be grouped: docker_check & docker_update (not check_docker & update_docker). Helps in alphabetical listing or tab completion
  4. Convert functions use _to_: dbz_to_eta (not dbz2eta), pvolfile_to_vp (not vol2bird)
  5. Objects can be created with a function bearing their name:
    • ppi(scan) => ppi
    • vpts(c(vp, vp)) => vpts
    • scan(pvol, angle = 2) => scan (not getscan)
    • vpi(vpts) => vpi (not integrate_heights)
    • vp(vpfile) => vp (not read.vp). Also supports reading a vector of vp files
  6. Keep what I assume are known abbreviations, like mtr, rtr, rcs, rvsd (not sd_vvp)
  7. Functions specific to a single object (and no planned support for other objects) start with that object name: ppi_composite, vpfile_download, vpts_from_text (not readvp.table), vpts_regularize. There aren't that many, but helps in alphabetical listing or tab completion

Let me know what you think. If easier, we could arrange another call too.

adokter commented 6 years ago

@peterdesmet thanks looks good, here some thoughts (I can't edit the excel file, so all comments here, following the order of the excel)

  1. bind(): is the more general function that has superseded vpts(). I'm ok with your suggestion of renaming bind() to vpts(), and deprecating bind().
  2. pvolfile listed as an object, but pvolfile is only a string with the path to a file. So dim.pvolfile and is.pvolfile not needed
  3. is.quantity, what is now is.param; ok to hide from user for now, but prefer to keep under the hood. I expect we later want functionality to merge separate param into scan, as certain countries provide separate files for separate param (sigh...). This means we want also a param(pvol, angle = 2, quantity = "DBZH") and param(scan, quantity = "DBZH")function. I'm thinking to keep the name param (or scanparam) reserved for objects storing a directly observed quantity by the radar, and quantity as the more generic name for function arguments.
  4. is.rslfile: Prefer a is.pvolfile that works for both european and US data, so have the user not worry about formats. RSL reads already many radar formats, see https://trmm-fc.gsfc.nasa.gov/trmm_gv/software/rsl/, so not really itself a radar data type
  5. mt, mt_cumul, mtr, rtr: these are now columns in the vpi object (which itself is a specially classed data.frame). Because all these values are conditional on the definition of the transect orientation (parameter alpha in vintegrate/vpi), alpha should be stored as an attribute to vpi
  6. deprecate cmt/mt_cumul; the vpi dataframe already will have a column mt which gives the (cumulative) migration traffic.
  7. night(lon, lat, date): currently does something different than day(object): day tests on bioRad objects, while night(lon, lat, date) checks whether it is night on a given location and date. Name is confusing, we could call it night_at_location instead?
  8. pvolfile_to_vp()/pvol_to_vp(): for technical reasons we can currently only go from a pvolfile to a vp, in the future likely from a pvol to a vp as well. I prefer pvol_to_vp() as it is shorter (pvolfile hard to pronounce), and we could have this function accept both a pvolfile and (in the future) a pvol. Currently the function can also output a vpfile, so the _to_vp is already ambiguous, so a little more pvol/pvolfile ambiguity doesn't matter :)
  9. pvolfile_type/h5ODIMobject: this function is not restricted to pvols. It checks the ODIM data type, so it can return one of "VP", "SCAN", or "PVOL". I would suggest renaming this to ODIM_object_type instead.
  10. suntime/sunrise/sunset not needed to have this work on a vpts, use day() for that.
  11. c.VP should be deprecated, there are some nasty side-effects of overloading the c operator. Instead use bind.VP/vpts.VP
peterdesmet commented 6 years ago

Odd, you should have access with your gmail account. But in response:

  1. OK, marked as deprecate: don't add
  2. Correct, updated quantity back to param were appropriate. Marked as hide
  3. Indeed, function added to spreadsheet (twice, for pvol and scan object). I'd argue to use the function parameter quantity = ... everywhere, not param
  4. Ok, all reference to rslfile object removed
  5. Ok, do I need to add/update something for that?
  6. Ok, marked as deprecate. I noticed mt is proposed for deprecation as well (#76) so marked as such
  7. Ooooooh! ๐Ÿ˜ฎBoth do return TRUE/FALSE. In that case... and assuming people are generally more interested in nights (radars being good for detecting nocturnal migration), why don't we call day() -> night() and have that function work on bioRad objects AND locations? Pinging @stijnvanhoey for advice
  8. Great! I prefer pvol_to_vp as well! Updated. Currently that function does (too) much... it would be nicer there was just pvol() to read, pvol_to_vp() to transform, and vpfile() to write it back to a file, but I understand why it was designed that way.
adokter commented 6 years ago

Just fyi: I have been editing until 4PM ET / 10PM EU time in the comment, see you replied already 15 mins earlier :)

peterdesmet commented 6 years ago

Haha! Ok, to continue the list:

  1. Is that function often used? Would it make sense to have object_type/biorad_object_type, that returns vp, scan, pvol?
  2. OK
  3. OK, marked for deprecation.

I haven't checked where in bioRad lists are used and where vectors. E.g. for vpfile_select it makes sense to get a vector back c(vpfile, vpfile, vpfile) that can be passed on to vp() to get c(vp, vp, vp) that can be passed to vpts()

stijnvanhoey commented 6 years ago
  1. +1 on the night() function being able to do either biorad object or custom location; deprecate day()
adokter commented 6 years ago
  1. not used a lot. Has come in handy so people can check what kind of a file they got from meteorologists. More useful when we can combine scans into pvols.
peterdesmet commented 6 years ago
  1. Ok, if you think it will be useful later, we can keep (lowercase) odim_object_type around.

For the suggested functions object_type (returning pvol, vp, ...), pvolfile_type (returning NEXRAD, ODIM) and more functionality for pvol_to_vp, I'll let you create issues if need be.

Let me know if you consider the renaming approved.

adokter commented 6 years ago

Still to do: renaming all the function arguments; we at least need to settle on some convention.

Do you and Stijn have time to spend on the refactor as well? I would like to deprecate without breaking things, see this earlier comment, so it will be quite a bit of work ...

adokter commented 6 years ago

Issue #20 needs to be resolved as part of this issue as well

adokter commented 6 years ago

naming of sd_vvp: there is the sd_vvp profile quantity, and there is the threshold that is applied to this quantity. sd_vvp() sets/retrieves this threshold to which sd_vvp quantity is compared

In the ODIM hdf5 profile file (see https://github.com/adokter/vol2bird/wiki/ODIM-bird-profile-format-specification) we use sd_vvp for the quantity and sd_vvp_thresh for the threshold.

I'm wondering whether for consistency it would be better to keep sd_vvp() or sd_vvp_thresh() instead of introducing the new rvsd(). Of course I can change sd_vvp to rsvd in the vol2bird code, but then we have different versions floating around, which I think is not worth it.

peterdesmet commented 6 years ago

@adokter

peterdesmet commented 6 years ago

Closing this discussion issue: todo list is part of #84.