BgeeDB / BgeeDB_R

Source code of the R packge BgeeDB to use data from the Bgee database
https://bioconductor.org/packages/BgeeDB/
GNU General Public License v3.0
14 stars 6 forks source link

Sample annotation variable names #5

Closed dhimmel closed 7 years ago

dhimmel commented 7 years ago

Running the following:

bgee_human <- BgeeDB::Bgee$new(species='Homo_sapiens', release='13.2', dataType='rna_seq')
annotation <- BgeeDB::getAnnotation(bgee_human)
col.names(annotation$sample.annotation)

The following two variables exist: Min..read.length and Max..read.length. Assuming the double period is a typo. Also underscores tend to make more sense for spaces in variable names than periods. They offer better cross-language compatibility.

julien-roux commented 7 years ago

Hi Daniel You're right I guess, but this is a bit of a controversial topic. For example Google's R Style Guide (https://google.github.io/styleguide/Rguide.xml#identifiers) advise against the use of underscores (although they do not justify why, and I agree that the "words separated with dots" convention can be disturbing for python users).

And thanks for spotting the double dot, this is indeed a mistake that we will correct in the next version.

PS: just a side comment, the BgeeDB:: in front of functions calls should not be necessary

dhimmel commented 7 years ago

For example Google's R Style Guide advise against the use of underscores

Yeah, I agree there's some disagreement. But the google style guide is outdated, while the more recent tidyverse family of packages has switched to underscores almost exclusively.

I'm fine with leaving as periods, just wanted to mention that underscores will likely be more user friendly.

And thanks for spotting the double dot, this is indeed a mistake that we will correct in the next version.

:+1:

just a side comment, the BgeeDB:: in front of functions calls should not be necessary

Since I don't explicitly import packages (i.e. I don't do library(BgeeDB)), the double colon operator is necessary. I like specifying where the function came from to improve readability and prevent confusion. IMO, it's a disastrous feature of R that everything gets imported into a single namespace. Explicitly referencing the package can help avoid some bugs that inevitably will crop up due to this aspect of R.

julien-roux commented 7 years ago

Thanks for the feedback Daniel

I have fixed the double dots issue: https://github.com/BgeeDB/BgeeDB_R/commit/d56070884436f96fee7181fd1dd2d6362a934bf9

fbastian commented 7 years ago

What about using underscores @julien-roux, if it is the current state of the art?

dhimmel commented 7 years ago

What about using underscores @julien-roux, if it is the current state of the art?

As per https://github.com/BgeeDB/BgeeDB_R/commit/d56070884436f96fee7181fd1dd2d6362a934bf9#commitcomment-20196812, I think the best approach would be to first replace spaces with underscores (using gsub) and then run make.names. Then you would end up with Min._read_length, which I think is fine.

Since it's an upstream process that is choosing column names, I'm now more hesitant to recommending munging them with the double period to single period substitution.