BIMSBbioinfo / pigx_rnaseq

Bulk RNA-seq Data Processing, Quality Control, and Downstream Analysis Pipeline
GNU General Public License v3.0

allow spaces in sample names? #23

Closed: jonathanronen closed this issue 2 years ago

jonathanronen commented 6 years ago

These lines https://github.com/BIMSBbioinfo/pigx_rnaseq/blob/master/scripts/deseqReport.Rmd.in#L138-L139 remove spaces from sample names.

I assume this was done to trim leading/trailing whitespace, @borauyar? If so, I suggest we trim using something like

gsub("^\\s+|\\s+$", "", x)

and not

gsub(' ', '', x)

because the former removes whitespace only from the start and end of the string, whereas the latter (the current implementation) also removes spaces inside the string.
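A quick R comparison of the two calls. The sample names are made up for illustration and do not come from the pipeline. Base R's `trimws()` (available since R 3.2.0) performs the same leading/trailing trim.

```r
# Made-up sample names with leading/trailing and internal spaces
x <- c("  treated rep1 ", "control rep2  ")

# Current implementation: removes every space, including internal ones
remove_all_spaces <- gsub(" ", "", x)

# Proposed: remove only leading/trailing whitespace
trim_ends <- gsub("^\\s+|\\s+$", "", x)

print(remove_all_spaces)  # "treatedrep1" "controlrep2"
print(trim_ends)          # "treated rep1" "control rep2"

# trimws() gives the same result as the proposed regex here
stopifnot(identical(trim_ends, trimws(x)))
```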

Or perhaps there is a different reason for this code? Or perhaps we should leave it as is, which means spaces are not allowed in sample types and covariates?

borauyar commented 6 years ago

Yes, you are right. We should remove only leading/trailing whitespace, but we also shouldn't allow spaces in sample names. Sample names are later turned into column names in R objects, where the spaces would be converted into dots. A sample name may serve as a row name in one object and as a column name in another, and we often need to match the row names of one object against the column names of the other, so the names must stay identical everywhere. Maybe we could convert spaces in sample names into underscores and print a warning about this conversion.
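A minimal sketch of the suggested conversion, assuming it runs once on the sample sheet before any R objects are built. `sanitize_sample_names()` is a hypothetical helper, not part of pigx_rnaseq. It also shows where the dots come from: `data.frame()` has `check.names = TRUE` by default, which passes names through `make.names()`, and `make.names()` replaces spaces with dots.

```r
# Hypothetical helper (not part of pigx_rnaseq): trim leading/trailing
# whitespace, then replace internal spaces with underscores and warn.
sanitize_sample_names <- function(x) {
  x <- trimws(x)
  has_space <- grepl(" ", x, fixed = TRUE)
  if (any(has_space)) {
    warning("Replacing spaces with underscores in sample names: ",
            paste(x[has_space], collapse = ", "))
    x[has_space] <- gsub(" ", "_", x[has_space], fixed = TRUE)
  }
  x
}

# Why spaces are a problem: make.names() turns them into dots
make.names("treated rep1")  # "treated.rep1"

samples <- c(" treated rep1", "control_rep2 ")
clean <- suppressWarnings(sanitize_sample_names(samples))
# clean is c("treated_rep1", "control_rep2")

# The same names now work as column names of one object (a count matrix)
# and as row names of another (a sample table), and they still match.
counts <- matrix(1:4, nrow = 2,
                 dimnames = list(c("geneA", "geneB"), clean))
coldata <- data.frame(condition = c("treated", "control"), row.names = clean)
stopifnot(identical(colnames(counts), rownames(coldata)))
```

Underscores are a natural replacement because `make.names()` leaves them unchanged, so a name that is already syntactically valid keeps the same form whether it ends up as a row name or a column name.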

borauyar commented 6 years ago

allow spaces in sample names

borauyar commented 3 years ago

https://github.com/BIMSBbioinfo/pigx_rnaseq/issues/56