MoTrPAC / MotrpacBicQC

R package for the MoTrPAC community
https://motrpac.github.io/MotrpacBicQC/index.html
MIT License

dl_read_gcp downloads vialLabel as integer64 #227

Closed dhkatz12 closed 8 months ago

dhkatz12 commented 9 months ago

While editing Greg Smith's QC normalization code, I ran into a wrinkle when I tried to upgrade his GCP read-ins to the latest functions. dl_read_gcp uses data.table::fread to import data. If you use it to read the EQC, or any other file that has vialLabel as a column, data.table represents that column as integer64 because the values are larger than 2^31. I don't really understand why it does this instead of making the column numeric or character, since base R integers are 32-bit, but it caused problems because the Windows and Linux versions of R apparently handle this vector differently: Greg's code failed on Windows but succeeded on Linux. Greg's original code worked because he was reading the GCP files manually and must have read the column in some other way. That is probably why no one noticed, since so few of us run Windows, but it definitely caused problems. For now, I worked around it by manually converting the column to character after it is read in by dl_read_gcp.
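For anyone who wants to see the behavior without touching GCP, here is a minimal sketch of the workaround. It calls data.table::fread directly on in-memory text rather than going through dl_read_gcp, the vialLabel values are invented for illustration, and it assumes the bit64 package is installed.

```r
library(data.table)
library(bit64)  # provides the integer64 class and its as.character() method

# Hypothetical file contents; these vialLabel values are invented.
lines <- c("vialLabel,value",
           "90001013104,1.5",
           "90001013105,2.5")

dt <- fread(text = lines)
class(dt$vialLabel)  # inspect the class fread chose for the column

# Workaround: treat the identifier as text after reading.
dt[, vialLabel := as.character(vialLabel)]
```

After the conversion, vialLabel is an ordinary character column, so downstream code no longer depends on how integer64 is handled on a given platform.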

Two solutions:

  1. The faster option: add an argument to the data.table::fread call in dl_read_gcp that sets integer64 = "numeric" or integer64 = "character". Personally I think vialLabel behaves better as a character, since it's an identifier, not a value, but integer64 = "character" behaves oddly: it makes missing values blank rather than NA.
  2. The better option (IMHO): switch to the tidyverse and make all the read-ins use readr::read_delim, though this could require further updates to some code.
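The two options above could look roughly like this. This is only a sketch on in-memory text with invented vialLabel values, not a patch to dl_read_gcp, and it assumes both data.table and readr (2.0 or later, for `I()` literal input) are installed. The blank-to-NA step in option 1b is my own addition to deal with the missing-value behavior mentioned above.

```r
library(data.table)
library(readr)

# Hypothetical file contents with one missing vialLabel; values are invented.
lines <- c("vialLabel,value",
           "90001013104,1.5",
           ",2.5")

# Option 1a: read large integers as doubles
# (the fread docs list "numeric" as a synonym for "double").
dt_num <- fread(text = lines, integer64 = "double")

# Option 1b: read them as character, then turn any blank strings into NA.
dt_chr <- fread(text = lines, integer64 = "character")
dt_chr[vialLabel == "", vialLabel := NA_character_]

# Option 2: readr with an explicit column type; readr's default `na`
# setting treats empty fields as NA.
raw_text <- paste0(paste(lines, collapse = "\n"), "\n")
tb <- read_delim(I(raw_text), delim = ",",
                 col_types = cols(vialLabel = col_character(),
                                  value = col_double()))
```

Declaring the column type explicitly, as in option 2, has the side benefit that the result does not depend on what the reader happens to infer from a particular file.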