AndersenLab / easyfulcrum

assign proper class to genotype data and save as processed .rds in project repo #33

Closed tcrombie closed 3 years ago

tcrombie commented 4 years ago

When reading in genotyping data from Google Sheets, variables are often assigned the wrong class. For example, the default class for an empty column is logical. We should specify the class we want when reading in the data. Also, the processed genotyping data could be stored in the repo as a .rds file to preserve it outside of Google Sheets.
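The logical default for empty columns can be reproduced without Google Sheets. The following is a minimal base R sketch using `read.csv` as a stand-in for the sheet import; the two-column table and its column names are made up for illustration and are not the lab's genotyping columns.

```r
# Toy table: the "notes" column is present but has no values.
# Column names here are hypothetical, not the real genotyping sheet.
csv_text <- "strain,notes\nA,\nB,"

# Without explicit classes, the all-empty column is guessed as logical.
geno_guessed <- read.csv(text = csv_text)
class(geno_guessed$notes)

# Declaring the classes up front keeps the column as character.
geno_typed <- read.csv(
  text = csv_text,
  colClasses = c(strain = "character", notes = "character")
)
class(geno_typed$notes)
```

The same idea carries over to the sheet import: the class is declared when reading rather than repaired afterwards.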

tcrombie commented 3 years ago

From Matteo: When you say that “the processed genotyping data could be stored in the repo,” do you mean we should move away from Google Docs and just save a fixed RDS of what is in the Google doc, so that the user would import this as a first step rather than importing from the Google doc? I am also confused by the word “processed,” as I’m not sure what you consider “processed” genotyping data (#33). I will specify the data type upon import from Google Sheets [that’s the first part of the issue].

tcrombie commented 3 years ago

There are really two issues here: 1) specifying the proper data classes for the genotyping sheets and 2) exporting a version of the "final" genotyping sheet to the /data/processed/genotypes directory. By "final" I mean the processed genotyping dataset for which the checkGenotypes function returns no flags, or more specifically, all flags are FALSE.

1) For specifying the proper data classes we can use the col_types argument in googlesheets4. However, this will force the user to use all the columns the Andersen Lab uses for genotyping, in the exact order we use them. For this reason I think we should add a col_types parameter to readGenotypes so that users can supply their own col_types, with the default being the Andersen Lab col_types.
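A rough sketch of what such a parameter could look like is below. The function name, the `gsKey` argument, and the four-column default string are placeholders and do not reflect the current readGenotypes signature or the real Andersen Lab column layout. It assumes `googlesheets4::read_sheet()`, whose `col_types` argument takes a string with one shortcode per column in sheet order.

```r
# Hypothetical default: one shortcode per column, in sheet order
# ("c" = character, "d" = double, "l" = logical).
# The real Andersen Lab column layout is not shown in this thread.
default_col_types <- "ccdl"

# Sketch of a reader with a user-overridable col_types argument.
# `gsKey` identifies the genotyping sheet (an ID or URL).
read_genotypes_sketch <- function(gsKey, col_types = default_col_types) {
  if (!requireNamespace("googlesheets4", quietly = TRUE)) {
    stop("The googlesheets4 package is required to read the genotyping sheet.")
  }
  # col_types is positional, so it must match the sheet's column order.
  googlesheets4::read_sheet(gsKey, col_types = col_types)
}
```

A user with a different sheet layout would then call something like `read_genotypes_sketch(my_key, col_types = "cd")`, while lab users rely on the default.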

2) As for exporting the final genotyping sheet to the repo, maybe we make a new function that wraps rio::export and names and exports the genotyping data to the correct directory. The file name might include a prefix with the date of the export.
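One possible shape for that wrapper, as a sketch only: the function name, the `base_dir` argument, and the `<date>_genotypes.rds` naming pattern are assumptions, and only the data/processed/genotypes path comes from the discussion above. It falls back to base `saveRDS()` when rio is not installed.

```r
# Hypothetical wrapper: writes the final genotyping data to
# <base_dir>/data/processed/genotypes/<YYYYMMDD>_genotypes.rds
export_genotypes_sketch <- function(geno, base_dir, date = Sys.Date()) {
  out_dir <- file.path(base_dir, "data", "processed", "genotypes")
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

  # Date prefix so successive exports do not overwrite each other.
  out_file <- file.path(out_dir, paste0(format(date, "%Y%m%d"), "_genotypes.rds"))

  if (requireNamespace("rio", quietly = TRUE)) {
    rio::export(geno, out_file)  # format inferred from the .rds extension
  } else {
    saveRDS(geno, out_file)      # base R fallback
  }
  invisible(out_file)
}
```

The exported file can be read back with `readRDS()`, which preserves the column classes set at import.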