Closed katehoffshutta closed 1 year ago
@katehoffshutta the suggested change changes the sample type from names (e.g., "Primary Tumor") to barcode numbers (e.g. "01"). It, hence, requires the user to know this mapping of numbers to sample types. I would suggest an internal mapping of these intuitive names to numbers. If not, the manual entry and usage in the intro vignette would need to be adapted to this encoding.
Good point - I did this to enable more user flexibility (e.g. to select primary tumors but not metastases) but it does need some documentation now. I will add a link to the GDC mapping in the man pages for these functions and then update this PR.
Do you think it would be more intuitive to use either of the other two columns (Definition or Short Letter Code)? If so, we could provide an internal mapping from those values to the integer codes, but my preference is to refer the user to the outside reference from GDC for consistency.
The currently used rds object should already reflect the names behind the official GDC table meaning that also 'Metastatic' can be used to access the corresponding metastases data. I do not have a preference, but think that a description that does not require a manual lookup would be more user-friendly. This could be either the written-out type, or any of the other two with a helper function printing out the mapping from short code to written-out type.
@FischerJoBio changes adding sample type mapping are now included. Look OK?
@FischerJoBio Can you please review the changes to the filtering by barcode functions and confirm they look okay to you? I have removed the rds object argument and the functions now act on barcodes only.