At present the loadfiles generated by GDCtools do not give the case_id or submitter_id associated with each row in the file. Granted, those can be inferred pretty easily for TCGA samples, but not everyone will know the right way to do so (think of newcomers who don't know TCGA history). Nor is this kind of "guessing" a robust strategy, because the inference rule could in principle differ for each new data program at the GDC (we should assume it will, and guard against that in the code). Therefore we should include a way of instantly and unambiguously mapping each row in a loadfile back to the identifiers it came with from the GDC: either the case_id proper (which is a UUID), or the submitter_id (which is akin to the TCGA participant barcode), or both?
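
To make the request concrete, here is a minimal sketch of what "both" could look like: two extra columns appended to every row of a tab-separated loadfile. This is not GDCtools code. The function name annotate_loadfile, the assumption that the first column is a usable row key, and the case_lookup mapping are all hypothetical; the only names taken from the GDC are the case_id and submitter_id column labels.

```python
import csv
import io


def annotate_loadfile(loadfile_text, case_lookup):
    """Append case_id and submitter_id columns to a tab-separated loadfile.

    Hypothetical helper, not part of GDCtools. case_lookup maps the value in
    the first column of each row (assumed here to be a unique row key) to the
    (case_id, submitter_id) pair that the GDC reported for that row.
    """
    reader = csv.reader(io.StringIO(loadfile_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")

    header = next(reader)
    writer.writerow(header + ["case_id", "submitter_id"])

    for row in reader:
        # No guessing from barcodes: a row without a recorded identifier
        # raises KeyError instead of being silently inferred.
        case_id, submitter_id = case_lookup[row[0]]
        writer.writerow(row + [case_id, submitter_id])

    return out.getvalue()


if __name__ == "__main__":
    # Placeholder values only; these are not real GDC identifiers.
    loadfile = "sample_id\tfile\nsample_1\t/path/a.txt\n"
    lookup = {
        "sample_1": ("00000000-0000-0000-0000-000000000000", "TCGA-XX-0000"),
    }
    print(annotate_loadfile(loadfile, lookup), end="")
```

Carrying both columns keeps the UUID available for exact lookups while leaving the human-readable submitter_id in place for people used to TCGA barcodes, and it works the same way for a data program whose naming scheme is unknown.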