Bioconductor / GenomicDataCommons

Provide R access to the NCI Genomic Data Commons portal.
http://bioconductor.github.io/GenomicDataCommons/

gdc_clinical unable to pull patient data from case_ids #87

Closed Deepam84 closed 2 years ago

Deepam84 commented 2 years ago

Hello,

I have been trying to access data from RStudio using GenomicDataCommons. When I try to retrieve the patient data with gdc_clinical using the following code:

library("TCGAutils")
library("GenomicDataCommons")
library("dplyr")
library(vctrs)
library(readr)
library(reshape)
library(tibble)
library(org.Hs.eg.db)

manifest_file <- read.table("gdc_manifest.2021-09-20.txt", header = TRUE)

Case_barcodes <- UUIDtoBarcode(manifest_file$id, from_type = "file_id")
head(Case_barcodes)
Case_IDs <- UUIDtoUUID(manifest_file$id, to_type = "case_id")

key_ids = merge(Case_barcodes, Case_IDs, by = "file_id")
key_ids$patients = substr(key_ids$associated_entities.entity_submitter_id, 1, 12)
key_ids = key_ids[which(!duplicated(key_ids$patients)),]

Clinical_data <- gdc_clinical(as.character(key_ids$cases.case_id))

I get the following error message:

Error: Can't combine e4fc0909-f284-4471-866d-d8967b6adcbc$year_of_diagnosis and 78870a27-cb4f-5bd3-ba90-06cebe098f32$year_of_diagnosis .

We have removed duplicate files and it still doesn't work. Furthermore, last week gdc_clinical ran several times with no issues, both on this same data and on other data, using only case IDs. Please advise.

Thanks,

DMR

gdc_manifest.2021-09-20 (2).txt

LiNk-NY commented 2 years ago

Hi DMR, @Deepam84

In order to better handle your request, please provide a minimal reproducible example (https://stackoverflow.com/a/5963610) and do not attach files. Please include all the text needed to reproduce the error within the issue text. Use triple backticks to delimit the R code:

# example code goes here

PS. You can also use the reprex package (reprex::reprex()) to generate the example code to paste here.

Best regards, Marcel

Deepam84 commented 2 years ago
library("TCGAutils")
library("GenomicDataCommons")
library("dplyr")
library("vctrs")
library("readr")
library("tibble")

test_IDS <- structure(1:11, .Label = c("745c699b-8ecf-4653-b41f-6620f11bcf39", 
"26293554-04d3-4fc9-b12e-6a199278ed11", "1ae87b38-ecbe-4e09-92d3-84a2c11f08fd", 
"a3f87411-0a7b-4415-ba17-8edcc2639f6c", "c35e4390-2809-4d88-86f3-208de05b338a", 
"6e77b994-947e-48b6-9db7-0be77be48faa", "27c2926f-2da2-4fe1-87b0-8d0b786a249c", 
"ff82548a-f1bb-4daa-a8f7-b3831b5843ad", "bbceaec2-114b-4de5-a694-7d96078f9612", 
"f20ebad7-4ae7-4537-9236-445f87d3a8be", "7e7e4059-d620-4c8f-93e5-617b971290dc"
), class = "factor")

get_clin_data <- gdc_clinical(as.character(test_IDS))

Error: Can't combine 745c699b-8ecf-4653-b41f-6620f11bcf39$year_of_diagnosis and f20ebad7-4ae7-4537-9236-445f87d3a8be$year_of_diagnosis .

When I remove `f20ebad7-4ae7-4537-9236-445f87d3a8be`, the code works fine, but (1) I don't want to lose samples and (2) I don't understand what is going on with certain case IDs.

LiNk-NY commented 2 years ago

Hi Deepa, @Deepam84

I've opened a PR here, #88, to resolve this issue. Feel free to use the comb_res branch for the time being.

Best regards, Marcel
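
For readers hitting the same message: "Can't combine ... and ..." has the form that dplyr::bind_rows() raises when the same column has incompatible types in two of the tables being stacked. The sketch below reproduces that situation on made-up data and shows one possible way around it. It assumes, and this thread does not confirm, that year_of_diagnosis comes back as a number for some cases and as text for others. The list `per_case`, its names and its values are hypothetical stand-ins, not output from gdc_clinical, and this is not necessarily how PR #88 addresses the problem.

```r
library(dplyr)

# Hypothetical per-case tables (not real GDC output): the same column
# arrives as an integer for one case and as a character for another.
per_case <- list(
  "case-a" = tibble(year_of_diagnosis = 2009L),
  "case-b" = tibble(year_of_diagnosis = "not reported")
)

# Stacking the tables directly fails because integer and character
# columns cannot be combined; capture the message instead of stopping.
direct <- tryCatch(
  bind_rows(per_case, .id = "case_id"),
  error = function(e) conditionMessage(e)
)

# One possible workaround: coerce every column to character first,
# then stack, and convert columns back to numbers afterwards if needed.
as_text <- lapply(
  per_case,
  function(df) mutate(df, across(everything(), as.character))
)
combined <- bind_rows(as_text, .id = "case_id")
print(combined)
```

Coercing to character keeps every case in the result, at the cost of having to re-parse numeric columns later.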