ShixiangWang / home

Personal website of 王诗翔 (Shixiang Wang)
https://shixiangwang.github.io/home/

UCSCXenaTools: Retrieve Gene Expression and Clinical Information from UCSC Xena for Survival Analysis - Shixiang Wang #51

Open ShixiangWang opened 4 years ago

ShixiangWang commented 4 years ago

https://shixiangwang.github.io/home/en/post/ucscxenatools-201908/

tyaoi commented 4 years ago

To Shixiang Wang

After running the commands in "Merge expression data and survival status", I got the following error message, and I don't know how to solve it. I would appreciate any advice or suggestions.

tyaoi

x and y must share the same src, set copy = TRUE (may be slow).
Run rlang::last_error() to see where the error occurred.

So, I ran the command:

rlang::last_error()
<error/rlang_error>
x and y must share the same src, set copy = TRUE (may be slow).
Backtrace:
  1. tibble::tibble(sampleID = names(KRAS), KRAS_expression = as.numeric(KRAS))
  2. dplyr::left_join(., cli, by = "sampleID")
  3. dplyr::auto_copy(x, y, copy = copy)
  4. dplyr:::glubort(...)
Run rlang::last_trace() to see the full context.

Moreover, I ran the command:

rlang::last_trace()
<error/rlang_error>
x and y must share the same src, set copy = TRUE (may be slow).
Backtrace:
     █
  1. └─`%>%`(...)
  2.   ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
  3.   └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  4.     └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
  5.       └─`_fseq`(`_lhs`)
  6.         └─magrittr::freduce(value, `_function_list`)
  7.           └─function_list[[i]](value)
  8.             ├─dplyr::left_join(., cli, by = "sampleID")
  9.             └─dplyr:::left_join.data.frame(., cli, by = "sampleID")
 10.               └─dplyr::auto_copy(x, y, copy = copy)
 11.                 └─dplyr:::glubort(...)

ShixiangWang commented 4 years ago

@tyaoi I will rerun the code in this post and check whether I can reproduce this error.

ShixiangWang commented 4 years ago

@tyaoi I cannot reproduce your error. I ran all the code in this post and got the expected results. Could you run the following code to check the data?

head(KRAS)
head(cli)

tyaoi commented 4 years ago

Yes !!

>head(KRAS)
TCGA-69-7978-01 TCGA-62-8399-01 TCGA-78-7539-01 TCGA-50-5931-11 TCGA-73-4658-01 TCGA-44-6775-01 
          10.25           10.29           10.82           10.29           10.36           10.03 

> head(cli)
$LUAD_clinicalMatrix
# A tibble: 706 x 148
   sampleID ABSOLUTE_Ploidy ABSOLUTE_Purity AKT1  ALK_translocati… BRAF  CBL   CTNNB1 Canonical_mut_i… Cnncl_mt_n_KRAS…
   <chr>              <dbl>           <dbl> <chr> <chr>            <chr> <chr> <chr>  <chr>            <chr>           
 1 TCGA-05…           NA             NA     NA    NA               NA    NA    NA     NA               NA              
 2 TCGA-05…            3.77           0.46  none  NA               p.A7… none  none   Y                Y               
 3 TCGA-05…           NA             NA     NA    NA               NA    NA    NA     NA               NA              
 4 TCGA-05…           NA             NA     none  NA               p.L6… none  none   N                N               
 5 TCGA-05…            2.04           0.48  none  NA               none  none  p.F77… N                N               
 6 TCGA-05…            3.29           0.48  none  NA               p.G4… none  p.T41A N                Y               
 7 TCGA-05…            3.99           0.570 none  NA               none  none  none   Y                Y               
 8 TCGA-05…            3.24           0.61  none  NA               none  none  none   Y                Y               
 9 TCGA-05…            1.86           0.74  none  NA               none  none  none   N                N               
10 TCGA-05…           NA             NA     NA    NA               NA    NA    NA     NA               NA              
# … with 696 more rows, and 138 more variables: EGFR <chr>, ERBB2 <chr>, ERBB4 <chr>,
#   Estimated_allele_fraction_of_a_clonal_varnt_prsnt_t_1_cpy_pr_cll <dbl>, Expression_Subtype <chr>, HRAS <chr>,
#   KRAS <chr>, MAP2K1 <chr>, MET <chr>, NRAS <chr>, PIK3CA <chr>, PTPN11 <chr>, Pathology <chr>, Pathology_Updated <chr>,
#   RET_translocation <chr>, ROS1_translocation <chr>, STK11 <chr>, WGS_as_of_20120731_0_no_1_yes <dbl>,
#   `_INTEGRATION` <chr>, `_PANCAN_CNA_PANCAN_K8` <chr>, `_PANCAN_Cluster_Cluster_PANCAN` <chr>,
#   `_PANCAN_DNAMethyl_LUAD` <chr>, `_PANCAN_DNAMethyl_PANCAN` <chr>, `_PANCAN_RPPA_PANCAN_K8` <chr>,
#   `_PANCAN_UNC_RNAseq_PANCAN_K16` <chr>, `_PANCAN_miRNA_PANCAN` <chr>, `_PANCAN_mirna_LUAD` <chr>,
#   `_PANCAN_mutation_PANCAN` <chr>, `_PATIENT` <chr>, `_cohort` <chr>, `_primary_disease` <chr>, `_primary_site` <chr>,
#   additional_pharmaceutical_therapy <chr>, additional_radiation_therapy <chr>,
#   additional_surgery_locoregional_procedure <chr>, additional_surgery_metastatic_procedure <chr>,
#   age_at_initial_pathologic_diagnosis <dbl>, anatomic_neoplasm_subdivision <chr>,
#   anatomic_neoplasm_subdivision_other <chr>, bcr_followup_barcode <chr>, bcr_patient_barcode <chr>,
#   bcr_sample_barcode <chr>, days_to_additional_surgery_locoregional_procedure <dbl>,
#   days_to_additional_surgery_metastatic_procedure <dbl>, days_to_birth <dbl>, days_to_collection <dbl>,
#   days_to_death <dbl>, days_to_initial_pathologic_diagnosis <dbl>, days_to_last_followup <dbl>,
#   days_to_new_tumor_event_after_initial_treatment <dbl>, disease_code <chr>, dlco_predictive_percent <dbl>,
#   eastern_cancer_oncology_group <dbl>, egfr_mutation_performed <chr>, egfr_mutation_result <chr>,
#   eml4_alk_translocation_method <chr>, eml4_alk_translocation_performed <chr>,
#   followup_case_report_form_submission_reason <chr>, followup_treatment_success <chr>, form_completion_date <chr>,
#   gender <chr>, histological_type <chr>, history_of_neoadjuvant_treatment <chr>, icd_10 <chr>, icd_o_3_histology <chr>,
#   icd_o_3_site <chr>, informed_consent_verified <chr>, initial_weight <dbl>, intermediate_dimension <dbl>, is_ffpe <chr>,
#   karnofsky_performance_score <dbl>, kras_gene_analysis_performed <chr>, kras_mutation_found <chr>,
#   kras_mutation_result <chr>, location_in_lung_parenchyma <chr>, longest_dimension <dbl>, lost_follow_up <chr>,
#   new_neoplasm_event_type <chr>, new_tumor_event_after_initial_treatment <chr>, number_pack_years_smoked <dbl>,
#   oct_embedded <lgl>, other_dx <chr>, pathologic_M <chr>, pathologic_N <chr>, pathologic_T <chr>, pathologic_stage <chr>,
#   pathology_report_file_name <chr>, patient_id <chr>, performance_status_scale_timing <chr>,
#   person_neoplasm_cancer_status <chr>, post_bronchodilator_fev1_fvc_percent <dbl>,
#   post_bronchodilator_fev1_percent <dbl>, pre_bronchodilator_fev1_fvc_percent <dbl>,
#   pre_bronchodilator_fev1_percent <dbl>, primary_therapy_outcome_success <chr>, progression_determined_by <chr>,
#   project_code <chr>, pulmonary_function_test_performed <chr>, radiation_therapy <chr>, residual_tumor <chr>, …

$LUAD_survival.txt.gz
# A tibble: 641 x 11
   sample          `_PATIENT`      OS OS.time   DSS DSS.time   DFI DFI.time   PFI PFI.time Redaction
   <chr>           <chr>        <dbl>   <dbl> <dbl>    <dbl> <dbl>    <dbl> <dbl>    <dbl> <lgl>    
 1 TCGA-05-4244-01 TCGA-05-4244     0       0     0        0    NA       NA     0        0 NA       
 2 TCGA-05-4249-01 TCGA-05-4249     0    1523     0     1523    NA       NA     0     1523 NA       
 3 TCGA-05-4250-01 TCGA-05-4250     1     121    NA      121    NA       NA     0      121 NA       
 4 TCGA-05-4382-01 TCGA-05-4382     0     607     0      607     1      334     1      334 NA       
 5 TCGA-05-4384-01 TCGA-05-4384     0     426     0      426    NA       NA     1      183 NA       
 6 TCGA-05-4389-01 TCGA-05-4389     0    1369     0     1369    NA       NA     0     1369 NA       
 7 TCGA-05-4390-01 TCGA-05-4390     0    1126     0     1126    NA       NA     1      395 NA       
 8 TCGA-05-4395-01 TCGA-05-4395     1       0     0        0    NA       NA     0        0 NA       
 9 TCGA-05-4396-01 TCGA-05-4396     1     303    NA      303    NA       NA     0      303 NA       
10 TCGA-05-4397-01 TCGA-05-4397     1     731    NA      731    NA       NA     0      731 NA       
# … with 631 more rows

tyaoi commented 4 years ago

I ran under the following environment:

R version 4.0.2 (2020-06-22)
RStudio Version 1.3.1056
Ubuntu 20.04 LTS

ShixiangWang commented 4 years ago

@tyaoi

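It seems that you skipped the step cli = cli$LUAD_survival.txt.gz. Your head(cli) output shows that cli is still a list of two tables (LUAD_clinicalMatrix and LUAD_survival.txt.gz) rather than a single data frame, so left_join() cannot use it. Please run that step before the merge step.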
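
As a rough illustration of that fix, here is a minimal sketch using toy stand-ins for KRAS and cli. The expression values, the gender column, and the by = c("sampleID" = "sample") mapping are assumptions made for this example, not the code from the post; only the shape of cli (a list holding a clinical table and a survival table) follows the head(cli) output above.

```r
library(dplyr)
library(tibble)

# Toy stand-ins for the real objects (expression values are made up)
KRAS <- c("TCGA-05-4244-01" = 10.25, "TCGA-05-4249-01" = 10.29)
cli <- list(
  LUAD_clinicalMatrix  = tibble(sampleID = names(KRAS), gender = c("MALE", "FEMALE")),
  LUAD_survival.txt.gz = tibble(sample = names(KRAS), OS = c(0, 0), OS.time = c(0, 1523))
)

# cli is a plain list here, not a data frame, so it cannot be joined directly
is.data.frame(cli)

# Keep only the survival table, then merge it with the expression values
cli <- cli$LUAD_survival.txt.gz
merged_data <- tibble(sampleID = names(KRAS), KRAS_expression = as.numeric(KRAS)) %>%
  left_join(cli, by = c("sampleID" = "sample"))
```

After the extraction step, is.data.frame(cli) is TRUE and the join returns one row per sample with the survival columns attached.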

qins commented 2 years ago

@ShixiangWang

The survival data now comes from cli = cli$LUAD_survival.txt (no .gz extension).

ShixiangWang commented 2 years ago

@qins Thanks for pointing this out 👍. Sometimes the downloaded files are already unzipped, so the file names have no .gz extension.
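
One way to avoid depending on the exact file name is to select the list element by pattern instead. This is only a sketch: pick_survival is a hypothetical helper written for this example (it is not part of UCSCXenaTools), and the two toy lists merely mimic the two naming variants mentioned above.

```r
# Hypothetical helper: return the first list element whose name contains "survival"
pick_survival <- function(x) {
  hit <- grep("survival", names(x), value = TRUE)
  if (length(hit) == 0) stop("no element with 'survival' in its name")
  x[[hit[1]]]
}

# Toy lists mimicking the two naming variants (with and without .gz)
gz    <- list(LUAD_clinicalMatrix = data.frame(a = 1), LUAD_survival.txt.gz = data.frame(OS = 0))
plain <- list(LUAD_clinicalMatrix = data.frame(a = 1), LUAD_survival.txt    = data.frame(OS = 0))

surv_gz    <- pick_survival(gz)
surv_plain <- pick_survival(plain)
```

With the real data, cli = pick_survival(cli) would then take the place of the name-specific extraction step.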

Nancy12355 commented 4 months ago

Hello, I have a question. When I run the following code:

merged_data = merged_data %>%
  mutate(group = case_when(
    MIMAT0000770_expression > quantile(MIMAT0000770_expression, 0.5) ~ 'MIMAT0000770_High',
    MIMAT0000770_expression < quantile(MIMAT0000770_expression, 0.5) ~ 'MIMAT0000770_Low',
    TRUE ~ NA_character_
  ))

I get the following error:

Error in mutate():
ℹ In argument: group = case_when(...).
Caused by error in case_when():
! Failed to evaluate the left-hand side of formula 1.
Caused by error in quantile.default():
! missing values and NaN's not allowed if 'na.rm' is FALSE
Run rlang::last_trace() to see where the error occurred.

How can I fix this?

ShixiangWang commented 4 months ago

Try setting na.rm = TRUE in the quantile() calls.
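
A minimal sketch of that change, on made-up data with one missing expression value (the sample IDs and numbers are assumptions for illustration). The cutoff is computed once with na.rm = TRUE and reused in both conditions; samples with a missing value, or a value exactly equal to the cutoff, fall through to NA_character_.

```r
library(dplyr)

# Toy data with one missing expression value (values are made up)
merged_data <- tibble::tibble(
  sampleID = paste0("S", 1:5),
  MIMAT0000770_expression = c(1, 2, NA, 4, 5)
)

# Median cutoff that ignores missing values
cutoff <- quantile(merged_data$MIMAT0000770_expression, 0.5, na.rm = TRUE)

merged_data <- merged_data %>%
  mutate(group = case_when(
    MIMAT0000770_expression > cutoff ~ "MIMAT0000770_High",
    MIMAT0000770_expression < cutoff ~ "MIMAT0000770_Low",
    TRUE ~ NA_character_
  ))
```

If the samples with missing expression should not appear in the later survival analysis at all, they can be dropped afterwards with filter(!is.na(group)).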