BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

GDCprepare_clinic bug #546

Open todaroroad opened 1 year ago

todaroroad commented 1 year ago

When you parse the clinical data with the "GDCprepare_clinic" function, there is a serious flaw in the code: you chose follow_up_v1.0 as the identifier to distinguish the survival version, but in the raw XML, TCGA uses a "sequence" rather than v1, v2, ..., for instance, , and a better solution would be to parse it with the more flexible Python.

tiagochst commented 1 year ago

@todaroroad

You can either access the GDC version, which has the latest version of the follow-up:

clinical_from_GDC <- GDCquery_clinic("TCGA-COAD")
clinical_from_GDC <- clinical_from_GDC[clinical_from_GDC$submitter_id == "TCGA-A6-2681",]
clinical_from_GDC$days_to_last_follow_up

Or you can parse the follow-up to get all sequences:

query <- GDCquery(
    project = "TCGA-COAD",
    data.category = "Clinical",
    file.type = "xml",
    barcode = c("TCGA-A6-2681")
)
GDCdownload(query)
follow_up_from_xml <- GDCprepare_clinic(query, clinical.info = "follow_up")

The sequence can be inferred from the date.
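As a rough sketch of how the sequence could be inferred, the follow-up records can be sorted by date within each patient and numbered in that order. The data frame below is a toy stand-in, not real TCGA data, and the column names (`patient_barcode`, `form_date`, `inferred_sequence`) are assumptions for illustration; the columns actually returned by GDCprepare_clinic may be named differently.

```r
# Toy stand-in for a parsed follow-up table (hypothetical columns and values).
follow_up <- data.frame(
    patient_barcode = c("P1", "P1", "P1", "P2"),
    form_date = as.Date(c("2011-03-02", "2009-06-15", "2010-01-20", "2012-05-01")),
    days_to_last_followup = c(900, 300, 520, 410),
    stringsAsFactors = FALSE
)

# Sort by patient, then by form date, so earlier forms come first.
follow_up <- follow_up[order(follow_up$patient_barcode, follow_up$form_date), ]

# Number the records within each patient: 1 is the earliest form.
follow_up$inferred_sequence <- ave(
    seq_len(nrow(follow_up)),
    follow_up$patient_barcode,
    FUN = seq_along
)

print(follow_up)
```

This uses only base R, so it does not depend on how the table was obtained.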


Your complaint is that when we parse the patient information, we don't update it with the follow-up information. Indeed, I did not add that code.
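For anyone who needs this in the meantime, one possible workaround is to take the most recent follow-up record per patient and overwrite the corresponding patient-level fields. This is only a sketch on toy data, not code from TCGAbiolinks: the tables, the values, and the column names (`patient_barcode`, `form_date`, and the three survival fields) are assumptions for illustration.

```r
# Toy patient-level table (hypothetical columns and values).
patient <- data.frame(
    patient_barcode = c("P1", "P2", "P3"),
    vital_status = c("Alive", "Alive", "Alive"),
    days_to_last_followup = c(100, 200, 50),
    days_to_death = c(NA_real_, NA_real_, NA_real_),
    stringsAsFactors = FALSE
)

# Toy follow-up table with several forms per patient (hypothetical).
follow_up <- data.frame(
    patient_barcode = c("P1", "P1", "P2"),
    form_date = as.Date(c("2010-01-20", "2011-03-02", "2012-05-01")),
    vital_status = c("Alive", "Dead", "Alive"),
    days_to_last_followup = c(520, NA, 410),
    days_to_death = c(NA, 900, NA),
    stringsAsFactors = FALSE
)

# Keep only the most recent follow-up form per patient.
follow_up <- follow_up[order(follow_up$patient_barcode, follow_up$form_date), ]
latest <- follow_up[!duplicated(follow_up$patient_barcode, fromLast = TRUE), ]

# Overwrite the patient-level fields for patients that have a follow-up.
# Assumes every patient in `latest` also appears in `patient`.
idx <- match(latest$patient_barcode, patient$patient_barcode)
fields <- c("vital_status", "days_to_last_followup", "days_to_death")
patient[idx, fields] <- latest[, fields]

print(patient)
```

Patients without any follow-up record (P3 here) keep their original values. Whether the latest form should replace the earlier values wholesale, including missing ones, is a design choice that depends on the analysis.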

todaroroad commented 1 year ago

Excellent work, thanks for your answer. 😄😄😄

todaroroad commented 1 year ago

I found another bug. For example, when I use the GDCquery_clinic function to look up TCGA-A6-2677 in the TCGA-COAD project, there is no days_to_death value, even though the days_to_last_followup value of 541 was actually the time of death. I hope this can be fixed, thanks.