imkeller closed this issue 3 years ago
Hi,
You cannot do that with TCGAbiolinks, but you can with GenomicDataCommons: https://rpubs.com/tiagochst/GDC_read_length
Would that work?
Yes, perfect, this works. Thank you!
I just figured out that there is a problem with this way of accessing read length: it only allows me to match the read length at the patient level. However, one patient may have multiple samples with multiple sequencing runs that differ in read length. I could not find any file name or identifier at the sequencing-run level that would let me link the results of GDCquery() to the read length obtained from GenomicDataCommons. Do you have an idea how to solve this problem?
@imkeller I made some changes to get the sample-level information instead of the patient-level information: https://rpubs.com/tiagochst/Read_length_GDC
Could you give me more details about which files from GDCquery() you want to match?
OK, I managed to match the filename of the RNAseq counts by using the 'downstream_analyses.submitter_id' entry, thanks!
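For anyone landing here later, the matching step described above could be sketched roughly as follows. This is only an illustration in Python on hand-made records, not code from TCGAbiolinks or GenomicDataCommons. It assumes each BAM file record from the GDC files endpoint carries its read groups under `analysis.metadata.read_groups` (each with a `read_length`) and its derived analyses under `downstream_analyses` (each with a `submitter_id`). The function name, file names, and identifiers below are hypothetical.

```python
from collections import defaultdict


def read_lengths_by_downstream_id(bam_records):
    """Map each downstream-analysis submitter_id to the sorted,
    de-duplicated read lengths of the BAM file it was derived from."""
    lengths = defaultdict(set)
    for rec in bam_records:
        # Assumed nesting: analysis -> metadata -> read_groups -> read_length
        read_groups = (
            rec.get("analysis", {}).get("metadata", {}).get("read_groups", [])
        )
        rec_lengths = {
            rg["read_length"]
            for rg in read_groups
            if rg.get("read_length") is not None
        }
        # Assumed nesting: downstream_analyses -> submitter_id
        for analysis in rec.get("downstream_analyses", []):
            sid = analysis.get("submitter_id")
            if sid is not None:
                lengths[sid].update(rec_lengths)
    return {sid: sorted(vals) for sid, vals in lengths.items()}


# Hypothetical records, shaped like the assumed API response.
example_records = [
    {
        "file_name": "sample_A.bam",
        "analysis": {
            "metadata": {"read_groups": [{"read_length": 48}, {"read_length": 48}]}
        },
        "downstream_analyses": [{"submitter_id": "sample_A_count"}],
    },
    {
        "file_name": "sample_B.bam",
        "analysis": {"metadata": {"read_groups": [{"read_length": 76}]}},
        "downstream_analyses": [{"submitter_id": "sample_B_count"}],
    },
]

lookup = read_lengths_by_downstream_id(example_records)
print(lookup)
```

The keys of `lookup` could then be compared against whichever identifier in the GDCquery() results carries the same value. A key that maps to more than one read length would indicate a BAM file whose read groups differ in read length.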
Dear developers, I would like to access the read length of RNA sequencing data using the TCGAbiolinks package. The information is found in the "Reads Group" section of data related to the bam file, for example here: https://portal.gdc.cancer.gov/files/9a27ecb6-4d6b-4b5f-ac24-7da3b67b55cd Is there a way to access this information? Many thanks, Katharina