Swarbricklab-code / BrCa_cell_atlas

Data processing and analysis related code associated with the study "A single-cell and spatially resolved atlas of human breast cancers".

Generating pseudobulk #5

Closed: inofechm closed this issue 3 years ago

inofechm commented 3 years ago

Can you please direct me to the code used to generate the pseudobulk RNA-seq profiles in the paper, and explain how to run PAM50 subtyping on the pseudobulk? I see the bulk RNA-seq PAM50 code, but I want to apply the pseudobulk method to my own breast samples, so any pointers would be appreciated. Thank you.

dlroden commented 3 years ago

Hi, thanks for your query. This code isn't in the repo. For the pseudobulk, we simply summed the reads for each gene across all cells. In other words, we applied the basic rowSums() function in R to the count matrix of each individual tumor.
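For readers who want to try this, here is a minimal sketch of the summation described above. It assumes each tumor's counts are held as a genes-by-cells matrix; the toy values, gene names, cell names, and tumor names are placeholders for illustration, not data or code from the study.

```r
# Toy count matrix for one tumor: genes in rows, cells in columns.
# All names and values are illustrative placeholders.
counts <- matrix(
  c(3, 0, 2,
    0, 1, 4,
    5, 5, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("ESR1", "ERBB2", "KRT5"),
                  c("cell1", "cell2", "cell3"))
)

# Pseudobulk profile for this tumor: total counts per gene across all cells.
pseudobulk <- rowSums(counts)
print(pseudobulk)

# With several tumors, apply the same summation to each matrix and bind
# the results into a genes-by-tumors matrix.
# Assumption: every matrix has the same genes in the same row order.
tumor_counts <- list(tumorA = counts, tumorB = counts * 2)
pseudobulk_matrix <- sapply(tumor_counts, rowSums)
print(pseudobulk_matrix)
```

If the counts are stored as a sparse matrix from the Matrix package, that package provides its own rowSums() method, so the same call should work without converting to a dense matrix.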

We have also found that using the raw R2 FASTQ files as input to a bulk RNA-seq pipeline gives results comparable to the count summation method.

Hope this helps

yewero commented 2 years ago

@inofechm I found the related code in the ecotypes/generate_pseudobulk_mixture_file.snakemake.R file. It mentions two methods: sum and average. The first one could be what you need.
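To illustrate how the two methods relate, here is a small sketch in base R. It is not taken from that script; the toy matrix and the cpm() helper are hypothetical and exist only for this example. Within a single tumor, the average is the sum divided by the number of cells, so the two profiles become identical once each is rescaled to a common library size.

```r
# Toy genes-by-cells count matrix for one tumor (placeholder values).
counts <- matrix(
  c(3, 0, 2,
    0, 1, 4,
    5, 5, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("cell1", "cell2", "cell3"))
)

pb_sum  <- rowSums(counts)   # "sum" method: total counts per gene
pb_mean <- rowMeans(counts)  # "average" method: mean counts per gene

# The average is the sum divided by the number of cells.
same_up_to_scale <- isTRUE(all.equal(pb_mean, pb_sum / ncol(counts)))

# Hypothetical helper: rescale a profile to counts per million.
cpm <- function(x) x / sum(x) * 1e6

# After rescaling, both methods give the same profile for this tumor.
same_after_cpm <- isTRUE(all.equal(cpm(pb_sum), cpm(pb_mean)))

print(same_up_to_scale)
print(same_after_cpm)
```

The choice matters more when comparing tumors before any normalization, because summed profiles scale with the number of cells captured per tumor while averaged profiles do not.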