McGranahanLab / TcellExTRECT

Frequent zero-infiltration prediction by TcellExTRECT in NSCLC #29

Open bozbezbozzel opened 1 month ago

bozbezbozzel commented 1 month ago

Hi,

I have a dataset of NSCLC tumors that have both been whole-exome sequenced and RNA-sequenced. I have some more samples for which I only have DNA so I wanted to use ExTRECT to predict the T cell fraction for those.

I also use the Danaher dataset standard for RNA deconvolution. I calculate the vst values for my dataset using DESeq2 and sum the values for all genes that make up a celltype, then take the z-score to get a relative infiltration for each celltype.
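The scoring described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: it assumes the VST values have already been exported from DESeq2 into a plain mapping of sample to gene to value, and the function name `celltype_zscores` and the two-gene marker list are hypothetical stand-ins for a real cell-type signature.

```python
from statistics import mean, stdev


def celltype_zscores(vst, marker_genes):
    """Sum VST values over a cell type's marker genes, then z-score across samples.

    vst          -- dict: sample id -> dict of gene -> VST value (assumed layout)
    marker_genes -- list of gene symbols for one cell type (illustrative only)
    """
    # Per-sample sum over the marker genes that are present in the matrix.
    sums = {
        sample: sum(expr[g] for g in marker_genes if g in expr)
        for sample, expr in vst.items()
    }
    mu = mean(sums.values())
    sd = stdev(sums.values())  # sample standard deviation (n - 1)
    if sd == 0:
        # All samples identical: the relative score is undefined, so report 0.
        return {sample: 0.0 for sample in sums}
    return {sample: (value - mu) / sd for sample, value in sums.items()}


# Toy input, not real data.
toy_vst = {
    "s1": {"CD8A": 1.0, "CD8B": 1.0},
    "s2": {"CD8A": 2.0, "CD8B": 2.0},
    "s3": {"CD8A": 3.0, "CD8B": 3.0},
}
scores = celltype_zscores(toy_vst, ["CD8A", "CD8B"])
```

Because the z-score is relative to the cohort, a sample's score only says where it sits among the other samples, which is worth keeping in mind when comparing it against an absolute fraction such as the ExTRECT estimate.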

I have plotted that score against the ExTRECT prediction here: [image: extrect_danaher]

As you can see, the correlation between ExTRECT and RNA is pretty minimal, unlike in your paper. But what stands out to me is the number of zeroes in the ExTRECT prediction. I'm wondering if it means I did something wrong when I applied ExTRECT.

I followed the steps from your GitHub and the only things I had to change were related to the gratia code parts breaking. My samples were all sequenced with the Agilent V5 Exon kit so I just used the provided TCRA exon files for hg38.

I used ASCAT to calculate purity and ploidy and applied a correction using those values: I took the purity prediction as-is, used the provided TCRA region location to find the ASCAT segment that covered that region, and took the copy number at that segment.
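The segment lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the segment field names (`chr`, `start`, `end`, `n_major`, `n_minor`) are hypothetical and need mapping onto the actual ASCAT output columns, and the coordinates in the example are placeholders, not the real TCRA locus. When the region spans a breakpoint, this sketch picks the segment with the largest overlap, which is a design choice and not something the thread specifies.

```python
def copy_number_at_region(segments, chrom, start, end):
    """Return total copy number of the segment that best covers a region.

    segments -- list of dicts with hypothetical keys:
                chr, start, end, n_major, n_minor
    Returns None when no segment on that chromosome overlaps the region.
    """
    best, best_overlap = None, 0
    for seg in segments:
        if seg["chr"] != chrom:
            continue
        # Inclusive overlap length; zero or negative means no overlap.
        overlap = min(seg["end"], end) - max(seg["start"], start) + 1
        if overlap > best_overlap:
            best, best_overlap = seg, overlap
    if best is None:
        return None
    return best["n_major"] + best["n_minor"]


# Placeholder segments and region, not real coordinates.
toy_segments = [
    {"chr": "14", "start": 1, "end": 1000, "n_major": 1, "n_minor": 1},
    {"chr": "14", "start": 1001, "end": 5000, "n_major": 2, "n_minor": 1},
]
cn = copy_number_at_region(toy_segments, "14", 900, 2000)
```

A `None` result is worth checking for explicitly, since a sample with no segment call over the region would otherwise silently fall back to whatever default the downstream correction uses.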

I looked at the upper and lower bounds for the TCRA predictions and they generally have a pretty small interval. My sequencing depth and quality are mostly pretty good, ranging from 100x to 200x mean coverage.

Here's a plot of the adjusted TCRA value against ASCAT purity predictions; it's pretty much all over the place: [image: extrect_purity]

Here's an example of plotTcellExTRECT output for sample 110029, which you can see as the leftmost sample in the first plot (so the highest RNA CD8 prediction but 0 TCRA value). I'm not entirely sure how to read this plot, but if I'm interpreting it correctly my coverage isn't bad at all. And there is indeed no dip at the focal region, which is why the prediction is zero. [image: 110029_extrectplot]

I guess data is sometimes just what it is, but if there's anything that jumps out at you here I'd love to have your input on why I get so many TCRA=0 when I expected otherwise.

Thanks for your time.

rbentham commented 1 day ago

Hi,

Sorry for the very late response! I was on annual leave when this was opened and missed this completely.

Unfortunately I do not see anything obvious from the plot or the data you have shown. WES data can be very noisy, which can make running T cell ExTRECT difficult. Frequent zero-infiltration values or generally deflated scores are often caused by outlier probes that consistently have much higher coverage than all their neighbouring probes (making any depletion signal less obvious). This is especially an issue if these probes are near the focal region. From the plot you showed, however, it is not obvious that this is the case.

I would look at a few other plots and see visually if there are any probes that are consistently higher than their neighbours (to do this rigorously you would have to calculate the median coverage value of every probe within your dataset). If you identify any probes as outliers you can remove them and try recalculating T cell ExTRECT.
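One way to make that check systematic is sketched below. It is a rough illustration and not part of T cell ExTRECT: it assumes per-probe coverage has already been extracted into one list per sample with probes in genomic order, and the function name `find_outlier_probes`, the neighbour `window` and the `ratio` threshold are all hypothetical choices that would need tuning.

```python
from statistics import median


def find_outlier_probes(coverage, window=2, ratio=2.0):
    """Flag probes whose cohort-median coverage far exceeds their neighbours'.

    coverage -- list of per-sample lists of per-probe coverage, with probes
                in the same genomic order for every sample (assumed layout)
    window   -- number of neighbouring probes to compare against on each side
    ratio    -- fold difference over the neighbour median that counts as an outlier
    Returns a list of probe indices.
    """
    n_probes = len(coverage[0])
    # Median coverage of every probe across the whole dataset.
    probe_medians = [
        median(sample[i] for sample in coverage) for i in range(n_probes)
    ]
    outliers = []
    for i, value in enumerate(probe_medians):
        neighbours = (
            probe_medians[max(0, i - window):i]
            + probe_medians[i + 1:i + 1 + window]
        )
        if not neighbours:
            continue
        local = median(neighbours)
        if local > 0 and value / local > ratio:
            outliers.append(i)
    return outliers


# Toy coverage for three samples and five probes, not real data.
toy_coverage = [
    [100, 110, 400, 105, 95],
    [120, 100, 500, 110, 100],
    [90, 95, 450, 100, 105],
]
flagged = find_outlier_probes(toy_coverage)
```

With samples sequenced at quite different depths, it may help to divide each sample's coverage by its own median before running this, so that deeply sequenced samples do not dominate the per-probe medians.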