Some problems about the parameter setting

Niinleslie commented 1 year ago

Hi Robert, thanks for providing this useful tool! We have some questions about the parameter settings. First, we tested three cell lines originating from T cell lymphoma (JURKAT, PEER and HPB-ALL) that you used in the paper. The BAM files for chr14 were downloaded from SRA (SRR8618983, SRR8619000, SRR8619077). With the default setting of median.thresh (15), we got the warning "Sample failed QC due to number of exons removed, try lowereing median.thresh if depth in sample is low." and the procedure returned NA. It only worked when I set median.thresh = 0. Here is my final command:

 cov.file <- getCovFromBam(bamPath=bam, outPath="./", vdj.seg=tcra_seg_hg19)
 cov_df <- loadCov(cov.file)
 TCRA.out <- runTcellExTRECT(cov_df, TCRA_exons_hg19, tcra_seg_hg19, 'hg19', sample_name = sampleID, GC_correct = TRUE, median.thresh = 0)

The results are slightly different from yours. Could you tell us what differs between our procedures?

        sample TCRA.tcell.fraction TCRA.tcell.fraction.lwr TCRA.tcell.fraction.upr    qcFit
 1: SRR8618983           0.9690306               0.9667572               0.9711485 1.401706
 2: SRR8619000           0.8976449               0.8903701               0.9044369 1.220927
 3: SRR8619077           0.9188503               0.9116217               0.9254876 2.731876

We also applied the adjustTcellExTRECT function to WES data from esophageal cancer patients with paired blood and normal tissue. The purity and ploidy of the tumor and normal samples were estimated with Sequenza, using the paired blood sample as the control. For the blood samples, purity and ploidy were set to 1 and 2. However, the output was not what we expected: several normal tissue samples have a relatively high TCRA fraction. Do you have any suggestions on parameter settings that would make the comparison between paired blood, normal and tumor samples from the same patient more reasonable? In addition, some patients in our dataset only have paired normal and tumor samples, with no blood sample. In that case we set purity = 1 and ploidy = 2 for the normal sample. Does that sound reasonable to you?

Looking forward to your reply. Much appreciated!

rbentham commented 1 year ago

Hi,

To run on T cell derived cell lines you should set median.thresh = 0. This is needed because many of the exon regions within the deleted V(D)J region will have close to zero coverage. T cell ExTRECT includes a QC step that interprets an exon region with close to zero coverage as having failed (as does sometimes happen) and removes it from the calculation. If a large number of regions are removed, the function assumes the sample has quality issues, returns NA and gives the warning message you saw. In a T cell derived cell line such as JURKAT, many exons with close to zero coverage are expected, so for these samples you need to turn this QC step off by setting median.thresh = 0.
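As a rough illustration of the behaviour described above, here is a toy sketch in base R. It is not the TcellExTRECT source code: the function name `toy_exon_qc`, the `max_removed_frac` cutoff and the coverage values are all made up for the example, and the real QC step may well differ in its details.

```r
# Toy illustration only: NOT the TcellExTRECT implementation.
# `max_removed_frac` is a hypothetical cutoff invented for this sketch.
toy_exon_qc <- function(exon_median_cov, median.thresh = 15,
                        max_removed_frac = 0.5) {
  # Keep exons whose median coverage reaches the threshold.
  keep <- exon_median_cov >= median.thresh
  removed_frac <- mean(!keep)
  # Too many exons removed: treat the whole sample as failed.
  if (removed_frac > max_removed_frac) {
    warning("Sample failed QC due to number of exons removed")
    return(NA)
  }
  exon_median_cov[keep]
}

# Invented coverage for a T cell line: most exons inside the deleted
# V(D)J region have close to zero coverage.
jurkat_like <- c(120, 110, 2, 0, 1, 0, 3, 0, 1, 95)

default_result <- suppressWarnings(toy_exon_qc(jurkat_like))   # fails QC
relaxed_result <- toy_exon_qc(jurkat_like, median.thresh = 0)  # keeps all exons
```

With the default threshold most exons are dropped and the sketch returns NA, whereas a threshold of 0 removes nothing, which mirrors why median.thresh = 0 is needed for cell lines like JURKAT.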

I suspect the small differences in T cell fraction between the paper and your results are due to how the BAMs were processed. Let me know if you want to compare our alignment scripts!

As a note, adjustTcellExTRECT only needs to be run on tumour samples where the ploidy in the region around the TCRA locus has changed from 2, and not on blood or tissue normal samples. In our experience, blood samples (derived from the white blood cells in the buffy coat) frequently have a high T cell fraction. When we have looked at normal tissue (see Ext Fig 3e from the paper) we usually do not detect any T cell content, but there is a minority of cases where we do. I would presume these are from inflamed tissue. If you want to do additional QC on these high T cell cases, I would look at the plots directly (with plotTcellExTRECT); visually you should be able to see whether the score is being biased by only one or two exons. The qcFit value is my attempt to quantify this and tells you how strongly the fitted model oscillates. If it is highly oscillating (high values), the score is more likely to reflect noise.

Niinleslie commented 1 year ago

@rbentham Thanks a lot for your detailed response. It would be great if you could provide the alignment script so we can check the difference.

I also found that the normal samples with high T cell content have relatively high qcFit values (>3). Following your suggestion, I plotted the depth ratio of the TCRA locus using plotTcellExTRECT, but I'm a little confused about how to interpret the plots. As a first attempt, I filtered out several exons that deviated from the fit line to see how the T cell content and qcFit changed. When I removed the exons indicated by the red arrow, both the T cell content and the qcFit value decreased; removing the exons indicated by the green arrow had the opposite effect. Could you give some guidance on how to deal with this situation? Here is an example (after GC correction):

[Plot: raw]

[Plot: after filtering the exons indicated by the red arrow]

Additionally, is it necessary to plot every sample to check for "outlier exons"? And how many exons are required for an accurate calculation of the TCRA fraction?

For your convenience, I have attached the cov_df of the example mentioned above: example_cov_df_hg19.zip

rbentham commented 1 year ago

Hi,

I would recommend using the same set of exons for all the samples in your data set, especially if they are all using the same capture kit and sequenced at the same time. I would only remove additional exons if you notice that there is a consistent bias in all the samples regardless of predicted T cell fraction.

In some data sets we have noticed that the TCRD-V exon is consistently low across all samples and should be removed, but I am not sure if this is the case in your data. I would not be confident in removing the other two exons you have labeled. The example you showed has a consistent downward trend within the TCRD-J region, so I believe there is genuine T cell content being picked up by T cell ExTRECT. I would prefer the original unfiltered fit despite the high qcFit value, though I understand that this may not fit what you expect from your data. I would recommend looking at your samples with very little predicted T cell content to see whether there are any unexpected exon outliers there, and whether they are consistent with the ones you are filtering.
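One way to make the "consistent across samples" check concrete is sketched below in base R. This is only an illustration under assumptions: it presumes you have already computed, for each exon and sample, the residual of the depth ratio from the fitted line (the package is not assumed to export these), and the helper name `flag_consistent_exons`, the `min_frac` cutoff, the exon labels and all values are hypothetical.

```r
# Hypothetical helper: flag exons whose residual from the fitted line has the
# same sign in (almost) every sample. `resid_mat` is an exons x samples matrix.
flag_consistent_exons <- function(resid_mat, min_frac = 0.9) {
  frac_low  <- rowMeans(resid_mat < 0)  # share of samples below the fit
  frac_high <- rowMeans(resid_mat > 0)  # share of samples above the fit
  rownames(resid_mat)[frac_low >= min_frac | frac_high >= min_frac]
}

# Invented residuals for three placeholder exons across four samples.
resid_mat <- rbind(
  exon_A = c(-0.30, -0.25, -0.40, -0.35),  # low in every sample
  exon_B = c( 0.05, -0.02,  0.03, -0.04),  # scattered around the fit
  exon_C = c(-0.20,  0.01,  0.02,  0.00)   # low in one sample only
)

flagged <- flag_consistent_exons(resid_mat)
```

Only an exon that deviates in the same direction in nearly all samples is flagged; an exon that is off in a single sample, like `exon_C` here, is left alone, in line with the advice to keep one exon set for the whole data set.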

In general, the WES data used as input to T cell ExTRECT can be noisy, and as such there is some noise in the output. One good thing to check is whether you detect any samples with 0 or close to 0 T cell content. If there are none, this could be a sign of exon biases inflating the scores. Within a data set you would definitely want the same exon set to be used so that all the scores can be compared properly.
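That sanity check could look something like the following base R sketch. The helper name `has_near_zero_samples`, the `near_zero` tolerance of 0.01 and the example fractions are assumptions made for illustration, not values recommended by the package.

```r
# Hypothetical sanity check: does the cohort contain at least one sample with
# a T cell fraction at or near zero? `near_zero` is an arbitrary tolerance.
has_near_zero_samples <- function(tcell_fraction, near_zero = 0.01) {
  any(tcell_fraction <= near_zero, na.rm = TRUE)
}

# Invented fractions: the first cohort has near-zero samples, the second has none.
cohort_ok      <- c(0.00, 0.004, 0.12, 0.35, NA)
cohort_suspect <- c(0.08, 0.11, 0.25, 0.40)

ok_flag      <- has_near_zero_samples(cohort_ok)
suspect_flag <- has_near_zero_samples(cohort_suspect)
```

A cohort where no sample comes out near zero, like `cohort_suspect`, would be the one worth inspecting for a shared exon bias.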

Niinleslie commented 1 year ago

@rbentham Thanks for your suggestions. We did not detect any "outlier exon" that was consistently present in all the samples in our dataset. When we analyzed all TCRA exons and kept only samples with qcFit < 3, most normal samples with a high T cell fraction were filtered out. We will go with this solution for now.
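For anyone following along, the filter amounts to something like the base R sketch below. The column names follow the output shown earlier in this thread; the sample names and values are invented, and the cutoff of 3 is simply the one we chose for our data.

```r
# Invented results table using the column names from the runTcellExTRECT output.
results <- data.frame(
  sample = c("normal_1", "normal_2", "tumour_1", "blood_1"),
  TCRA.tcell.fraction = c(0.02, 0.31, 0.10, 0.45),
  qcFit = c(1.2, 3.6, 2.1, 1.8)
)

qcfit_cutoff <- 3  # cutoff chosen for our dataset, not a package default

# Keep samples whose fit is not strongly oscillating; set the rest aside.
kept    <- results[results$qcFit < qcfit_cutoff, ]
dropped <- results[results$qcFit >= qcfit_cutoff, ]
```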

McGranahanLab / TcellExTRECT

Some problems about the parameter setting #22