PBMC data does not add up to approximately 100 percent

SaTu84 commented 3 years ago

Dear author,

I recently tried your deconvolution method on PBMC RNA-seq data (TPM). However, none of the columns (samples) in the result table add up to 100%, and most do not even come close. In some cases the total is well above 100%. Below is a subset of the column sums as I got them (colSums(data)):

X7053.08.001.005 X7053.08.001.008 X7053.08.001.010 X7053.08.001.011 X7053.08.001.012 X7053.08.001.013 X7053.08.001.014
       47.658550        73.662520        44.081200        58.190400        63.597400        68.691400        51.867800
X7053.08.001.015 X7053.08.001.016 X7053.08.001.017 X7053.08.001.018 X7053.08.001.019 X7053.08.001.020 X7053.08.001.021
       59.207400        79.147700        70.589700        56.311700        58.408340        62.325400        37.441700
X7053.08.001.022 X7053.08.001.023 X7053.08.001.024 X7053.08.001.025 X7053.08.001.026 X7053.08.001.027 X7053.08.001.028
       46.511880        58.139300        64.109700        61.042100        52.493300        80.300900        59.569000
X7053.08.001.029 X7053.08.001.030 X7053.08.001.031 X7053.08.001.006 X7053.08.001.007 X7053.08.001.032 X7053.08.001.033
       84.131000        64.686200       127.991800        74.345300        88.099000        99.438800        58.528200
X7053.08.001.034 X7053.08.001.035 X7053.08.001.036 X7053.08.001.037 X7053.08.001.038 X7053.08.001.039 X7053.08.001.040
       53.186800        37.064300        60.761600        63.212230        44.397900        67.132800         61.01230

I know from trying a different deconvolution tool (quanTIseq) that neutrophils strongly bias the result, as many genes specific to this cell type are highly expressed (your tool also reports high neutrophil proportions, ranging from 40 to 110%). Could this affect your deconvolution method in a way that produces the results above? More generally, do you have any idea what could cause such results, and how to solve the issue?

Thank you!

giannimonaco commented 3 years ago

Hi, thank you for trying out the tool. The idea behind this method is that it extracts absolute proportions of 17 immune cell types. When the results do not sum to 100%, it should be because other cell types are present (for example endothelial cells or fibroblasts). Other deconvolution methods force the proportions to sum to 100%. However, in my opinion this ignores the presence of unknown cell types and might distort the results even more.

Hence, if you have cancer samples, it is totally normal to have proportions that sum to 40%. If you have blood samples, a total of 110% can be normal (always allow for ~10% of technical variability). However, a total of 110% would be odd for cancer samples.
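To make the interpretation above concrete, here is a minimal Python sketch (not part of ABIS) that takes a few of the column totals posted in this thread and reports the share left unassigned to the 17 immune cell types. The `summarize` helper, the labels, and the use of a fixed 10-point tolerance band are assumptions made for illustration, based only on the "~10% technical variability" remark.

```python
# Column totals (in percent) for a few samples, copied from the colSums output
# posted earlier in this thread.
totals = {
    "X7053.08.001.005": 47.658550,
    "X7053.08.001.021": 37.441700,
    "X7053.08.001.031": 127.991800,
    "X7053.08.001.032": 99.438800,
}

# Assumption: a band of +/- 10 percentage points around 100% stands in for
# the "~10% technical variability" mentioned in the reply.
TOLERANCE = 10.0


def summarize(total, tolerance=TOLERANCE):
    """Return (unassigned share in percent, coarse label) for one sample total."""
    unassigned = max(0.0, 100.0 - total)
    if total > 100.0 + tolerance:
        label = "above expected range"
    elif total < 100.0 - tolerance:
        label = "unassigned fraction present"
    else:
        label = "within tolerance"
    return unassigned, label


report = {name: summarize(total) for name, total in totals.items()}

for name, (unassigned, label) in report.items():
    print(f"{name}: total={totals[name]:.1f}%  unassigned={unassigned:.1f}%  {label}")
```

A large unassigned share is expected for tissue or tumour samples; for blood it is the signal that something else (composition, depth, or preprocessing) deserves a closer look.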

I hope it helps, but let me know if you have a strange situation.

SaTu84 commented 3 years ago

Hi,

Thank you for the reply. I did notice that the tool computes absolute proportions, which I think is an advantage over other tools for the reasons you mention. However, my data is PBMC (blood) data, where I do not expect large amounts of cell types other than immune cells. So I do not expect the proportions to add up to only about 40%.

Therefore I was wondering whether a dominant cell type (neutrophils in my case) could influence the results in such a way that the absolute proportions of the remaining immune cell types are no longer captured accurately. Thank you!

SaTu84 commented 3 years ago

Sorry, I meant to say whole blood data, not PBMC.

giannimonaco commented 3 years ago

Hi, thank you for commenting on this. Yes, it is possible that the signal of other cell types is masked by the presence of neutrophils, especially if the sequencing depth is not that high. This was also a concern of ours when we had to decide which samples to use to develop the deconvolution approach. Many cell types, like plasmablasts or dendritic cells, are present in very low proportions. Also, we got some neutrophils and basophils from PBMCs (the low-density ones), but we did not test whether the expression of low-density neutrophils is comparable to that of the neutrophils you have in whole blood. By the way, what proportions of neutrophils do you get in your results?

SaTu84 commented 3 years ago

Hi, thanks again for your elaborate answer. Just to comment on the sequencing depth: we took this into account when designing the experiment, to capture as many transcripts from as many cell types as possible. Hence, we sequenced to an average depth of 30 million reads, so my guess is that depth is not a limiting factor.

The proportion of neutrophils in the data ranges from 0.0873 to 110, with a mean proportion of 40%.

Lastly, I don't know whether neutrophils from PBMC and whole blood are comparable, but I did observe similar results (high amounts of neutrophils) with different deconvolution methods, so I'd expect a significant overlap. One thing I will test is whether I can further divide the neutrophils in my dataset into LD and HD neutrophils based on signatures from the literature.

giannimonaco commented 3 years ago

Hi, thank you also for sharing this. So, I guess that if you detect a high proportion of neutrophils, it means that the signatures of LD and HD neutrophils are similar.

Another thing that might be important to check is how you processed the data. To reduce technical variability, the processing should be similar to how the data were processed for the deconvolution method. We used kallisto for the (pseudo)alignment, and the TPM values for the signature matrix. Are you giving TPM values as input?
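One quick sanity check for the input question above: by definition, TPM values over all quantified features of a sample sum to one million. The Python sketch below tests that property on toy data; the sample names, the values, the `looks_like_tpm` helper, and the 1% tolerance are all hypothetical and not part of ABIS or kallisto. Note that a matrix already filtered to a subset of genes will legitimately sum to less than one million, so a failed check is a prompt to look closer, not proof of a wrong input.

```python
import math

# TPM is scaled so that each sample sums to one million over all features.
EXPECTED_TPM_TOTAL = 1_000_000.0


def looks_like_tpm(values, rel_tol=0.01):
    """Return True if the per-sample values sum to ~1e6 (within rel_tol)."""
    return math.isclose(sum(values), EXPECTED_TPM_TOTAL, rel_tol=rel_tol)


# Hypothetical toy samples: one TPM-like column and one raw-count-like column.
samples = {
    "sample_tpm": [250_000.0, 250_000.0, 500_000.0],
    "sample_counts": [1200.0, 35.0, 980.0],
}

checks = {name: looks_like_tpm(values) for name, values in samples.items()}
print(checks)
```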

SaTu84 commented 3 years ago

Hi, thanks again for replying. Indeed, there is at least a substantial overlap between the HD and LD signatures. Concerning the processing: everything was done in the same way, to avoid as much technical variability as possible. And we did input TPM values, coming from StringTie.

In any case, thanks a lot for all your responses. They are really appreciated.

giannimonaco commented 3 years ago

No problem. Happy to discuss this.

giannimonaco / ABIS

PBMC data does not add up to approximately 100 percent #17