giannimonaco / ABIS


What exactly is the output? #6

Closed jordankrull closed 3 years ago

jordankrull commented 5 years ago

I gather that the program attempts to output the proportion of each population, but it's not clearly stated what the output is supposed to be. For instance, I ran this on a number of tumor samples with TPM values, and not only did I get some negative values, but no sample added up to more than 47% in total. Is this package not reporting the proportion of each population in the sample? I can confidently state that the samples I ran are 99% of hematopoietic origin.

Additionally, does this program compute the RLM normalization? Or is this something that needs to be computed externally?

Just trying to get a sense for what this data actually means. I appreciate any insight you have. Excellent paper!

Best, J

giannimonaco commented 5 years ago

Hello and thank you for the interest in the paper!

Let's start with the RLM acronym. RLM, in this case, stands for Robust Linear Modeling. It is just a variant of ordinary linear modeling that is more robust to noise. RLM produces a score (the beta coefficient of the model) for each immune cell type by using a signature matrix as the independent variables of the model. The signature matrix contains the median expression values of a selected set of genes for the cell types we want to deconvolve.
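To make the description above concrete, here is a minimal sketch of robust regression against a signature matrix, written in plain Python. It is not the ABIS source code: the toy signature matrix, the Huber weighting with tuning constant 1.345, the MAD-based scale, and the helper names (`solve`, `weighted_ls`, `robust_fit`) are all assumptions chosen for illustration.

```python
from statistics import median

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def weighted_ls(S, y, w):
    """Weighted least squares: solve (S^T W S) beta = S^T W y."""
    p = len(S[0])
    A = [[sum(wi * row[a] * row[b] for wi, row in zip(w, S)) for b in range(p)]
         for a in range(p)]
    rhs = [sum(wi * row[a] * yi for wi, row, yi in zip(w, S, y)) for a in range(p)]
    return solve(A, rhs)

def robust_fit(S, y, k=1.345, max_iter=100, tol=1e-8):
    """Iteratively reweighted least squares with Huber weights (assumed choice)."""
    beta = weighted_ls(S, y, [1.0] * len(y))  # start from ordinary least squares
    for _ in range(max_iter):
        resid = [yi - sum(s * b for s, b in zip(row, beta)) for row, yi in zip(S, y)]
        scale = median(abs(r) for r in resid) / 0.6745  # MAD-based scale estimate
        if scale < 1e-12:  # residuals are essentially zero, nothing left to reweight
            break
        # Genes with large residuals get a weight below 1, so they pull less on the fit.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        new_beta = weighted_ls(S, y, w)
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

# Toy signature matrix: 6 genes (rows) x 2 cell types (columns). Values are made up.
S = [[10, 0], [8, 1], [1, 9], [0, 10], [5, 5], [3, 7]]
true_mix = [0.3, 0.7]
y = [sum(s * f for s, f in zip(row, true_mix)) for row in S]
y[4] += 20  # one gene is strongly off, e.g. expressed by an unprofiled cell type

ols = weighted_ls(S, y, [1.0] * len(y))
robust = robust_fit(S, y)
print("ordinary least squares:", [round(b, 3) for b in ols])
print("robust fit:            ", [round(b, 3) for b in robust])
```

In this toy case the single aberrant gene drags the ordinary least-squares coefficients well away from the true 0.3 / 0.7 mix, while the reweighted fit stays close to it. Note that nothing in either fit forces the coefficients to be non-negative or to sum to one.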

The idea of this work is to obtain absolute proportions of immune cell types from bulk expression data of PBMC samples. By absolute proportion I mean that if the proportion of monocytes in a PBMC sample is 30%, then the value obtained by ABIS should also be 30.

Moreover, the method used, RLM, does not apply any constraint to the scores obtained, meaning that only in an ideal scenario will you have no negative values and a sum of scores equal to 100.
However, this will never happen; if you are lucky, you will have a few negative values around zero and a sum of proportions around 80-120 (and only if you deconvolve PBMC samples). The problem is that the signature matrix was produced using data from a few healthy individuals. Hence, it is not robust to biological variability and different cellular states. So, you might ask, why not use constraints then? The answer is that, without them, you can at least use the negative values and the sum of proportions as a warning that your dataset contains new cell types, or immune cell types in a different biological state from the ones used to build the signature matrix.
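The warning signs described above can be checked with a few lines of code. This is only a sketch: the function name `diagnose`, the example scores, and the use of 80-120 as hard limits are assumptions for illustration, not part of ABIS.

```python
def diagnose(sample_scores, low=80.0, high=120.0):
    """Return the score total and a list of warnings for one sample (scores in percent)."""
    total = sum(sample_scores.values())
    warnings = []
    if not low <= total <= high:
        warnings.append(f"sum of scores is {total:.1f}, outside {low:.0f}-{high:.0f}")
    for cell, value in sample_scores.items():
        if value < 0:
            warnings.append(f"{cell} has a negative score ({value:.1f})")
    return total, warnings

# Made-up scores for a single sample, keyed by cell type.
sample = {"T CD8 Naive": 20.0, "NK": -3.5, "MAIT": 12.0, "pDCs": 18.5}
total, warnings = diagnose(sample)
for message in warnings:
    print("warning:", message)
```

A sample that triggers these warnings is not necessarily unusable, but it suggests the absolute values should not be taken at face value.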

In addition, especially because you are not using data from PBMCs, you might not want to trust the proportions to be absolute. However, you could still trust the relative differences between the samples of your dataset. Regarding the negative values, if they are around zero, I would just set them to zero. If you have large negative values, instead, it means that the signal of that immune cell type in the ABIS signature matrix somehow interferes with some other cell type in your sample. In other words, you could use them to guess which cell types differ or interfere substantially, in terms of expression patterns, between healthy PBMCs and your sample.

In conclusion, I believe that there is no method out there that is very robust to different tissues and biological conditions. ABIS is not a perfect method either, but at least it gives you an idea of why and where it fails.

Hence, what I would do is:

  1. Change small negative values to zero, and exclude from the analysis the cell types that produce very large negative values.
  2. Use the remaining scores to describe relative changes between your samples.
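The two steps above could be sketched as follows. The cutoff of -1 for a "very large" negative value, the function name `clean_scores`, and the example numbers are assumptions for illustration; choose a cutoff that suits your own data.

```python
def clean_scores(scores, large_negative=-1.0):
    """scores maps each cell type to its list of per-sample scores.

    Cell types with any score below `large_negative` are excluded;
    remaining negative scores are set to zero.
    """
    cleaned, excluded = {}, []
    for cell, values in scores.items():
        if min(values) < large_negative:
            excluded.append(cell)
            continue
        cleaned[cell] = [max(v, 0.0) for v in values]
    return cleaned, excluded

# Made-up scores for three samples.
scores = {
    "NK": [5.0, -0.2, 7.1],
    "T CD8 Naive": [-8.0, 3.0, 4.0],
}
cleaned, excluded = clean_scores(scores)
print("kept:", cleaned)
print("excluded:", excluded)
```

The cleaned scores are then meant for comparing samples with each other, not for reading off absolute percentages.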

I hope this helps. Let me know if something was not clear or if you have any further comment.

Best,

Gianni

jordankrull commented 5 years ago

Thank you, Gianni, for the thorough reply. I appreciate the honest feedback. Compared with flow cytometry, the relative differences between samples make much more sense than the absolute values. As you mentioned, the likely cause is the non-PBMC, malignant nature of the samples. Once again, I appreciate the feedback.

AleixAS-mdc commented 4 years ago

Dear Gianni, is it possible that the Technology labels in the interface, used to select either "RNA-Seq" or "Microarray" for the input data, are swapped? I am inputting TPM values from RNA-Seq data (Illumina HiSeq 4000) from isolated PBMCs. When I select "RNA-Seq" as the technology, the output makes no sense at all, with negative values and no sample summing to 1. Conversely, when I select "Microarray" as the technology, the output makes much more sense, with no negative values, all samples summing to values in the range 95-130, and estimates similar to what you would expect.

giannimonaco commented 4 years ago

Hi! The tool should be working fine. Do you get the proportion for 17 cell types when using RNA-Seq?

You could also try removing from the signature matrix (the file "sigmatrixRNAseq.txt") the cell types which show disproportionate negative signal.
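As a rough sketch of that suggestion, the snippet below drops named cell-type columns from a tab-separated signature matrix. The layout is an assumption (a header row with a gene column followed by one column per cell type); check the actual layout of "sigmatrixRNAseq.txt" before adapting it. The function name `drop_cell_types` and the toy values are hypothetical.

```python
import csv
import io

def drop_cell_types(sig_text, to_drop):
    """Return the tab-separated table with the named cell-type columns removed."""
    rows = list(csv.reader(io.StringIO(sig_text), delimiter="\t"))
    header = rows[0]
    keep = [i for i, name in enumerate(header) if name not in to_drop]
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()

# Toy table standing in for the real signature matrix file.
toy_matrix = "gene\tNK\tMAIT\tpDCs\nGENE_A\t10\t0\t1\nGENE_B\t2\t8\t0\n"
reduced = drop_cell_types(toy_matrix, {"MAIT"})
print(reduced)
```

For a real file, read its contents into a string first and write the returned text to a new file, so the original matrix stays untouched.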

Let me know, Gianni


AleixAS-mdc commented 4 years ago

Hi Gianni,

Thanks for your quick reply.

Yes, I get the proportions for 17 cell types. In that case, how should I interpret an output like the following example?

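[image: Annotation 2020-01-31 134352] https://user-images.githubusercontent.com/60473577/73540278-c6f02500-442f-11ea-9b53-75377319b497.png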

1) As you can see in the attached picture/example, some cell types have negative estimates in some samples. How do I interpret this? As 0, meaning these cell types are absent from those samples?

2) None of the samples sum to 100%, so what does that mean? That the remaining proportion consists of other kinds of cells? (These samples come from isolated PBMCs.)

3) Is there some kind of transformation that needs to be applied, or any downstream step missing, in order to get cell composition estimates (which I would expect to sum to 100%)?

I want to test if there are significant differences in cell composition between case-control samples.

Thanks,

giannimonaco commented 4 years ago

Hi,

the results do not look too bad to me. From a quick look, it seems that the only cell types whose proportions are out of range are NK and CD8 memory. Either your patients lack cells with killing activity, or the gene expression profile of your cell types is substantially different from the one in the signature matrix. Here are my answers:

  1. No negative value is even below -1. I think this is a good sign. Moreover, you have negative values mainly for the cell types that generally have a very low frequency. If you are not interested in these cell types (e.g. Plasmablasts, pDCs, mDCs, MAIT), just set the values to 0. Otherwise, for each cell type, you could add the absolute value of the most negative value to all of its values so that the minimum becomes 0. I am mostly worried about NK cells, which should be in the range of 5-15%.

  2. If a sample does not reach 100%, one explanation could be that there are some other cell types in your PBMC samples which we did not profile, or there is just some technical variability between our processing and yours. In any case, it is a good sign that the sum of the values is lower than 100% and not the other way around.

  3. I would probably not transform the values but instead, compare each cell type among the samples individually. If you transform the values to have 100% in each column, you risk overestimating the frequency of some cell types.
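Points 1 and 3 above can be illustrated with a short sketch. The function names `shift_to_zero` and `rescale_to_100` and all numbers are made up for illustration; the second function is shown only to make the overestimation risk visible, not as a recommended step.

```python
def shift_to_zero(values):
    """Shift one cell type's scores so that its most negative value becomes 0."""
    lowest = min(values)
    return [v - lowest for v in values] if lowest < 0 else list(values)

def rescale_to_100(sample):
    """Force one sample's scores to sum to 100 (the transformation advised against)."""
    total = sum(sample.values())
    return {cell: 100.0 * v / total for cell, v in sample.items()}

# Point 1: per-cell-type shift across samples.
nk_scores = [-0.5, 2.0, 4.5]
nk_shifted = shift_to_zero(nk_scores)

# Point 3: a sample whose scores sum to 80, e.g. because 20% of the cells
# belong to types that are not in the signature matrix.
sample = {"NK": 8.0, "T CD8 Naive": 32.0, "MAIT": 40.0}
rescaled = rescale_to_100(sample)
print(nk_shifted)
print(rescaled)
```

In the second example every estimate is inflated by a factor of 1.25 after rescaling, even though the missing 20% may simply be cells the method cannot see. The per-cell-type shift, by contrast, preserves the differences between samples.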

Please note also that this analysis should be considered as exploratory and to generate a hypothesis.
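For an exploratory case-control comparison of a single cell type, one possible approach is an exact permutation test on the difference in group means. This is a swapped-in technique, not something ABIS provides or the thread prescribes; the function name `permutation_p_value` and the scores are hypothetical, and enumerating every split is only practical for small groups.

```python
from itertools import combinations

def permutation_p_value(cases, controls):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(cases) + list(controls)
    n_cases = len(cases)
    observed = abs(sum(cases) / n_cases - sum(controls) / len(controls))
    extreme = 0
    total = 0
    # Enumerate every way of assigning n_cases of the pooled samples to the case group.
    for idx in combinations(range(len(pooled)), n_cases):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
        total += 1
    return extreme / total

# Made-up scores for one cell type in four cases and four controls.
cases = [10.0, 11.0, 12.0, 13.0]
controls = [4.0, 5.0, 6.0, 7.0]
p_value = permutation_p_value(cases, controls)
print(round(p_value, 4))
```

When several cell types are tested this way, the resulting p-values would need a multiple-testing correction before drawing any conclusion.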

Best, Gianni


AleixAS-mdc commented 4 years ago

Hi Gianni,

Thanks for your comments.

The following figure shows the distributions of estimated cell composition (y-axis) in my samples by cell type (x-axis). Colors represent sample groups under different conditions and diseases.

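[image: Annotation 2020-02-03 133738] https://user-images.githubusercontent.com/60473577/73653842-84765480-468a-11ea-89f3-d042a05514a7.png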

As you can see, I do have some values < -1, mainly for T CD8 Naive (values around -8). In addition, my samples come from white Caucasian children, whereas your signature matrix comes from adult individuals from Singapore (correct me if I am wrong). So I would expect some differences in gene expression profiles.

Is there anything else relevant you could comment on?

Aleix

giannimonaco commented 4 years ago

Hi Aleix,

I see your point. You are right that -8 is more worrying. You are also right that the samples were from adult individuals from Singapore, so there is certainly some biological variability here. The goal of ABIS is actually to warn you of such cases by showing you the negative values. You can always shift the values up so that the minimum value is 0, but at least with ABIS you know that there is some biological variability that cannot be neglected.

I am sorry I can't help more at this time. Hopefully, relative differences among individuals are real.

Gianni
