giannimonaco / ABIS


How to combine similar cell types #18

Open sam-israel opened 3 years ago

sam-israel commented 3 years ago

I want to use the proportions generated by ABIS for cell-type-aware differential expression (TOAST package), and I was advised to merge similar cell types into only 5-8 types.

  1. Is there a smarter way of doing that than simply summing together the ABIS output proportions?
  2. The summary of the ABIS-generated proportions is
    Monocytes_C           NK           T_CD8_Memory   
    Min.   : 2.949   Min.   :-3.9268   Min.   : 1.674  
    1st Qu.: 6.622   1st Qu.: 0.9884   1st Qu.: 5.747  
    Median : 7.677   Median : 2.2476   Median : 8.797  
    Mean   : 8.137   Mean   : 2.4485   Mean   :10.647  
    3rd Qu.: 9.610   3rd Qu.: 3.7038   3rd Qu.:13.069  
    Max.   :15.787   Max.   : 9.7480   Max.   :48.491  
    T_CD4_Naive      T_CD8_Naive      
    Min.   :-2.411   Min.   :-25.8218  
    1st Qu.: 4.159   1st Qu.: -3.7227  
    Median : 6.610   Median : -0.4212  
    Mean   : 7.563   Mean   : -1.3039  
    3rd Qu.: 9.806   3rd Qu.:  2.6580  
    Max.   :25.564   Max.   :  9.5094  
    B_Naive         T_CD4_Memory         MAIT       
    Min.   : 0.5531   Min.   :-4.018   Min.   :-0.359  
    1st Qu.: 3.1769   1st Qu.: 1.671   1st Qu.: 2.001  
    Median : 4.6171   Median : 4.162   Median : 3.772  
    Mean   : 5.3058   Mean   : 4.119   Mean   : 3.942  
    3rd Qu.: 7.2463   3rd Qu.: 6.179   3rd Qu.: 5.590  
    Max.   :15.4163   Max.   :12.769   Max.   :11.857  
    T_gd_Vd2      Neutrophils_LD   T_gd_non_Vd2   
    Min.   :-2.938   Min.   :16.87   Min.   :-9.113  
    1st Qu.: 2.382   1st Qu.:41.87   1st Qu.:-4.120  
    Median : 3.447   Median :50.85   Median :-2.589  
    Mean   : 3.644   Mean   :50.70   Mean   :-2.560  
    3rd Qu.: 4.760   3rd Qu.:59.91   3rd Qu.:-1.491  
    Max.   :12.175   Max.   :80.54   Max.   : 8.840  
    Basophils_LD     Monocytes_NC_I   
    Min.   : 0.6517   Min.   :-1.2083  
    1st Qu.: 2.6431   1st Qu.: 0.2907  
    Median : 4.6014   Median : 0.9748  
    Mean   : 6.4039   Mean   : 1.2425  
    3rd Qu.: 8.5597   3rd Qu.: 1.8747  
    Max.   :46.8233   Max.   : 6.1912  
    B_Memory            mDCs         
    Min.   :-7.6440   Min.   :-0.07160  
    1st Qu.:-1.7759   1st Qu.: 0.08102  
    Median :-0.8056   Median : 0.13453  
    Mean   :-1.0176   Mean   : 0.14015  
    3rd Qu.:-0.0758   3rd Qu.: 0.19162  
    Max.   : 3.5879   Max.   : 0.46562  
      pDCs          Plasmablasts   
    Min.   :0.02063   Min.   :0.0358  
    1st Qu.:0.14775   1st Qu.:0.1508  
    Median :0.20857   Median :0.2183  
    Mean   :0.22740   Mean   :0.3574  
    3rd Qu.:0.28736   3rd Qu.:0.3611  
    Max.   :0.62854   Max.   :3.9839

As you can see, the medians for three cell types (T_CD8_Naive, T_gd_non_Vd2, and B_Memory) are negative. It seems reasonable to set all negative values to zero (and to remove T_CD8_Naive from the analysis, because of its very low minimum values).
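A minimal sketch of what "set all negative values to zero" does to a single sample, in plain Python. The sample values are made up for illustration, and `clip_negative` is a hypothetical helper, not part of ABIS.

```python
def clip_negative(proportions):
    """Return a copy of a {cell_type: value} mapping with negatives set to 0."""
    return {cell: max(value, 0.0) for cell, value in proportions.items()}


# Hypothetical sample, values in percent (invented for illustration).
sample = {
    "T_CD8_Naive": -3.7,
    "T_CD8_Memory": 5.7,
    "B_Naive": 3.2,
    "B_Memory": -1.8,
}

clipped = clip_negative(sample)

# Clipping can only raise values, so the total over cell types goes up.
total_before = sum(sample.values())
total_after = sum(clipped.values())
print(clipped)
print(total_before, total_after)
```

Note that clipping changes the total across cell types, so any later step that assumes the estimates sum to a fixed value would need to account for that.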

However, I also have an additional source of proportions (based on methylation data) available for comparison. I summed together similar cell types (T_CD8_Naive with T_CD8_Memory, B_Naive with B_Memory) and looked at the correlation between the external proportions and the ABIS-generated ones.
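The comparison described above could be sketched roughly as follows. All numbers are invented, the `GROUPS` mapping only covers the two merges named in this post, and `merge_by_sum` and `pearson` are hypothetical helpers written for this example. The output says nothing about real data; it only shows the mechanics of merging with and without clipping first.

```python
from math import sqrt

# Hypothetical grouping: only the two merges mentioned in the post.
GROUPS = {
    "T_CD8": ["T_CD8_Naive", "T_CD8_Memory"],
    "B": ["B_Naive", "B_Memory"],
}


def merge_by_sum(sample, groups=GROUPS, clip=False):
    """Sum the member cell types of each group, optionally clipping at 0 first."""
    merged = {}
    for name, members in groups.items():
        values = [sample[m] for m in members]
        if clip:
            values = [max(v, 0.0) for v in values]
        merged[name] = sum(values)
    return merged


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented ABIS-style estimates for three samples (percent).
abis_samples = [
    {"T_CD8_Naive": -3.0, "T_CD8_Memory": 9.0, "B_Naive": 4.0, "B_Memory": -1.0},
    {"T_CD8_Naive": 2.0, "T_CD8_Memory": 8.0, "B_Naive": 5.0, "B_Memory": 0.5},
    {"T_CD8_Naive": -1.0, "T_CD8_Memory": 13.0, "B_Naive": 7.0, "B_Memory": -2.0},
]
# Invented external (e.g. methylation-based) CD8 T cell estimates.
external_t_cd8 = [5.5, 10.5, 12.5]

raw = [merge_by_sum(s)["T_CD8"] for s in abis_samples]
clipped = [merge_by_sum(s, clip=True)["T_CD8"] for s in abis_samples]

print("merged, negatives kept:   ", pearson(raw, external_t_cd8))
print("merged, negatives clipped:", pearson(clipped, external_t_cd8))
```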

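The correlation (between the merged ABIS cell types and the external source) is actually better if I do not set the negative values to zero. Hence, my question is:

  • Does it make sense to sum together negative and positive proportions when merging similar cell types into one?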

giannimonaco commented 3 years ago

Hi, I agree that negative median values are not good. With ABIS, we tried to deconvolute as many cell types as possible, and this increases the chance of getting negative values. Reducing the results to only 5-8 cell types is a good strategy if you care about quality rather than quantity. Hence, here is a series of solutions you could try out:

On Tue, 1 Jun 2021 at 09:00, sam-israel @.***> wrote:

  • Does it make sense to sum together negative and positive proportions, when merging similar cell types into one?


sam-israel commented 3 years ago

Thanks.

giannimonaco commented 3 years ago

On Thu, 3 Jun 2021 at 10:15, sam-israel @.***> wrote:

  • So am I correct in understanding that averaging the signature matrix columns is preferable to summing the output? Could you say a bit about in what sense, and why?

Reducing the results to 5-8 cell types only is a good strategy if you care about quality and not quantity.

  • On a different subject: I am also running a deconvolution on microarray data, where the values range from 0.25 to 13. Is that an acceptable range? Is it preferable to filter out the low values (which may simply be noise), or to apply some other procedure to deal with values that are too low or too high?

Thanks.
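For readers following the first question above, here is a minimal sketch of what "averaging the signature matrix columns" means mechanically, as opposed to summing the output proportions after deconvolution. The gene names, expression values, and the `merge_signature` helper are all hypothetical, and the layout is not ABIS's actual signature matrix format. The merged matrix would then be used in the deconvolution step, which is not shown here; whether this is preferable to summing the output is exactly what is being asked in this thread.

```python
def merge_signature(signature, groups):
    """Merge cell-type columns of a signature matrix by averaging them.

    signature: {gene: {cell_type: expression}}
    groups:    {merged_name: [cell_type, ...]}
    Cell types that are not in any group are kept unchanged.
    """
    grouped = {m for members in groups.values() for m in members}
    merged = {}
    for gene, row in signature.items():
        new_row = {c: v for c, v in row.items() if c not in grouped}
        for name, members in groups.items():
            new_row[name] = sum(row[m] for m in members) / len(members)
        merged[gene] = new_row
    return merged


# Invented two-gene signature matrix (rows: genes, columns: cell types).
signature = {
    "GENE_A": {"B_Naive": 10.0, "B_Memory": 6.0, "NK": 1.0},
    "GENE_B": {"B_Naive": 2.0, "B_Memory": 4.0, "NK": 9.0},
}

merged = merge_signature(signature, {"B": ["B_Naive", "B_Memory"]})
print(merged)
```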
