cliffren / PENCIL

PENCIL is a novel tool for single-cell data analysis to identify phenotype-associated subpopulations and informative genes simultaneously.
GNU General Public License v3.0

Low precision 'with rejection' #6

Closed DomenicoSkyWalker89 closed 1 year ago

DomenicoSkyWalker89 commented 1 year ago

Hello guys,

I found some inconsistencies after a PENCIL analysis. Briefly, I divided my cells into two groups, R and NR, but after running PENCIL I observed a reduction in precision with rejection for the R group but not for the NR group (see attached image, Image_1).

Furthermore, when plotting 'predicted_labels', only the NR and rejected cells were colored, not the R ones (see attached plot, Rplot).

Do you have any suggestions for improving the R precision and getting some R cells to show up in the plot? Thank you very much for your help.

Best,

Domenico

PENCIL parameters (I modified the shuffle rate to reduce the rejection):

shuffle_rate=1/4, lambda_L1=1e-5, lambda_L2=1e-3, lr=0.01,

cliffren commented 1 year ago

Hi Domenico,

This dataset looks pretty hard. I think you can try adjusting the following two parameters:

  1. class_weights: try setting the weight of the R group to around 1.5, based on the cell numbers;
  2. lambda_L1: try increasing it to 1e-4 or 1e-3, or decreasing it to 1e-6.
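
As a rough illustration of the two suggestions, the sketch below derives a starting weight for the R group from the cell counts and lists the L1 values to try. The counts are the per-class totals reported later in this thread; the dictionary layout and the name `candidates` are hypothetical, and the exact format in which PENCIL expects class weights is not stated here, so check the PENCIL documentation before passing these values on.

```python
# Per-class cell counts, taken from the "support" column posted later in this thread.
n_cells = {"NR": 552881, "R": 406515}

# Ratio-based weights: the largest class gets 1.0, smaller classes get more weight.
majority = max(n_cells.values())
class_weights = {label: majority / n for label, n in n_cells.items()}

# L1 values to try: the current 1e-5 plus the suggested alternatives.
lambda_l1_grid = [1e-6, 1e-5, 1e-4, 1e-3]

# Hypothetical settings to try one by one: the ratio-based R weight and the suggested 1.5.
r_weight_options = [round(class_weights["R"], 2), 1.5]
candidates = [
    {"R_weight": r_weight, "lambda_L1": lam}
    for r_weight in r_weight_options
    for lam in lambda_l1_grid
]

for label, weight in class_weights.items():
    print(f"{label}: weight {weight:.2f}")
print(f"{len(candidates)} settings to try")
```

The ratio-based weight for R comes out slightly below the suggested 1.5, so both values are included in the list of settings.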

I hope that helps.

Best, Tao Ren

DomenicoSkyWalker89 commented 1 year ago

Hi Tao Ren,

Thanks for the suggestions. They resolved the problem at least in part; I now get the following result, which is not too bad:

Number of examples rejected= 461314 / 959396

num_of_rejcted
NR    238494
R     222820
Name: count, dtype: int64

--- without rejection ---

              precision    recall  f1-score   support

          NR       0.70      0.79      0.74    552881
           R       0.66      0.55      0.60    406515

    accuracy                           0.69    959396
   macro avg       0.68      0.67      0.67    959396
weighted avg       0.68      0.69      0.68    959396

--- with rejection ---

              precision    recall  f1-score   support

          NR       0.81      0.89      0.85    314387
           R       0.77      0.75      0.71    183695

    accuracy                           0.80    498082
   macro avg       0.79      0.77      0.78    498082
weighted avg       0.80      0.80      0.80    498082

---test time: 5.945091724395752 seconds ---
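
For readers who want to sanity-check the figures above, here is a minimal sketch that recomputes the retained cell counts and per-class rejection rates from the posted numbers, and recalculates F1 from precision and recall. All values are copied from the output in this comment; the variable names are illustrative only.

```python
# Counts copied from the output posted above.
support = {"NR": 552881, "R": 406515}    # cells per class before rejection
rejected = {"NR": 238494, "R": 222820}   # cells rejected per class

# Cells kept after rejection, and the fraction of each class that was rejected.
retained = {label: support[label] - rejected[label] for label in support}
rejection_rate = {label: rejected[label] / support[label] for label in support}


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for label in support:
    print(f"{label}: retained {retained[label]}, rejected {rejection_rate[label]:.1%}")

# Recompute F1 from the posted "with rejection" precision and recall.
print(f"NR F1: {f1(0.81, 0.89):.2f}")
print(f"R  F1: {f1(0.77, 0.75):.2f}")
```

The retained counts match the "with rejection" support column, and a larger share of R cells than NR cells is rejected. The recomputed F1 for R does not match the posted 0.71, which may point to a typo in the pasted R row; the averages would be consistent with a recall closer to 0.65, though that is only an inference from the posted figures.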

P.S. Sorry, I forgot to mention that these data come from flow and mass cytometry, not from scRNA-seq.

Best,

Domenico