995884191 commented 2 weeks ago

Hi,

I'm currently working with the code related to target encoding for categorical features, and I noticed an inconsistency in the comments regarding the shapes of ct_mean and sm_mean. The original code comments state:

# ct_mean has shape (6, 18211) and contains the means of 13 or 14 values each
# sm_mean has shape (143, 18211) and contains the means of 3 or 4 values each

However, when I run the code, I find that:

ct_mean has shape (6, 50)

sm_mean has shape (146, 50)

It seems like the shapes might have changed due to PCA dimensionality reduction. Could you please clarify whether this is intended, or if there's something specific I should consider when interpreting these shapes?

Thank you for your help!

Ambros-M commented 2 weeks ago

Hi @995884191, thank you for this input. Because of the dimensionality reduction, the shapes are indeed (6, 50) and (146, 50), respectively. This is intended. I'll update the comments.
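
For illustration, here is a minimal sketch of why the second dimension drops to 50. It uses synthetic data and a plain SVD-based projection as a stand-in for the dimensionality reduction; it is not the repository's code. The array sizes, the `group_means` helper, and the variable names `y`, `cell_type`, and `sm_name` are assumptions made for this example. Only the category counts (6 and 146) and the 50 components are taken from the shapes reported above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the targets; sizes are illustrative, not the real data.
n_rows, n_genes, n_components = 300, 200, 50
n_cell_types, n_compounds = 6, 146

y = rng.normal(size=(n_rows, n_genes))
cell_type = np.arange(n_rows) % n_cell_types  # every cell type occurs at least once
sm_name = np.arange(n_rows) % n_compounds     # every compound occurs at least once

# Project the centered targets onto the first n_components right singular vectors.
y_centered = y - y.mean(axis=0)
_, _, vt = np.linalg.svd(y_centered, full_matrices=False)
y_reduced = y_centered @ vt[:n_components].T  # shape (n_rows, n_components)


def group_means(values, labels, n_groups):
    """Return one row of column means per group label (hypothetical helper)."""
    return np.stack([values[labels == g].mean(axis=0) for g in range(n_groups)])


# Means computed on the reduced targets have n_components columns.
ct_mean = group_means(y_reduced, cell_type, n_cell_types)
sm_mean = group_means(y_reduced, sm_name, n_compounds)

# Means computed on the unreduced targets keep one column per gene.
ct_mean_full = group_means(y, cell_type, n_cell_types)

print(ct_mean.shape)       # (6, 50)
print(sm_mean.shape)       # (146, 50)
print(ct_mean_full.shape)  # (6, 200)
```

The number of rows depends only on how many distinct categories there are, while the number of columns follows whatever matrix the means are taken over, so averaging after the reduction gives 50 columns instead of one per gene.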

995884191 commented 2 weeks ago

Thank You!

Dear Ambros,

I hope this message finds you well. I wanted to take a moment to express my heartfelt gratitude for your incredible work on Py-boost. As a graduate student, I have found your contributions to be extremely inspiring and valuable to my research.

Your timely responses to my questions have greatly helped me navigate some challenges, and I truly appreciate the support you provide to the community. Thank you once again for all your hard work and dedication!

Ambros-M / Single-Cell-Perturbations-2023

About shape of ct_mean shape #1
