Poor quality of batch integration

We tried scGPT for integrating two datasets, but the integration is poor: the datasets remain clearly separated afterwards. We are wondering what the reasons could be. What we observe seems to match the benchmark results in this (new) preprint: https://www.biorxiv.org/content/10.1101/2023.09.08.555192v5.full.pdf
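To make "dataset separation" a bit more concrete than a visual impression, one option is a simple k-nearest-neighbour batch-mixing score on the cell embeddings. The sketch below is only an illustration in plain Python, not our actual pipeline and not part of scGPT: the function name `knn_batch_mixing`, the toy coordinates, and the choice of `k` are all assumptions, and the brute-force distance loop is only practical for small inputs.

```python
import math

def knn_batch_mixing(embeddings, batch_labels, k=3):
    """Mean fraction of each cell's k nearest neighbours that come from
    a different batch. Values near 0 indicate separated batches."""
    n = len(embeddings)
    fractions = []
    for i in range(n):
        # Brute-force Euclidean distances to every other cell (O(n^2) overall).
        dists = sorted(
            (math.dist(embeddings[i], embeddings[j]), j)
            for j in range(n) if j != i
        )
        neighbours = [j for _, j in dists[:k]]
        other = sum(batch_labels[j] != batch_labels[i] for j in neighbours)
        fractions.append(other / len(neighbours))
    return sum(fractions) / n

# Toy data (hypothetical): two batches sitting in distant clusters.
separated = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
             [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]]
labels = ["A", "A", "A", "B", "B", "B"]
separated_score = knn_batch_mixing(separated, labels, k=2)

# Toy data (hypothetical): the two batches interleaved along one axis.
mixed = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
mixed_labels = ["A", "B", "A", "B", "A", "B"]
mixed_score = knn_batch_mixing(mixed, mixed_labels, k=2)
```

A score close to 0 on the integrated embeddings would correspond to the separation we see. How high a well-mixed score can get depends on the batch sizes and on how many cell types the datasets actually share, so it is best read relative to a baseline method on the same data.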

This makes us wonder whether the claimed "batch integration" capabilities of scLLMs really hold up. Could you comment on why scGPT does not perform well in the aforementioned paper? Is there anything obvious that you see?

We suspect it might be related to the limited gene-level overlap between the datasets to integrate, the choice of HVGs (highly variable genes), and the existing cell annotation that is provided (although that is not used in a zero-shot setting). There is also the always-possible scenario that the to-be-integrated datasets are just fundamentally different, e.g. with respect to their cell types.
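For the gene-level overlap in particular, a quick check like the following could show how much the two gene lists actually share. This is a minimal sketch under stated assumptions: `gene_overlap` is a hypothetical helper, the gene symbols are toy values, and upper-casing the symbols before comparison is our own simplification that may not be appropriate for every naming scheme (e.g. when mixing species).

```python
def gene_overlap(genes_a, genes_b):
    """Summarise how many gene symbols two datasets have in common."""
    # Assumption: symbols are compared case-insensitively after trimming.
    set_a = {g.strip().upper() for g in genes_a}
    set_b = {g.strip().upper() for g in genes_b}
    shared = set_a & set_b
    union = set_a | set_b
    return {
        "n_shared": len(shared),
        "frac_of_a": len(shared) / len(set_a) if set_a else 0.0,
        "frac_of_b": len(shared) / len(set_b) if set_b else 0.0,
        "jaccard": len(shared) / len(union) if union else 0.0,
        "shared": sorted(shared),
    }

# Toy gene lists (hypothetical) standing in for the two datasets.
stats = gene_overlap(["CD3E", "CD4", "MS4A1", "NKG7"],
                     ["cd3e", "MS4A1", "LYZ"])
```

If the shared fraction is small, or the HVG sets chosen per dataset barely intersect, the model sees largely different inputs for the two datasets, which could by itself contribute to the separation.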
