Inconsistent Data Preprocessing in Tutorial_ZeroShot_Integration.ipynb

bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/

MIT License


Inconsistent Data Preprocessing in Tutorial_ZeroShot_Integration.ipynb #270

Open nc1m opened 1 week ago

nc1m commented 1 week ago

Hi,

The Tutorial_ZeroShot_Integration.ipynb tutorial uses the COVID-19 and Lung-Kim datasets, which appear to contain raw counts. However, the model expects normalized and log1p-transformed values. Is there a reason why these datasets are not preprocessed to match the model's input requirements?

subercui commented 3 days ago

Hi, thank you for the question. The input to the model is actually the binned values, so these preprocessing steps, including log1p, should not matter.
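
To illustrate why binning could make these steps irrelevant, here is a minimal sketch. It assumes the binning is done per cell and depends only on the rank order of the non-zero values; under that assumption, any strictly increasing per-cell transform (total-count scaling, log1p) leaves the bins unchanged. The function `bin_cell` and the value `n_bins=51` are hypothetical stand-ins for illustration, not scGPT's actual binning code.

```python
import numpy as np


def bin_cell(values, n_bins=51):
    """Rank-based (quantile-style) binning of one cell's expression vector.

    Hypothetical simplification, not scGPT's implementation:
    zeros stay in bin 0, non-zero values are mapped to bins 1..n_bins-1
    according to their rank among the distinct non-zero values.
    """
    values = np.asarray(values, dtype=float)
    binned = np.zeros(values.shape, dtype=np.int64)
    nonzero = values > 0
    if not nonzero.any():
        return binned
    # `inverse` holds the rank of each non-zero value among the distinct values,
    # so tied values always share a bin.
    _, inverse = np.unique(values[nonzero], return_inverse=True)
    n_unique = inverse.max() + 1
    binned[nonzero] = 1 + (inverse * (n_bins - 1)) // n_unique
    return binned


# Toy raw counts for a single cell (made-up numbers).
raw_counts = np.array([0, 1, 3, 0, 10, 250, 3])

# Typical preprocessing: scale to 10,000 total counts, then log1p.
normalized = raw_counts / raw_counts.sum() * 1e4
log_normalized = np.log1p(normalized)

bins_raw = bin_cell(raw_counts)
bins_preprocessed = bin_cell(log_normalized)

# Both transforms are strictly increasing within the cell, so the ranks,
# and therefore the bins, are identical.
print(np.array_equal(bins_raw, bins_preprocessed))
```

The invariance holds only for per-cell, order-preserving transforms. Steps that change which genes are kept or that mix information across cells are outside the scope of this sketch.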