bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License

About pretrain code #40

Closed by wyhsleep 1 year ago

wyhsleep commented 1 year ago

May I ask when the pretraining code will be released? I would like to try pretraining the model on my own dataset. 🥹🥹🥹

subercui commented 1 year ago

Hi, we are currently working on the multi-omics tutorial setup and on relaxing the dependency requirements, since several issues have been reported about the flash-attn installation. Once these are done, we'll have time to wrap up and release the code for pretraining and data collection. Hopefully, that will be completed by next weekend.

In the meantime, how large is your dataset? If you have fewer than 100K cells, I'd recommend trying the "finetuning-integration" pipeline, following the example code. It usually works well as self-supervised training and should generate dataset-specific cell and gene embeddings.

Thank you!

wyhsleep commented 1 year ago

Oh that's great, thank you very much for your work!