bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License
920 stars, 167 forks

tutorials, code modularization #76

Open mjstrumillo opened 10 months ago

mjstrumillo commented 10 months ago

Congratulations on the scGPT release! This is clearly great research, but I'm having real trouble following the tutorials (e.g. Fine-tuning on Pre-trained Model for Cell-type Annotation) on datasets other than the provided ones. Are you planning further code modularization? Many users would benefit from more clearly explained tutorials and slightly rewritten code (even if just for the tutorials). Do you take suggestions on this?

subercui commented 10 months ago

Hi @mjstrumillo, thank you for the great suggestion, which aligns with our ongoing efforts to refactor the interfaces. Due to limited manpower, we kindly ask for your patience while this takes some more time.

We recognize the need for modularization. Our current plan is to first make sure every new tutorial is concise and has clean interfaces, and then gradually update the existing ones as well. For example, I have just released a new tutorial for reference mapping. Could you let me know whether the style and content of this new tutorial meet your needs? We greatly value your feedback.

mjstrumillo commented 10 months ago

Awesome @subercui, thank you so much! Oh, I see the reference mapping tutorial; yes, that is very helpful. However, I already ran into questions like where to store the faiss index files. Then I realised I needed to download the build_index script to be able to load them, and then read through it to find where the paths to the downloaded files are set. That's fair, but I think it already raises the bar for who can use the code. The first thing that came to my mind was comparing scGPT with CellTypist and showcasing fine-tuning. I did ask ChatGPT to explain every line of code to me, and that worked great; maybe that could be a band-aid for explaining the bigger chunks of code. But I think a general explanation of whether any paths need to be exported, and where to store the models (or the indexes) for the examples to work, would be very useful. I had the most trouble with guessing and changing paths, and with the wandb setup in the annotation tutorial. I understand wandb is optional, but if someone can't or doesn't want to use it, a lot of the code becomes unclear. But all of this looks really cool; I just can't wait to get it to work :))))

mjstrumillo commented 10 months ago

Btw, which version of scanpy do you recommend? While trying the reference mapping tutorial I hit a numba version error, then updated scanpy, and everything stopped working.

subercui commented 10 months ago

Hi, regarding your questions, I may have misunderstood, but I don't think you need to put anything at a specific path. Take the reference mapping tutorial as an example:

[Screenshot from the reference mapping tutorial]

What we meant here is that you can download the index folder, put it anywhere you like, and then pass its path to index_dir=. The same goes for the model and data files: put them anywhere, then update the corresponding path argument to point to them. Let me know if this makes sense to you, or would you suggest we explain this explicitly in the notebook?
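Since the only requirement is that the path arguments point at wherever the files were downloaded, one option is to collect those locations in a single cell at the top of a notebook. Below is a minimal sketch of that idea; the helper functions, the environment variable names (`SCGPT_MODEL_DIR`, `SCGPT_INDEX_DIR`, `SCGPT_DATA_DIR`) and the fallback folders are all hypothetical and not something scGPT itself reads. It only resolves the paths and checks that they exist, so a wrong path fails early with a readable message.

```python
"""Collect tutorial file locations in one place (hypothetical helper, not part of scGPT)."""
import os
from pathlib import Path
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical environment variable names and fallback folders; scGPT does not
# read these. Point each one at wherever you actually downloaded the files.
DEFAULTS: Dict[str, Tuple[str, str]] = {
    "model_dir": ("SCGPT_MODEL_DIR", "./models/scGPT_human"),
    "index_dir": ("SCGPT_INDEX_DIR", "./faiss_index"),
    "data_dir": ("SCGPT_DATA_DIR", "./data"),
}


def resolve_paths(
    env: Optional[Mapping[str, str]] = None,
    defaults: Mapping[str, Tuple[str, str]] = DEFAULTS,
) -> Dict[str, Path]:
    """Use the environment variable if it is set, otherwise the fallback folder."""
    env = os.environ if env is None else env
    return {
        name: Path(env.get(var, fallback)).expanduser().resolve()
        for name, (var, fallback) in defaults.items()
    }


def check_paths(paths: Mapping[str, Path]) -> None:
    """Raise a readable error listing every path that does not exist."""
    missing = [f"{name}: {path}" for name, path in paths.items() if not path.exists()]
    if missing:
        raise FileNotFoundError("Missing folders:\n" + "\n".join(missing))
```

In a notebook you would call `paths = resolve_paths()` and `check_paths(paths)`, then pass `str(paths["index_dir"])` to the tutorial's `index_dir=` argument (and likewise for the model and data paths).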

subercui commented 10 months ago

Usually pip handles the versions well. The specific versions we have been working with are scanpy 1.9.1 https://github.com/bowang-lab/scGPT/blob/main/poetry.lock#L2174 and numba 0.55.2 https://github.com/bowang-lab/scGPT/blob/main/poetry.lock#L1427
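To see whether an environment differs from the versions mentioned above, a quick check with the standard library's `importlib.metadata` can help. This is a minimal sketch, not part of scGPT: the pins are copied from this thread and may be out of date for newer scGPT releases, and it only reports exact matches rather than resolving compatible ranges.

```python
"""Compare installed scanpy/numba versions with the ones mentioned in this thread."""
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Pins quoted in this thread (from scGPT's poetry.lock at the time); may be outdated.
PINNED: Dict[str, str] = {"scanpy": "1.9.1", "numba": "0.55.2"}


def installed_version(pkg: str) -> Optional[str]:
    """Return the installed version string of `pkg`, or None if it is not installed."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None


def compare_versions(
    pinned: Dict[str, str] = PINNED,
) -> Dict[str, Tuple[Optional[str], str, bool]]:
    """Map each package to (installed version, pinned version, exact match?)."""
    report = {}
    for pkg, wanted in pinned.items():
        have = installed_version(pkg)
        report[pkg] = (have, wanted, have == wanted)
    return report


if __name__ == "__main__":
    for pkg, (have, wanted, ok) in compare_versions().items():
        status = "OK" if ok else "MISMATCH"
        print(f"{pkg}: installed={have} pinned={wanted} -> {status}")
```

If the check reports a mismatch, one option is `pip install "scanpy==1.9.1" "numba==0.55.2"`, although exact pins like these can conflict with other packages or with newer Python versions in the same environment.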