bowang-lab / scGPT

https://scgpt.readthedocs.io/en/latest/
MIT License

Questions about Building the SCB Files #86

Closed: Mang30 closed this issue 1 year ago

Mang30 commented 1 year ago

First of all, congratulations on your groundbreaking work on scGPT. I have a question about the "Build the scb Files" step. I downloaded 9 h5ad files of heart data from cellxgene, but after running that step only 5 scb files were generated, which has left me perplexed. Have you encountered this situation before?

subercui commented 1 year ago

Hi, thanks for the question. If you are using this script, some files may get skipped between the following lines: https://github.com/bowang-lab/scGPT/blob/c1d2101188b8c3d1c1269067f4c48fd3419b7192/data/cellxgene/build_large_scale_data.py#L181-L210

So this try/except block skips any data file that cannot be processed properly, but you should see the traceback messages printed out.
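For readers unfamiliar with the pattern, here is a minimal sketch of how a try/except loop can silently reduce the number of outputs. It is not the code from `build_large_scale_data.py`; `convert_all` and `convert_one` are hypothetical names used only for illustration. The sketch also records which files were skipped, so the count mismatch can be traced back to specific inputs.

```python
import traceback
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def convert_all(
    files: Iterable[Path],
    convert_one: Callable[[Path], None],  # hypothetical per-file converter
) -> Tuple[List[Path], List[Path]]:
    """Run convert_one on each file; skip and record any file that raises."""
    converted: List[Path] = []
    skipped: List[Path] = []
    for path in files:
        try:
            convert_one(path)
            converted.append(path)
        except Exception:
            # Print the full traceback so the cause of the skip is visible.
            traceback.print_exc()
            skipped.append(path)
    return converted, skipped
```

With 9 inputs of which 4 raise, `converted` would hold 5 paths and `skipped` the other 4, each with a traceback printed to stderr.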

Meanwhile, I would say that if you don't have very large datasets, you probably don't need this. Building SCB files is designed specifically to accelerate data processing in pretraining, where you cannot fit all the data in memory. If you have enough CPU memory, you can simply load all the data and follow whichever tutorial notebook fits your application.

I hope the explanations make sense to you.

Mang30 commented 1 year ago

Thank you very much for your reply. Your response has resolved some of my doubts. However, I still want to confirm something: when constructing SCB files, is it normal for some H5AD files not to be converted? Do I need to rerun the conversion for those files? In other words, is there a way to convert the remaining H5AD files into SCB files? Thank you again for your reply.

subercui commented 1 year ago

Yes, in general it should be able to convert your files. Did you check the log messages when you ran the script? As I mentioned in the previous comment, did you see any traceback messages printed out? You can copy the messages here and I can help find the reason those files were skipped.

Mang30 commented 1 year ago

I apologize for the late response. Regarding this issue, I believe I have found the reasons: 1. The connection to cellxgene is unstable when downloading data, which causes errors in the downloaded partition data, so the affected chunk cannot be converted to an scb file. 2. Insufficient computer memory prevents all h5ad files from being converted to SCB files.
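For anyone hitting the same mismatch, a small sketch for listing which h5ad files still lack an output, so that only those need to be downloaded again or reprocessed. It assumes, purely for illustration, that each `name.h5ad` yields an output entry called `name.scb` in the output directory; the actual naming used by the build script may differ, and `find_unconverted` is a hypothetical helper, not part of scGPT.

```python
from pathlib import Path
from typing import List


def find_unconverted(h5ad_dir: Path, scb_dir: Path) -> List[Path]:
    """Return .h5ad files with no same-named .scb entry in scb_dir.

    Assumes outputs are named <stem>.scb (an illustrative convention only).
    """
    done = {p.stem for p in scb_dir.glob("*.scb")}
    return sorted(p for p in h5ad_dir.glob("*.h5ad") if p.stem not in done)
```

The returned paths are the candidates to check for corrupted downloads or to reprocess one at a time when memory is limited.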

subercui commented 1 year ago

Thank you for the update. Let me know if you have any further questions.