SuperBianC / scMulan

Repository for paper scMulan: a multitask generative pre-trained language model for single-cell analysis.
MIT License

Batch processing #3

Open agemagician opened 4 months ago

agemagician commented 4 months ago

Hello,

Thanks for your great work.

I have noticed that you don't use batch processing in your "get_cell_types_for_adata" function, which makes the feature extraction process very slow.

By comparison, scGPT, which processes inputs in batches, is 24 times faster.

Do you have any plans to support batch processing?

SuperBianC commented 4 months ago

@agemagician Thanks a lot for your suggestion. I have been working on batch processing for scMulan, but it is difficult: when the input cells have different lengths (numbers of expressed genes), the decoder-only architecture cannot process them as a single batch. I see two possible solutions. The first is to have the dataloader sample cells of the same length into each batch, but it then takes a long time to return the cell type results in the order of the original adata indexes. The second is to pad the cells in a batch to the same length, but the trade-off is that the number of generation steps is determined by the longest cell in the batch, so the shorter cells waste computation time.
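As a rough sketch of the first option: the snippet below groups cell indices by length and writes each prediction back to its original position. It assumes cells are plain Python lists of token ids. `predict_batch` is a hypothetical stand-in for the model call, and neither it nor the helper names are part of scMulan's actual API.

```python
from collections import defaultdict


def batches_by_length(lengths, batch_size):
    """Group cell indices so that every batch holds cells of equal length."""
    buckets = defaultdict(list)
    for idx, n in enumerate(lengths):
        buckets[n].append(idx)
    batches = []
    for n in sorted(buckets):
        idxs = buckets[n]
        # Split each same-length bucket into chunks of at most batch_size.
        for start in range(0, len(idxs), batch_size):
            batches.append(idxs[start:start + batch_size])
    return batches


def predict_in_original_order(cells, predict_batch, batch_size=32):
    """Run predict_batch on equal-length batches; return results in input order.

    predict_batch is a hypothetical callable: list of cells -> list of results.
    """
    results = [None] * len(cells)
    lengths = [len(c) for c in cells]
    for batch in batches_by_length(lengths, batch_size):
        preds = predict_batch([cells[i] for i in batch])
        for i, p in zip(batch, preds):
            results[i] = p  # scatter back to the cell's original position
    return results


# Toy usage with a dummy "model" that just sums the token ids of each cell.
cells = [[1, 2, 3], [4], [5, 6, 7], [8]]
ordered = predict_in_original_order(cells, lambda b: [sum(c) for c in b], batch_size=2)
```

Because the original index travels with each cell, restoring the order is a single pass over the results rather than a sort or a lookup. If lengths vary widely, many buckets will hold only a few cells, so the effective batch size may stay small.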

I have tried the first solution, but it did not yield any speedup.
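For the second option, a minimal sketch of padding might look like the following. It left-pads each cell so that the last real token of every cell sits in the same position, and builds a mask marking real tokens. Two assumptions apply: `pad_id` is a hypothetical padding token, and whether the scMulan model accepts such a mask has not been checked here. The `padding_fraction` helper is also illustrative; it only measures how much of a padded batch is wasted.

```python
def left_pad(cells, pad_id=0):
    """Left-pad token lists to a common length and build a 0/1 mask.

    pad_id is a hypothetical padding token id, not taken from scMulan.
    """
    max_len = max(len(c) for c in cells)
    ids, mask = [], []
    for c in cells:
        n_pad = max_len - len(c)
        ids.append([pad_id] * n_pad + list(c))
        mask.append([0] * n_pad + [1] * len(c))  # 1 marks a real token
    return ids, mask


def padding_fraction(lengths):
    """Fraction of positions in a padded batch that hold padding."""
    total = max(lengths) * len(lengths)
    return 1 - sum(lengths) / total


ids, mask = left_pad([[5, 6, 7], [8]], pad_id=0)

# Batching cells of similar length wastes less computation than mixing them.
mixed = padding_fraction([100, 1000])
similar = padding_fraction([900, 1000])
```

Sorting cells by length before forming batches keeps the lengths within each batch close together. That reduces the computation wasted on short cells, though it does not remove it.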

Do you have any ideas for this?

Thanks again.