kennypavan opened this issue 2 months ago
@kennypavan, CellTypist does not support backed mode for the time being. For example, you could load your raw count data, normalize and log1p-transform it, subset it to highly variable genes (HVGs), write it out as a new AnnData file, and load that file for training. Note that you need to set check_expression = False and feature_selection = False when training on this data. You can also subset cells.
@ChuanXu1 Thank you for the suggestions. I'll explore whether preprocessing and removing non-HVGs will work for our use case. Much appreciated!
Hello,
I'm attempting to train a large model from an AnnData object; however, memory issues persist when opening the file on our HPC, even with 512 GB of RAM. Naturally, I've attempted to open the file as a stream using the AnnData "backed" parameter and received the error:
This error seems reasonable, as many of the aggregating functions wouldn't have access to the entire AnnData object. Increasing memory beyond 512 GB for this task is not an option due to our resource limits. Before attempting to work around this by extending the train function to support backed mode, is there an existing solution for processing large, atlas-level datasets with more than 4 million cells?
Thank you,