TencentAILabHealthcare / spatialID


Script for Stereoseq cell annotation #2

Closed UmaSangumathi closed 1 year ago

UmaSangumathi commented 1 year ago

Hi,

Thanks for the package. I am trying to apply it to Stereo-seq data, but I cannot find the Python script for running cell annotation on Stereo-seq data. Please let me know where I can find it.

Best, Uma

SilversH commented 1 year ago

@UmaSangumathi Hi, if you look at the source code of the three Python scripts, you will find they are almost identical except for the accuracy computation part and some parameter settings. So to run cell annotation on Stereo-seq data, you can simply use one of the other scripts (e.g. cell_type_annotation_for_merfish.py) and set 'dataset' to 'stereoseq'.

UmaSangumathi commented 1 year ago

Hi @SilversH, thanks so much for your reply. I have set the dataset parameter to Stereoseq. However, I am getting an error when loading cell_type_annotation_model.pyc:

from cell_type_annotation_model import DNNModel, SpatialModelTrainer
ImportError: bad magic number in 'cell_type_annotation_model': b'U\r\r\n'

Please let me know how to fix this.

SilversH commented 1 year ago

Hi @UmaSangumathi, this looks like an OS or Python version problem to me. Are you using Windows? I tried running the scripts on a Windows 10 machine and got a similar error.

UmaSangumathi commented 1 year ago

Hi @SilversH, I am using Ubuntu. Can you please let me know which Python version is best suited to run this code? Thanks

$ python --version
Python 3.9.12

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal

SilversH commented 1 year ago

Hi @UmaSangumathi, I have tested Python 3.9.12 and got the same error as you. Please try Python 3.8.8, as described in the README.
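For anyone hitting the same "bad magic number" error: a .pyc file starts with a 4-byte magic number tied to the interpreter version that compiled it, and the import fails when it does not match the running interpreter. The following is only a minimal sketch for checking this before running the scripts; the helper names are made up for illustration and are not part of spatialID. As far as I know, the bytes b'U\r\r\n' from the error above decode to magic 3413, which corresponds to CPython 3.8, consistent with the advice to use Python 3.8.8.

```python
import importlib.util
import sys


def read_pyc_magic(path):
    """Return the first 4 bytes of a .pyc file (its magic number)."""
    with open(path, "rb") as f:
        return f.read(4)


def pyc_matches_interpreter(path):
    """True if the .pyc was compiled for the interpreter running this code."""
    return read_pyc_magic(path) == importlib.util.MAGIC_NUMBER


def magic_as_int(magic):
    """Decode the 2-byte little-endian version field of a magic number."""
    return int.from_bytes(magic[:2], "little")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_pyc.py cell_type_annotation_model.pyc
    pyc_path = sys.argv[1]
    found = read_pyc_magic(pyc_path)
    print("file magic:       ", found, magic_as_int(found))
    print("interpreter magic:", importlib.util.MAGIC_NUMBER,
          magic_as_int(importlib.util.MAGIC_NUMBER))
    print("compatible:", pyc_matches_interpreter(pyc_path))
```

If the two magic numbers differ, switching to the Python version the .pyc was built with (or obtaining the .py source) is the usual way out.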

UmaSangumathi commented 1 year ago

Hi, yes, after switching to Python 3.8.8 I can now load and run the script. However, I am facing another issue in the model training step. Please let me know how to fix this. Alternatively, can you please provide a link to the Stereo-seq dataset/object you used in the paper, so I can try running a test? (I could not find it on this GitHub page.)

==> Loading data...
Data name: sample1_out (Stereoseq)
Data path: dataset/Stereoseq/
Save path: result/Stereoseq/

==> Preprocessing...
Parameters(filter_mt=True, cell_min_counts=100, gene_min_cells=10, cell_max_counts_percent=98.0, drop_rate=0)
python3.8/site-packages/pandas/core/arraylike.py:402: RuntimeWarning: invalid value encountered in log1p
  result = getattr(ufunc, method)(*inputs, **kwargs)
python3.8/site-packages/pandas/core/arraylike.py:402: RuntimeWarning: invalid value encountered in log1p
  result = getattr(ufunc, method)(*inputs, **kwargs)
python3.8/site-packages/pandas/core/arraylike.py:402: RuntimeWarning: invalid value encountered in log1p
  result = getattr(ufunc, method)(*inputs, **kwargs)
sample1_out: 8789 cells × 16921 genes.

==> Transfering from sc-dataset...
Parameters(dnn_model=dnn_model/checkpoint_MERFISH_s.t7, gpu=0, batch_size=4096)

==> Model training...
Parameters(pca_dim=200, k_graph=30, edge_weight=True, kd_T=1, feat_dim=64, w_dae=1.0, w_gae=1.0, w_cls=10.0, epochs=200)
python3.8/site-packages/scanpy/preprocessing/_simple.py:352: RuntimeWarning: invalid value encountered in log1p
  np.log1p(X, out=X)

Intel MKL ERROR: Parameter 4 was incorrect on entry to SLASCL.

Intel MKL ERROR: Parameter 4 was incorrect on entry to SLASCL.
Traceback (most recent call last):
  File "./cell_type_annotation_for_stereoseq.py", line 249, in <module>
    spatial_classification_tool(config, args.data_name)
  File "./cell_type_annotation_for_stereoseq.py", line 188, in spatial_classification_tool
    u, s, v = torch.pca_lowrank(gene_mat, config['train']['pca_dim'])
  File "python3.8/site-packages/torch/_lowrank.py", line 277, in pca_lowrank
    return _svd_lowrank(A - C, q, niter=niter, M=None)
  File "python3.8/site-packages/torch/_lowrank.py", line 157, in _svd_lowrank
    U, S, V = torch.svd(B_t)
RuntimeError: svd_cpu: the updating process of SBDSDC did not converge (error: 23)

Thanks so much, Uma

SilversH commented 1 year ago

Hi @UmaSangumathi, judging by the warning "RuntimeWarning: invalid value encountered in log1p", the problem seems to be negative values in the expression matrix. In our experiments, the Stereo-seq data we used was raw data (a raw UMI count matrix), unlike the other three open-source datasets. So when you set the 'dataset' parameter to 'Stereo-seq', please input raw UMI count matrices. If you want to input preprocessed data matrices, you can set 'dataset' to 'MERFISH' during testing to skip our preprocessing step.
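To tell whether a matrix is raw counts or already preprocessed before choosing the 'dataset' setting, a quick sanity check along these lines may help. This is a rough sketch using only NumPy, not code from spatialID: the function names are hypothetical, and it assumes the expression matrix has been converted to a dense array (a sparse matrix would need to be densified or checked on its stored values first).

```python
import numpy as np


def summarize_matrix(matrix):
    """Report properties that distinguish raw UMI counts from processed data."""
    x = np.asarray(matrix, dtype=float)
    finite = np.isfinite(x)
    return {
        "all_finite": bool(finite.all()),
        # Negative entries usually mean the data was scaled or centered.
        "has_negative": bool((x[finite] < 0).any()),
        # Raw UMI counts should be whole numbers.
        "integer_valued": bool(finite.all() and (x == np.round(x)).all()),
    }


def looks_like_raw_counts(matrix):
    """Heuristic: finite, non-negative, integer-valued entries."""
    s = summarize_matrix(matrix)
    return s["all_finite"] and not s["has_negative"] and s["integer_valued"]


if __name__ == "__main__":
    raw = [[0, 3, 1], [12, 0, 7]]            # toy raw counts
    scaled = [[0.5, -1.7, 0.2], [2.1, 0.0, -3.4]]  # toy scaled values
    print(looks_like_raw_counts(raw), looks_like_raw_counts(scaled))
    # log1p is undefined below -1, so scaled data yields NaN entries,
    # which then break downstream steps such as PCA/SVD.
    with np.errstate(invalid="ignore"):
        print(np.isnan(np.log1p(np.asarray(scaled))).any())
```

If the check reports negative or non-integer values, the matrix has most likely been normalized or scaled already, and applying the log transform a second time is what produces the NaN values seen in the log above.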

The Stereo-seq dataset we used involves animal testing, so we have not posted it here; you can find the open-source link in our paper.

UmaSangumathi commented 1 year ago

Hi @SilversH, thanks so much. I can now run the script with raw Stereo-seq data.