YangLabHKUST / SpatialScope

A unified approach for integrating spatial and single-cell transcriptomics data by leveraging deep generative models
https://spatialscope-tutorial.readthedocs.io/en/latest/
GNU General Public License v3.0

Cannot reproduce the imputation results of SpatialScope #3

Open · HelloWorldLTY opened this issue 7 months ago

HelloWorldLTY commented 7 months ago

Hi, I found a bug in the tutorial for running imputation on the MERFISH dataset:

[screenshot]

Moreover, I wonder why we need to set different arrays to run this step:

[screenshot]

Thanks a lot.

JiaShun-Xiao commented 7 months ago

Hi, thanks for reporting this bug. We recently made some updates to the ConcatCells function and have now updated the tutorial notebook accordingly. Please use git pull to update your local repository.

The reason we set different arrays for this step is the GPU memory limitation; we recommend handling 1,000 spots at a time. For example, 0,1000 means the 0th through 1,000th spots.
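
For illustration, a sketch of how this batching looks in a shell script (the array values mirror the ones used later in this thread; the exact flag that receives each range, written here as --spot_range, is an assumption based on the tutorial):

# process 5,551 spots in chunks of at most 1,000
declare -a arr=("0,1000" "1000,2000" "2000,3000" "3000,4000" "4000,5000" "5000,5551")
for i in "${arr[@]}"; do
    # --spot_range is assumed to be the argument that takes each chunk
    python ./src/Decomposition.py --tissue merfish --out_dir ./output \
        --SC_Data ./Ckpts_scRefs/MOp/Ref_snRNA_mop_qc3_2Kgenes.h5ad \
        --spot_range "$i"
done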

Thanks, Jiashun

HelloWorldLTY commented 7 months ago

Hi, thanks for your answers. I have a further question about the annotation step that precedes this one:

2023-12-22 00:00:33,125 INFO worker.py:1518 -- Started a local Ray instance.
Traceback (most recent call last):
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/./src/Cell_Type_Identification.py", line 305, in <module>
    CTI.CellTypeIdentification(nu = args.nu, n_neighbo = args.n_neighbo, hs_ST = args.hs_ST, VisiumCellsPlot = args.VisiumCellsPlot, UMI_min_sigma = args.UMI_min_sigma)
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/./src/Cell_Type_Identification.py", line 167, in CellTypeIdentification
    self.WarmStart(hs_ST=hs_ST, UMI_min_sigma = UMI_min_sigma)
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/./src/Cell_Type_Identification.py", line 93, in WarmStart
    self.LoadLikelihoodTable()
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/./src/Cell_Type_Identification.py", line 141, in LoadLikelihoodTable
    Q1[str(i + 10)] = np.reshape(np.array(lines[i].split(' ')).astype(np.float), (2536, 103)).T
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/numpy/__init__.py", line 324, in __getattr__
    raise AttributeError(__former_attrs__[attr])
AttributeError: module 'numpy' has no attribute 'float'.
`np.float` was a deprecated alias for the builtin `float`. To avoid this error in existing code, use `float` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.float64` here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
    https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

It seems that the package versions pinned on GitHub do not match the versions you used in your experiments.

Moreover, I am curious about the hardware requirements for running this model for imputation. I only have one GPU with 40 GB of memory, so is it possible for me to run your model for imputation? It seems that I would need multiple GPUs for imputation. Thanks.

JiaShun-Xiao commented 7 months ago

Hi, thanks for reporting this bug. We had previously ignored the DeprecationWarning for np.float; we have now updated np.float to np.float64.
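
For anyone on an older checkout who cannot git pull, the change amounts to replacing the removed alias; a minimal sketch (assuming GNU grep/sed and that every offending call site wants the 64-bit scalar):

# replace the removed np.float alias with np.float64 throughout the sources
grep -rl 'np\.float\b' src/ | xargs sed -i 's/np\.float\b/np.float64/g'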

The minimum GPU requirement for SpatialScope is an RTX 2080 Ti (11 GB). It's fine to use only one GPU; multiple GPUs are only intended to speed up the imputation process. However, given GPU memory limits, we recommend imputing 1,000 cells at a time even when 40 GB of memory is available.

Thanks, Jiashun

HelloWorldLTY commented 7 months ago

Hi, thanks for your answer. I ran into another problem:

2023-12-24 16:17:46,625 : INFO : fitBulk: decomposing bulk
2023-12-24 16:17:47,345 : INFO : chooseSigma: using initial Q_mat with sigma = 1.0
Traceback (most recent call last):
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/./src/Cell_Type_Identification.py", line 305, in <module>
    CTI.CellTypeIdentification(nu = args.nu, n_neighbo = args.n_neighbo, hs_ST = args.hs_ST, VisiumCellsPlot = args.VisiumCellsPlot, UMI_min_sigma = args.UMI_min_sigma)
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/./src/Cell_Type_Identification.py", line 167, in CellTypeIdentification
    self.WarmStart(hs_ST=hs_ST, UMI_min_sigma = UMI_min_sigma)
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/./src/Cell_Type_Identification.py", line 121, in WarmStart
    myRCTD = run_RCTD(myRCTD, self.Q_mat_all, self.X_vals_loc, loggings = self.loggings)
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/src/utils_pyRCTD.py", line 34, in run_RCTD
    RCTD = choose_sigma_c(RCTD, Q_mat_all, X_vals_loc, loggings = loggings)
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/src/utils_pyRCTD.py", line 430, in choose_sigma_c
    results = decompose_batch(np.array(puck['nUMI'].loc[fit_ind]).squeeze(), RCTD['cell_type_info']['renorm']['cell_type_means'], beads, RCTD['internal_vars']['gene_list_reg'], constrain = False, max_cores = RCTD['config']['max_cores'], loggings = loggings,likelihood_vars = likelihood_vars)
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/src/utils_pyRCTD.py", line 234, in decompose_batch
    weights = ray.get([decompose_full_ray.remote(arg) for arg in inp_args])
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/src/utils_pyRCTD.py", line 234, in <listcomp>
    weights = ray.get([decompose_full_ray.remote(arg) for arg in inp_args])
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/ray/remote_function.py", line 121, in _remote_proxy
    return self._remote(args=args, kwargs=kwargs, **self._default_options)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/ray/util/tracing/tracing_helper.py", line 307, in _invocation_remote_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/ray/remote_function.py", line 393, in _remote
    return invocation(args, kwargs)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/ray/remote_function.py", line 369, in invocation
    object_refs = worker.core_worker.submit_task(
  File "python/ray/_raylet.pyx", line 1536, in ray._raylet.CoreWorker.submit_task
  File "python/ray/_raylet.pyx", line 1540, in ray._raylet.CoreWorker.submit_task
  File "python/ray/_raylet.pyx", line 385, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 376, in ray._raylet.prepare_args_and_increment_put_refs
  File "python/ray/_raylet.pyx", line 418, in ray._raylet.prepare_args_internal
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/ray/_private/worker.py", line 536, in get_serialization_context
    context_map[job_id] = serialization.SerializationContext(self)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/ray/_private/serialization.py", line 124, in __init__
    serialization_addons.apply(self)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/ray/util/serialization_addons.py", line 56, in apply
    register_pydantic_serializer(serialization_context)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/ray/util/serialization_addons.py", line 19, in register_pydantic_serializer
    pydantic.fields.ModelField,
AttributeError: module 'pydantic.fields' has no attribute 'ModelField'

It seems that the pydantic version installed during setup does not match your experimental environment. Could you share which version of pydantic you used?

Moreover, if I have cell-type labels for both the scRNA-seq and spatial data, can I skip this step (cell-type identification) before imputation? Thanks.

JiaShun-Xiao commented 6 months ago

Hi,

As many of the reported issues are related to the environment, we have provided a Docker image (docker pull xiaojs95/spatialscope) to avoid installation problems. See the project homepage for more details if needed.
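
A minimal usage sketch (the pull command is quoted from above; the run flags, GPU passthrough, and mount point are illustrative assumptions and require the NVIDIA container toolkit):

# fetch the prebuilt image
docker pull xiaojs95/spatialscope
# start an interactive shell with GPU access and the current directory mounted
docker run --rm -it --gpus all -v "$PWD":/workspace xiaojs95/spatialscope /bin/bash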

If cell-type labels are available for both the scRNA-seq and spatial data, the cell-type-identification step can be skipped, as long as the cell types are matched between the scRNA-seq and spatial data.

Thanks, Jiashun

HelloWorldLTY commented 6 months ago

Thanks a lot. I will try it and get back to you. It seems that you have specified the pydantic version there.

Moreover, it seems that the Dockerfile performs the same installation steps as the environment.yml file provided in your repo, so I think it should make no difference that I initially chose to install from environment.yml:

RUN apt-get update && apt-get install -y git rsync

# Clone the repository from GitHub
RUN git clone https://github.com/YangLabHKUST/SpatialScope.git

# Note: `RUN cd` does not persist between layers; the WORKDIR below is what
# actually sets the working directory inside the image
RUN cd SpatialScope

WORKDIR /home/SpatialScope

# Create and activate the Conda environment
RUN conda env create -f environment.yml # This is the exact step I run.
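
If building locally from this Dockerfile instead of pulling the prebuilt image, the standard build command applies (the tag name is arbitrary):

# build the image from the Dockerfile in the current directory
docker build -t spatialscope .

Either way, the environment is created from the same environment.yml, so the Docker route mainly guarantees a clean base system rather than different package versions.
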
HelloWorldLTY commented 6 months ago

Thanks, after updating pydantic, I resolved my problems with running the annotation.
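
For reference, a likely explanation (my assumption, not confirmed in this thread): the bundled ray release imports pydantic.fields.ModelField, which was removed in pydantic 2.0, so pinning pydantic to a 1.x release restores the attribute:

# downgrade pydantic so that pydantic.fields.ModelField exists again
pip install "pydantic<2"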

However, there seems to be another problem in the imputation step:

+ arr=("0,1000" "1000,2000" "2000,3000" "3000,4000" "4000,5000" "5000,5551")
+ declare -a arr
+ for i in "${arr[@]}"
+ python ./src/Decomposition.py
Traceback (most recent call last):
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/./src/Decomposition.py", line 444, in <module>
    DECOM = GeneExpDecomposition(config)
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/./src/Decomposition.py", line 29, in __init__
    self.out_dir = os.path.join(self.config.data.out_dir, self.config.data.tissue)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/posixpath.py", line 76, in join
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType

[screenshot]

Here is the error. I copied the commands directly from the tutorial, and the path shown above does exist. Could you please help me? Thanks.

JiaShun-Xiao commented 6 months ago

This is due to a markdown display problem: add '\' at the end of each line of the command, except for the last one. [screenshot]
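
Concretely, the multi-line command should look like this (flags taken from the invocation shown later in this thread; any remaining tutorial arguments are omitted). Without the trailing backslashes, the shell runs python ./src/Decomposition.py with no arguments at all, so config.data.out_dir is None and os.path.join raises the TypeError above:

python ./src/Decomposition.py \
    --tissue merfish \
    --out_dir ./output \
    --SC_Data ./Ckpts_scRefs/MOp/Ref_snRNA_mop_qc3_2Kgenes.h5ad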

HelloWorldLTY commented 6 months ago

Thanks a lot. After fixing this bug, I ran into a new error:

(SpatialScope)[tl688@r208u22n01.mccleary SpatialScope]$ bash imputation.sh 
+ arr=("0,1000" "1000,2000" "2000,3000" "3000,4000" "4000,5000" "5000,5551")
+ declare -a arr
+ for i in "${arr[@]}"
+ python src/Decomposition.py --tissue merfish --out_dir ./output --SC_Data ./Ckpts_scRefs/MOp/Ref_snRNA_mop_qc3_2Kgenes.h5ad
2024-01-25 00:43:53,013 : INFO : load scRNA-seq reference: ./Ckpts_scRefs/MOp/Ref_snRNA_mop_qc3_2Kgenes.h5ad
Traceback (most recent call last):
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'cell_type'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/src/Decomposition.py", line 449, in <module>
    DECOM.decomposition()
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/src/Decomposition.py", line 366, in decomposition
    self.LoadScData()
  File "/gpfs/gibbs/pi/zhao/tl688/SpatialScope/src/Decomposition.py", line 49, in LoadScData
    cell_type_array = np.array(self.sc_data_process_marker.obs[self.config.data.cell_class_column])
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/gpfs/gibbs/project/zhao/tl688/conda_envs/SpatialScope/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'cell_type'

I think there is a mismatch between the cell_class_column setting and the key used for your markers. I am using the most up-to-date code. Thanks.

JiaShun-Xiao commented 6 months ago

The arguments after --SC_Data were missing.
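
Indeed, the log shows the command ending at --SC_Data, so the trailing arguments were dropped (most likely the same missing-backslash issue as above), and cell_class_column fell back to a default of 'cell_type', which is not in the reference's .obs. A hedged sketch of a fuller invocation — --cell_class_column is the option the KeyError points at, while its value and the --spot_range flag are placeholders rather than verified names:

python ./src/Decomposition.py \
    --tissue merfish \
    --out_dir ./output \
    --SC_Data ./Ckpts_scRefs/MOp/Ref_snRNA_mop_qc3_2Kgenes.h5ad \
    --cell_class_column cell_subclass \
    --spot_range 0,1000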

HelloWorldLTY commented 5 months ago

Thanks a lot, the training process now works for me.