Starlitnightly / omicverse

A Python library for multi-omics analysis, including bulk, single-cell, and spatial RNA-seq.
https://starlitnightly.github.io/omicverse/
GNU General Public License v3.0

Data lost with STAGATE_pyG #98

Closed: liuxiawei closed this issue 6 days ago

liuxiawei commented 1 week ago

Describe the bug: When I use STAGATE, an error occurs. The code I ran:

STA_obj=ov.space.pySTAGATE(adata,
                           num_batch_x=1,
                           num_batch_y=1,
                           spatial_key=['X','Y'],
                           rad_cutoff=75,
                           device='cpu')

I also tried the original STAGATE tools, and they work fine: [image]

Screenshots: The data processed by Batch_Data looked strange: [image] The line marked in red was added for printing: [image] The error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[10], line 8
      1 #STA_obj=ov.space.pySTAGATE(adata,num_batch_x=3,num_batch_y=2,
      2 #                 spatial_key=['X','Y'],
      3 #                           rad_cutoff=50,
   (...)
      6 #                weight_decay=1e-4,hidden_dims = [512, 30],
      7 #                device='cpu')
----> 8 STA_obj=ov.space.pySTAGATE(adata,
      9                            num_batch_x=1,
     10                            num_batch_y=1,
     11                            spatial_key=['X','Y'],
     12                            rad_cutoff=75,
     13                 device='cpu')

File /conda/envs/omicverse/lib/python3.10/site-packages/omicverse/space/_cluster.py:26, in pySTAGATE.__init__(self, adata, num_batch_x, num_batch_y, spatial_key, batch_size, rad_cutoff, num_epoch, lr, weight_decay, hidden_dims, device)
     24 for temp_adata in Batch_list:
     25     print(temp_adata)
---> 26     Cal_Spatial_Net(temp_adata, rad_cutoff=rad_cutoff)
     29 #device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
     30 data_list = [Transfer_pytorch_Data(adata) for adata in Batch_list]

File /conda/envs/omicverse/lib/python3.10/site-packages/omicverse/externel/STAGATE_pyG/utils.py:83, in Cal_Spatial_Net(adata, rad_cutoff, k_cutoff, model, verbose)
     80 coor.columns = ['imagerow', 'imagecol']
     82 if model == 'Radius':
---> 83     nbrs = sklearn.neighbors.NearestNeighbors(radius=rad_cutoff).fit(coor)
     84     distances, indices = nbrs.radius_neighbors(coor, return_distance=True)
     85     KNN_list = []

File /conda/envs/omicverse/lib/python3.10/site-packages/sklearn/base.py:1474, in _fit_context.<locals>.decorator.<locals>.wrapper(estimator, *args, **kwargs)
   1467     estimator._validate_params()
   1469 with config_context(
   1470     skip_parameter_validation=(
   1471         prefer_skip_nested_validation or global_skip_validation
   1472     )
   1473 ):
-> 1474     return fit_method(estimator, *args, **kwargs)

File /conda/envs/omicverse/lib/python3.10/site-packages/sklearn/neighbors/_unsupervised.py:175, in NearestNeighbors.fit(self, X, y)
    154 @_fit_context(
    155     # NearestNeighbors.metric is not validated yet
    156     prefer_skip_nested_validation=False
    157 )
    158 def fit(self, X, y=None):
    159     """Fit the nearest neighbors estimator from the training dataset.
    160 
    161     Parameters
   (...)
    173         The fitted nearest neighbors estimator.
    174     """
--> 175     return self._fit(X)

File /conda/envs/omicverse/lib/python3.10/site-packages/sklearn/neighbors/_base.py:518, in NeighborsBase._fit(self, X, y)
    516 else:
    517     if not isinstance(X, (KDTree, BallTree, NeighborsBase)):
--> 518         X = self._validate_data(X, accept_sparse="csr", order="C")
    520 self._check_algorithm_metric()
    521 if self.metric_params is None:

File /conda/envs/omicverse/lib/python3.10/site-packages/sklearn/base.py:633, in BaseEstimator._validate_data(self, X, y, reset, validate_separately, cast_to_ndarray, **check_params)
    631         out = X, y
    632 elif not no_val_X and no_val_y:
--> 633     out = check_array(X, input_name="X", **check_params)
    634 elif no_val_X and not no_val_y:
    635     out = _check_y(y, **check_params)

File /conda/envs/omicverse/lib/python3.10/site-packages/sklearn/utils/validation.py:1072, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
   1070     n_samples = _num_samples(array)
   1071     if n_samples < ensure_min_samples:
-> 1072         raise ValueError(
   1073             "Found array with %d sample(s) (shape=%s) while a"
   1074             " minimum of %d is required%s."
   1075             % (n_samples, array.shape, ensure_min_samples, context)
   1076         )
   1078 if ensure_min_features > 0 and array.ndim == 2:
   1079     n_features = array.shape[1]

ValueError: Found array with 0 sample(s) (shape=(0, 2)) while a minimum of 1 is required by NearestNeighbors.
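
A note on how an array with 0 samples can arise here: if the coordinate columns in adata.obs are all NA (as reported later in this thread), a filter that keeps cells inside a coordinate range keeps nothing, which would match the shape=(0, 2) in the message. The sketch below only illustrates that mechanism. `split_one_batch` is a hypothetical helper, not the Batch_Data function from omicverse or STAGATE, and the toy DataFrame stands in for adata.obs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for adata.obs with broken coordinates (all NaN).
obs = pd.DataFrame({"X": [np.nan] * 4, "Y": [np.nan] * 4})


def split_one_batch(obs, spatial_key=("X", "Y")):
    """Illustrative coordinate-range filter; NOT the omicverse implementation."""
    coords = obs.loc[:, list(spatial_key)].to_numpy(dtype=float)
    # Percentiles of an all-NaN column are themselves NaN.
    min_x, max_x = np.percentile(coords[:, 0], [0, 100])
    min_y, max_y = np.percentile(coords[:, 1], [0, 100])
    # Every comparison against NaN is False, so no row passes the filter.
    keep = (
        (coords[:, 0] >= min_x) & (coords[:, 0] <= max_x)
        & (coords[:, 1] >= min_y) & (coords[:, 1] <= max_y)
    )
    return obs.loc[keep, list(spatial_key)]


batch = split_one_batch(obs)
print(batch.shape)
```

An empty batch like this is what NearestNeighbors would then reject, so checking the coordinate columns for missing values is a reasonable first debugging step.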

Starlitnightly commented 1 week ago

pySTAGATE already encapsulates the preprocessing step, so this error may be caused by a mistake in your own additional preprocessing.

Do you encounter the same error when using our example data?

Zehua

liuxiawei commented 6 days ago

Hi Zehua, thanks for the response. I tried the example data, and the same error occurred. While debugging, I found that the problem is caused by this code:

adata.obs['X'] = adata.obsm['spatial'][:,0]
adata.obs['Y'] = adata.obsm['spatial'][:,1]
adata.obs

It left adata.obs['X'] as NA. But when I split the code into two cells, as follows, the problem was solved. It's very strange, but it works now. Thanks. [image]
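
The thread does not establish why the columns ended up as NA. For anyone hitting the same problem, a small check before calling pySTAGATE may surface it earlier than the NearestNeighbors error does. This is a minimal sketch, not part of omicverse: `assert_coords_ok` is a hypothetical helper, and the toy `obs` and `spatial` objects stand in for adata.obs and adata.obsm['spatial'].

```python
import numpy as np
import pandas as pd


def assert_coords_ok(obs, spatial_key=("X", "Y")):
    """Raise early if a coordinate column is absent or contains missing values."""
    for key in spatial_key:
        if key not in obs.columns:
            raise KeyError(f"obs has no column {key!r}")
        n_bad = int(obs[key].isna().sum())
        if n_bad:
            raise ValueError(f"obs[{key!r}] has {n_bad} missing value(s)")


# Toy stand-ins (hypothetical data) for adata.obsm['spatial'] and adata.obs.
spatial = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
obs = pd.DataFrame(index=["cell_a", "cell_b", "cell_c"])

# Assign coordinates by position from a plain NumPy array.
obs["X"] = np.asarray(spatial)[:, 0]
obs["Y"] = np.asarray(spatial)[:, 1]
assert_coords_ok(obs)  # passes silently when every coordinate is present

# A copy with a broken column triggers the check.
bad = obs.copy()
bad["X"] = np.nan
try:
    assert_coords_ok(bad)
    caught = None
except ValueError as err:
    caught = str(err)
print(caught)
```

With a real AnnData object, the same check would be called as `assert_coords_ok(adata.obs, ['X', 'Y'])` right after the two assignment lines above.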