earmingol / cell2cell

User-friendly tool to infer cell-cell interactions and communication from gene expression of interacting proteins
BSD 3-Clause "New" or "Revised" License

Problems with the tensor pipeline at the Tensor Factorization step #54

Closed ludicaa closed 3 weeks ago

ludicaa commented 3 months ago

Hi,

I am trying to run the cell2cell tensor pipeline on a GPU with my mouse single-cell data (ligand-receptor pairs were downloaded from https://raw.githubusercontent.com/LewisLabUCSD/Ligand-Receptor-Pairs/master/Mouse/Mouse-2020-Jin-LR-pairs.csv). While I did not get errors when running your examples, I ran into trouble at the tensor cell2cell pipeline step below:


tensor2 = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
                                                     meta_tf,
                                                     copy_tensor=True, # Whether to output a new tensor or modify the original
                                                     rank=None, # Number of factors to perform the factorization. If None, it is automatically determined by an elbow analysis
                                                     tf_optimization='regular', # To define how robust we want the analysis to be.
                                                     random_state=888, # Random seed for reproducibility
                                                     backend='pytorch', # This enables a backend that supports using a GPU.
                                                     device='cuda', # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
                                                     elbow_metric='error', # Metric to use in the elbow analysis.
                                                     smooth_elbow=False, # Whether smoothing the metric of the elbow analysis.
                                                     upper_rank=25, # Max number of factors to try in the elbow analysis
                                                     tf_init='random', # Initialization method of the tensor factorization
                                                     tf_svd='numpy_svd', # Type of SVD to use if the initialization is 'svd'
                                                     cmaps=None, # Color palettes to use for coloring each of the dimensions. Must be a list of palettes.
                                                     sample_col='Element', # Columns containing the elements in the tensor metadata
                                                     group_col='Category', # Columns containing the major groups in the tensor metadata
                                                     fig_fontsize=14, # Fontsize of the figures generated
                                                     output_folder=output_folder, # Whether to save the figures and loadings in files. If so, a folder pathname must be passed
                                                     output_fig=True, # Whether to output the figures. If False, figures won't be saved as files even if a folder was passed in output_folder.
                                                     fig_format='pdf', # File format of the figures.
                                                    )

In the end I only get the elbow plot with the rank. The pipeline starts the tensor factorization but then stops with the following error:

Running Elbow Analysis
100%|██████████| 25/25 [05:41<00:00, 13.68s/it]
The rank at the elbow is: 8
Running Tensor Factorization
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[27], line 1
----> 1 tensor2 = c2c.analysis.run_tensor_cell2cell_pipeline(tensor,
      2                                                      meta_tf,
      3                                                      copy_tensor=True, # Whether to output a new tensor or modifying the original
      4                                                      rank= None, # Number of factors to perform the factorization. If None, it is automatically determined by an elbow analysis
      5                                                      tf_optimization='regular', # To define how robust we want the analysis to be.
      6                                                      random_state=888, # Random seed for reproducibility
      7                                                      backend='pytorch', # This enables a banckend that supports using a GPU.
      8                                                      device='cuda', # Device to use. If using GPU and PyTorch, use 'cuda'. For CPU use 'cpu'
      9                                                      elbow_metric='error', # Metric to use in the elbow analysis.
     10                                                      smooth_elbow=False, # Whether smoothing the metric of the elbow analysis.
     11                                                      upper_rank=25, # Max number of factors to try in the elbow analysis
     12                                                      tf_init='random', # Initialization method of the tensor factorization
     13                                                      tf_svd='numpy_svd', # Type of SVD to use if the initialization is 'svd'
     14                                                      cmaps=None, # Color palettes to use in color each of the dimensions. Must be a list of palettes.
     15                                                      sample_col='Element', # Columns containing the elements in the tensor metadata
     16                                                      group_col='Category', # Columns containing the major groups in the tensor metadata
     17                                                      fig_fontsize=14, # Fontsize of the figures generated
     18                                                      output_folder=output_folder, # Whether to save the figures and loadings in files. If so, a folder pathname must be passed
     19                                                      output_fig=True, # Whether to output the figures. If False, figures won't be saved a files if a folder was passed in output_folder.
     20                                                      fig_format='pdf', # File format of the figures.
     21                                                     )

File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/cell2cell/analysis/tensor_pipelines.py:191, in run_tensor_cell2cell_pipeline(interaction_tensor, tensor_metadata, copy_tensor, rank, tf_optimization, random_state, backend, device, elbow_metric, smooth_elbow, upper_rank, tf_init, tf_svd, cmaps, sample_col, group_col, fig_fontsize, output_folder, output_fig, fig_format, **kwargs)
    189 # Factorization
    190 print('Running Tensor Factorization')
--> 191 interaction_tensor.compute_tensor_factorization(rank=rank,
    192                                                 init=tf_init,
    193                                                 svd=tf_svd,
    194                                                 random_state=random_state,
    195                                                 runs=tf_runs,
    196                                                 normalize_loadings=True,
    197                                                 tol=tol, n_iter_max=n_iter_max,
    198                                                 **kwargs
    199                                                 )
    201 ### EXPORT RESULTS ###
    202 if output_folder is not None:

File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/cell2cell/tensor/tensor.py:361, in BaseTensor.compute_tensor_factorization(self, rank, tf_type, init, svd, random_state, runs, normalize_loadings, var_ordered_factors, n_iter_max, tol, verbose, **kwargs)
    356     self.explained_variance_ratio_ = None
    358 self.explained_variance_ = self.explained_variance()
    360 self.factors = OrderedDict(zip(order_labels,
--> 361                                [pd.DataFrame(tl.to_numpy(f), index=idx, columns=factor_names) for f, idx in zip(factors, self.order_names)]))
    362 self.rank = rank

File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/cell2cell/tensor/tensor.py:361, in <listcomp>(.0)
    356     self.explained_variance_ratio_ = None
    358 self.explained_variance_ = self.explained_variance()
    360 self.factors = OrderedDict(zip(order_labels,
--> 361                                [pd.DataFrame(tl.to_numpy(f), index=idx, columns=factor_names) for f, idx in zip(factors, self.order_names)]))
    362 self.rank = rank

File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/pandas/core/frame.py:694, in DataFrame.__init__(self, data, index, columns, dtype, copy)
    684         mgr = dict_to_mgr(
    685             # error: Item "ndarray" of "Union[ndarray, Series, Index]" has no
    686             # attribute "name"
   (...)
    691             typ=manager,
    692         )
    693     else:
--> 694         mgr = ndarray_to_mgr(
    695             data,
    696             index,
    697             columns,
    698             dtype=dtype,
    699             copy=copy,
    700             typ=manager,
    701         )
    703 # For data is list-like, or Iterable (will consume into list)
    704 elif is_list_like(data):

File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/pandas/core/internals/construction.py:351, in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    346 # _prep_ndarray ensures that values.ndim == 2 at this point
    347 index, columns = _get_axes(
    348     values.shape[0], values.shape[1], index=index, columns=columns
    349 )
--> 351 _check_values_indices_shape_match(values, index, columns)
    353 if typ == "array":
    355     if issubclass(values.dtype.type, str):

File /opt/tools/deg/miniforge3/envs/cell2cell/lib/python3.10/site-packages/pandas/core/internals/construction.py:422, in _check_values_indices_shape_match(values, index, columns)
    420 passed = values.shape
    421 implied = (len(index), len(columns))
--> 422 raise ValueError(f"Shape of passed values is {passed}, indices imply {implied}")

ValueError: Shape of passed values is (1000, 8), indices imply (1999, 8)

If you'd like to check the object I am using, here is the Dropbox link: https://www.dropbox.com/scl/fo/qq37tda3yekx0nl073xsb/ACgPNftevARe-Vo1mdt2I1Q?rlkey=0ii3b2z6awx70r4jnpgloubgd&dl=0

Thank you

earmingol commented 3 months ago

Sorry, I cannot reproduce your error with just the h5ad file. Could you export your tensor and upload that file instead? Here is an example of how to do so: https://colab.research.google.com/drive/1T6MUoxafTHYhjvenDbEtQoveIlHT2U6_#scrollTo=JD8Si50x1jq-

Also, could you check your tensor size and the length of the names you passed for each element in your tensor?

You can do this with these commands:

# Tensor shape
tensor.shape

# Length of Labels for contexts
len(tensor.order_names[0])

# Length of Labels for ligand-receptor pairs
len(tensor.order_names[1])

# Length of Labels for sender cells
len(tensor.order_names[2])

# Length of Labels for receiver cells
len(tensor.order_names[3])
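
The four checks above can also be run in one pass. The following is a minimal sketch, assuming only that the tensor object exposes `shape` and `order_names` as used in the commands above; `find_label_mismatches` is a hypothetical helper (not part of cell2cell), and the toy object uses made-up sizes rather than the data from this issue.

```python
from types import SimpleNamespace


def find_label_mismatches(tensor):
    """Return (axis, n_elements, n_labels) for each axis whose label count differs from its size.

    Hypothetical helper: only relies on `tensor.shape` and `tensor.order_names`.
    """
    mismatches = []
    for axis, (size, labels) in enumerate(zip(tuple(tensor.shape), tensor.order_names)):
        if size != len(labels):
            mismatches.append((axis, size, len(labels)))
    return mismatches


# Stand-in object with made-up sizes (not the tensor from this issue).
toy = SimpleNamespace(
    shape=(2, 3, 4, 4),
    order_names=[
        ["ctx1", "ctx2"],
        ["L1^R1", "L2^R2", "L3^R3", "L4^R4", "L5^R5"],  # 5 labels for 3 elements
        ["A", "B", "C", "D"],
        ["A", "B", "C", "D"],
    ],
)

for axis, size, n_labels in find_label_mismatches(toy):
    print(f"Axis {axis}: {size} elements but {n_labels} labels")
```

With a real tensor, an empty result would mean that every dimension has exactly one label per element.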

I think your issue could be related to passing fewer or more labels than the actual number of elements in one of the tensor dimensions. From the sizes, it could be the ligand-receptor pairs. It seems like you provided labels for 1999 LR pairs, while your tensor has only 1000 elements in that dimension.
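
For reading the pandas message itself: as the traceback shows, "passed values" is `values.shape` (the factor matrix) and "indices imply" is `(len(index), len(columns))` (the labels). A small sketch that splits the message into those parts; `explain_shape_mismatch` is a hypothetical helper written for illustration, not a pandas or cell2cell function.

```python
import re


def explain_shape_mismatch(message):
    """Split a pandas shape-mismatch message into data sizes and label counts."""
    pattern = r"Shape of passed values is \((\d+), (\d+)\), indices imply \((\d+), (\d+)\)"
    match = re.search(pattern, message)
    if match is None:
        raise ValueError("Message does not look like a pandas shape-mismatch error")
    data_rows, data_cols, n_index, n_columns = map(int, match.groups())
    return {
        "data_rows": data_rows,        # rows in the array passed to DataFrame
        "index_labels": n_index,       # number of row labels supplied
        "data_columns": data_cols,     # columns in the array (here, the rank)
        "column_labels": n_columns,    # number of column labels supplied
        "rows_match": data_rows == n_index,
        "columns_match": data_cols == n_columns,
    }


info = explain_shape_mismatch(
    "ValueError: Shape of passed values is (1000, 8), indices imply (1999, 8)"
)
print(info)
```

Here the columns agree (8 factors) while the rows do not: the data has 1000 rows but 1999 row labels were supplied.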

ludicaa commented 3 months ago

Hi Erik!

At the following link you can find the tensor and tensor metadata: https://www.dropbox.com/scl/fo/qq37tda3yekx0nl073xsb/ACgPNftevARe-Vo1mdt2I1Q?rlkey=ans4a1sdnxbb3d9b77vv1kdv4&dl=0

# Tensor shape
tensor.shape

(13, 1000, 28, 28)

I have to tell you that, for a reason I could not figure out, at the first attempt the labels for ligand-receptor pairs were all in capital letters (as for human), while in ppi_names I had them in the correct format for mouse (i.e., Kdm5d^Whatever). To correct this, since it would not work in the end, I just replaced them:

tensor.order_names[1] = ppi_names  # whose length is 1999!!!

So I think that, in trying to correct the metadata creation error, I generated this new one! Is there a way to make the tensor create the labels for ligand-receptor pairs in the correct format?

Thank you very much for the help

Ludovica

earmingol commented 3 months ago

I see! You are using the tensor-cell2cell analysis without LIANA, right? If so, in the step of creating the interaction tensor you need to add upper_letter_comparison=False to keep the names in the original format, otherwise they will be transformed to capital letters.

For example:

tensor = c2c.tensor.InteractionTensor(rnaseq_matrices=rnaseq_matrices,
                                      ppi_data=lr_pairs,
                                      context_names=list(context_dict.keys()),
                                      how='outer',
                                      outer_fraction=0.5, # Considers elements in at least 50% of samples
                                      complex_sep='&',
                                      interaction_columns=int_columns,
                                      communication_score='expression_gmean',
                                      upper_letter_comparison=False
                                     )

Then there is no need to do tensor.order_names[1] = ppi_names
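
For a tensor that was already built with upper-cased labels, the original spelling could in principle be recovered by looking up each label in the list of original names, rather than overwriting the whole list with a longer one. This is a minimal sketch in plain Python, not part of cell2cell; `restore_original_case` and the example pair names are hypothetical.

```python
def restore_original_case(upper_labels, original_names):
    """Map each upper-cased label back to its original spelling, keeping order and length."""
    lookup = {}
    for name in original_names:
        key = name.upper()
        if key in lookup and lookup[key] != name:
            # Two different names collapse to the same upper-case key.
            raise ValueError(f"Ambiguous names differ only by case: {lookup[key]!r}, {name!r}")
        lookup[key] = name

    missing = [label for label in upper_labels if label.upper() not in lookup]
    if missing:
        raise KeyError(f"No original name found for: {missing}")

    return [lookup[label.upper()] for label in upper_labels]


# Illustrative names only (not taken from the data in this issue).
original = ["Tgfb1^Tgfbr1", "Wnt5a^Fzd5", "Cxcl12^Cxcr4"]  # full list, longer than the tensor axis
upper = ["CXCL12^CXCR4", "TGFB1^TGFBR1"]                   # labels actually present in the tensor
restored = restore_original_case(upper, original)
print(restored)
```

The restored list has the same length and order as the existing labels, so assigning it would not change the number of labels along that dimension. Rebuilding the tensor with upper_letter_comparison=False, as described above, avoids the need for this step.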

ludicaa commented 3 months ago

Yes, I am using tensor-cell2cell without LIANA, and yes, upper_letter_comparison=False solved the issue! Thank you for the help :)

Ludovica