XLearning-SCU / scBridge

MIT License
mat1 and mat2 shapes cannot be multiplied #1

Open Karobben opened 12 months ago

Karobben commented 12 months ago

Dear author, I ran into the error below. Is there a parameter that would let me bypass it?

Thanks

Data Loaded with the Following Configurations:
Source data: rna    Preprocess: Standard    Shape [115941, 2000]
Target data: atac   Preprocess: TFIDF   Shape [36145, 32376]
======= Training Start =======
Traceback (most recent call last):
  File "/mnt/Data/PopOS/Data_Ana/XinLi/../../NGS/scBridge/main.py", line 137, in <module>
    main(args)
  File "/mnt/Data/PopOS/Data_Ana/XinLi/../../NGS/scBridge/main.py", line 42, in main
    preds, prob_feat, prob_logit = net.run(
  File "/mnt/Data/PopOS/NGS/scBridge/model_utils.py", line 63, in run
    target_h = self.encoder(target_x)
  File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/torch/nn/modules/container.py", line 204, in forward
    input = module(input)
  File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x32376 and 2000x256)

Karobben commented 12 months ago

I figured that one out, but now I get a new error:

======= Training Start =======
Traceback (most recent call last):
  File "/mnt/Data/PopOS/Data_Ana/XinLi/../../NGS/scBridge/main.py", line 137, in <module>
    main(args)
  File "/mnt/Data/PopOS/Data_Ana/XinLi/../../NGS/scBridge/main.py", line 42, in main
    preds, prob_feat, prob_logit = net.run(
  File "/mnt/Data/PopOS/NGS/scBridge/model_utils.py", line 112, in run
    similarity, preds = feature_prototype_similarity(
  File "/mnt/Data/PopOS/NGS/scBridge/model_utils.py", line 197, in feature_prototype_similarity
    similarity = cosine_similarity(target_feature, source_prototypes)
  File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/sklearn/metrics/pairwise.py", line 1377, in cosine_similarity
    X, Y = check_pairwise_arrays(X, Y)
  File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/sklearn/metrics/pairwise.py", line 155, in check_pairwise_arrays
    X = check_array(
  File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/sklearn/utils/validation.py", line 899, in check_array
    _assert_all_finite(
  File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/sklearn/utils/validation.py", line 146, in _assert_all_finite
    raise ValueError(msg_err)
ValueError: Input contains NaN.

YH-Zheng commented 11 months ago

@Yunfan-Li I've encountered the same issue. Is there a solution yet?

RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x606219 and 36326x256)

Yunfan-Li commented 11 months ago

@Karobben @YH-Zheng Sorry for the late reply. scBridge accepts the gene expression matrix of scRNA-seq data and the gene activity matrix of scATAC-seq data as the inputs. Common genes need to be selected before feeding into the model.
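
For readers hitting the same RuntimeError: the shapes in the first message suggest that the encoder's first linear layer was sized for 2000 input features (the width of the source data), while each target batch has 32376 columns, so the matrix product is undefined. Below is a minimal sketch of that mismatch. It uses NumPy as a stand-in for PyTorch and scaled-down shapes; `is_matmul_defined` and all array names are hypothetical and are not part of scBridge.

```python
import numpy as np


def is_matmul_defined(x_shape, w_shape):
    """An (n, k) @ (k2, m) product is defined only when k == k2."""
    return x_shape[1] == w_shape[0]


# Shapes copied from the error message: a 512-cell batch with 32376
# features against a weight matrix that expects 2000 features.
reported_ok = is_matmul_defined((512, 32376), (2000, 256))
print(reported_ok)  # False

# Scaled-down analogue: the weight expects 5 features per cell.
rng = np.random.default_rng(0)
weight = rng.normal(size=(5, 3))
source_batch = rng.normal(size=(4, 5))  # 4 cells, 5 features: compatible
target_batch = rng.normal(size=(4, 8))  # 4 cells, 8 features: incompatible

source_h = source_batch @ weight  # works, result has shape (4, 3)

try:
    target_batch @ weight
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print("matmul failed:", err)
```

The number of rows (cells) plays no role in the check. Only the feature width has to agree, which is why both inputs need the same genes.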

Yunfan-Li commented 11 months ago

> similarity = cosine_similarity(target_feature, source_prototypes)

@Karobben Hi, could you check whether it is target_feature or source_prototypes that contains NaN?
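
One way to run that check is sketched below. It assumes both inputs are 2-D and can be converted to NumPy arrays (a PyTorch tensor would first need something like `.detach().cpu().numpy()`). The helper name `report_non_finite` and the toy arrays are hypothetical; they only stand in for the real `target_feature` and `source_prototypes`.

```python
import numpy as np


def report_non_finite(name, array):
    """Print a short summary and return the row indices holding NaN or inf."""
    array = np.asarray(array, dtype=float)
    bad = ~np.isfinite(array)                  # True where NaN or +/-inf
    bad_rows = np.flatnonzero(bad.any(axis=1))
    print(f"{name}: {int(bad.sum())} non-finite values "
          f"in {bad_rows.size} of {array.shape[0]} rows")
    return bad_rows


# Toy stand-ins for the two arguments of cosine_similarity.
target_feature = np.array([[0.1, 0.2],
                           [np.nan, 0.3],
                           [0.5, 0.6]])
source_prototypes = np.array([[1.0, 0.0],
                              [0.0, 1.0]])

bad_target = report_non_finite("target_feature", target_feature)
bad_source = report_non_finite("source_prototypes", source_prototypes)
```

Calling the helper immediately before the `cosine_similarity` line shows which of the two arrays is affected and which rows to inspect.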

YH-Zheng commented 11 months ago

> @Karobben @YH-Zheng Sorry for the late reply. scBridge accepts the gene expression matrix of scRNA-seq data and the gene activity matrix of scATAC-seq data as the inputs. Common genes need to be selected before feeding into the model.

@Yunfan-Li What does "common genes" mean? I could not find a tutorial that mentions this step. How should I perform common gene selection?

Yunfan-Li commented 11 months ago

@YH-Zheng If you currently have the peak matrix, you need first to transform it into the activity matrix using packages such as Signac. After that, common gene selection could be done by subsampling the scRNA-seq gene count matrix and scATAC-seq gene activity matrix to have the same set of genes.
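
As an illustration of that second step, here is a minimal sketch of common gene selection. It assumes both matrices are cells x genes NumPy arrays and that each comes with a list of unique gene names for its columns. The function `select_common_genes` and the toy data are hypothetical and not taken from the scBridge code.

```python
import numpy as np


def select_common_genes(rna, rna_genes, atac, atac_genes):
    """Keep only genes present in both datasets, in the same column order.

    rna, atac: 2-D arrays of shape (cells, genes); cell counts may differ.
    rna_genes, atac_genes: gene names for the columns (assumed unique).
    """
    common, rna_idx, atac_idx = np.intersect1d(
        np.asarray(rna_genes), np.asarray(atac_genes), return_indices=True
    )
    # Indexing with the returned positions puts both matrices in the
    # (sorted) order of `common`, so column j is the same gene in both.
    return rna[:, rna_idx], atac[:, atac_idx], common


# Toy data: 3 RNA cells x 4 genes, 2 ATAC cells x 3 genes.
rna = np.arange(12).reshape(3, 4)
rna_genes = ["GAPDH", "CD3E", "MS4A1", "ACTB"]
atac = np.arange(6).reshape(2, 3)
atac_genes = ["MS4A1", "GAPDH", "NKG7"]

rna_common, atac_common, common = select_common_genes(
    rna, rna_genes, atac, atac_genes
)
print(common.tolist())                      # ['GAPDH', 'MS4A1']
print(rna_common.shape, atac_common.shape)  # (3, 2) (2, 2)
```

The selection is by gene name, not by position or at random, and the number of cells in each dataset is unchanged. If the data are held in AnnData objects, the same idea can presumably be expressed by intersecting the two `var_names` indexes and subsetting both objects with the result.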

YH-Zheng commented 11 months ago

@Yunfan-Li I roughly understand now. Could you add this step to the tutorial? The current tutorial is vague about it, and it does not explain that the scRNA-seq data must be a count matrix (it appears to be re-normalized).

XLearning-SCU commented 10 months ago

Thanks for your advice. We have added this step to the README file.

welcomeyou2019 commented 10 months ago

> @YH-Zheng If you currently have the peak matrix, you need first to transform it into the activity matrix using packages such as Signac. After that, common gene selection could be done by subsampling the scRNA-seq gene count matrix and scATAC-seq gene activity matrix to have the same set of genes.

I still have some questions. How do I subsample the genes of the two datasets? Does that mean randomly sampling both matrices down to the same number of genes?

welcomeyou2019 commented 10 months ago

> @YH-Zheng If you currently have the peak matrix, you need first to transform it into the activity matrix using packages such as Signac. After that, common gene selection could be done by subsampling the scRNA-seq gene count matrix and scATAC-seq gene activity matrix to have the same set of genes.

I have checked the code. The first dimension of the source and target data is the same, but the second dimension is different. From the code, the first dimension indexes the samples and the second is the feature dimension. So how does the code process the data? Thanks.

Yunfan-Li commented 10 months ago

The code does not require the same number of cells, but it does require the same set of genes in the scRNA-seq and scATAC-seq data.
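
A quick pre-flight check along these lines can catch the problem before training starts. This is only a sketch under the assumption that the gene names of each matrix are available as a list; `check_gene_alignment` is a hypothetical helper, not an scBridge function. It also checks column order, on the assumption that both datasets pass through the same encoder and therefore column j must refer to the same gene in both.

```python
def check_gene_alignment(rna_genes, atac_genes):
    """Return a list of problems; an empty list means the genes line up."""
    rna_genes, atac_genes = list(rna_genes), list(atac_genes)
    problems = []
    if len(rna_genes) != len(atac_genes):
        problems.append(
            f"feature counts differ: {len(rna_genes)} vs {len(atac_genes)}"
        )
    only_rna = set(rna_genes) - set(atac_genes)
    only_atac = set(atac_genes) - set(rna_genes)
    if only_rna or only_atac:
        problems.append(
            f"{len(only_rna)} genes only in RNA, {len(only_atac)} only in ATAC"
        )
    elif rna_genes != atac_genes:
        problems.append("same genes but in a different column order")
    return problems


aligned = check_gene_alignment(["CD3E", "GAPDH"], ["CD3E", "GAPDH"])
reordered = check_gene_alignment(["CD3E", "GAPDH"], ["GAPDH", "CD3E"])
mismatched = check_gene_alignment(["CD3E", "GAPDH", "NKG7"], ["CD3E", "GAPDH"])
print(aligned)     # []
print(reordered)   # ['same genes but in a different column order']
print(mismatched)
```

Cell counts are deliberately not compared, since the two datasets may contain different numbers of cells.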

welcomeyou2019 commented 10 months ago

Yes, but if the feature dimensions of the source and target differ, how should that be handled? In other words, how should the data be pre-processed so that both have the same feature dimension? Thanks a lot.

Yunfan-Li commented 10 months ago

Common gene selection could be done by subsampling the scRNA-seq gene count matrix and scATAC-seq gene activity matrix to have the same set of genes.

welcomeyou2019 commented 10 months ago

Does that mean I simply sample at random from the side with the larger feature dimension? For example, if the source features form an N×M matrix and the target features a K×D matrix with M > D, do we just sample D features from the M? Is that right? Thanks very much.

Yunfan-Li commented 10 months ago

No. You need to select common genes.

welcomeyou2019 commented 10 months ago

OK, so if the source and target have the same number of samples, e.g. N×M and N×D, do we need to sample D dimensions from the M? Is that correct?