Karobben opened 1 year ago
So, I figured it out, but got a new error:
======= Training Start =======
Traceback (most recent call last):
File "/mnt/Data/PopOS/Data_Ana/XinLi/../../NGS/scBridge/main.py", line 137, in <module>
main(args)
File "/mnt/Data/PopOS/Data_Ana/XinLi/../../NGS/scBridge/main.py", line 42, in main
preds, prob_feat, prob_logit = net.run(
File "/mnt/Data/PopOS/NGS/scBridge/model_utils.py", line 112, in run
similarity, preds = feature_prototype_similarity(
File "/mnt/Data/PopOS/NGS/scBridge/model_utils.py", line 197, in feature_prototype_similarity
similarity = cosine_similarity(target_feature, source_prototypes)
File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/sklearn/metrics/pairwise.py", line 1377, in cosine_similarity
X, Y = check_pairwise_arrays(X, Y)
File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/sklearn/metrics/pairwise.py", line 155, in check_pairwise_arrays
X = check_array(
File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/sklearn/utils/validation.py", line 899, in check_array
_assert_all_finite(
File "/mnt/Data/PopOS/miniconda/envs/scBridge/lib/python3.10/site-packages/sklearn/utils/validation.py", line 146, in _assert_all_finite
raise ValueError(msg_err)
ValueError: Input contains NaN.
@Yunfan-Li I've encountered the same issue. I wonder if there is a solution now?
RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x606219 and 36326x256)
@Karobben @YH-Zheng Sorry for the late reply. scBridge accepts the gene expression matrix of scRNA-seq data and the gene activity matrix of scATAC-seq data as inputs. Common genes need to be selected before the data are fed into the model.
similarity = cosine_similarity(target_feature, source_prototypes)
@Karobben Hi, could you check whether it is target_feature or source_prototypes that contains NaN?
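For anyone hitting the same ValueError, here is a minimal sketch of such a check. It assumes both inputs can be converted to NumPy arrays (a PyTorch tensor would first need `.detach().cpu().numpy()`), and `report_non_finite` is a hypothetical helper written for this example, not part of scBridge. The toy arrays only stand in for the real `target_feature` and `source_prototypes`.

```python
import numpy as np

def report_non_finite(name, array):
    """Print a short summary and return the (NaN count, inf count) of a 2-D array."""
    array = np.asarray(array, dtype=float)
    n_nan = int(np.isnan(array).sum())
    n_inf = int(np.isinf(array).sum())
    # Row indices that contain at least one NaN or infinite entry.
    bad_rows = np.flatnonzero(~np.isfinite(array).all(axis=1))
    print(f"{name}: shape={array.shape}, NaN={n_nan}, inf={n_inf}, "
          f"first bad rows={bad_rows[:5].tolist()}")
    return n_nan, n_inf

# Toy stand-ins for the two arguments passed to cosine_similarity.
target_feature = np.array([[0.1, 0.2], [np.nan, 0.4]])
source_prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])

nan_t, inf_t = report_non_finite("target_feature", target_feature)
nan_s, inf_s = report_non_finite("source_prototypes", source_prototypes)
```

Calling the helper just before the `cosine_similarity` line in `model_utils.py` should show which of the two arrays carries the NaN values.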
@Yunfan-Li What does "common genes" mean? I couldn't find any tutorial that mentions this step. How should I perform common gene selection?
@YH-Zheng If you currently have the peak matrix, you first need to transform it into the gene activity matrix using a package such as Signac. After that, common gene selection can be done by subsampling the scRNA-seq gene count matrix and the scATAC-seq gene activity matrix so that both have the same set of genes.
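A minimal sketch of that selection step, assuming both matrices are already cells-by-genes NumPy arrays and that gene names are unique within each dataset. The function `select_common_genes` and the toy gene names are hypothetical and only illustrate the idea. They are not taken from the scBridge code.

```python
import numpy as np

def select_common_genes(rna, rna_genes, atac, atac_genes):
    """Subset two cells-by-genes matrices to their shared genes, in the same column order."""
    atac_index = {gene: j for j, gene in enumerate(atac_genes)}
    # Keep the scRNA-seq gene order, restricted to genes that also appear in scATAC-seq.
    rna_cols = [i for i, gene in enumerate(rna_genes) if gene in atac_index]
    common = [rna_genes[i] for i in rna_cols]
    atac_cols = [atac_index[gene] for gene in common]
    return rna[:, rna_cols], atac[:, atac_cols], common

# Toy data: 3 RNA cells x 4 genes, 2 ATAC cells x 3 genes.
rna = np.arange(12).reshape(3, 4)
rna_genes = ["A", "B", "C", "D"]
atac = np.arange(6).reshape(2, 3)
atac_genes = ["C", "E", "A"]

rna_sub, atac_sub, common = select_common_genes(rna, rna_genes, atac, atac_genes)
print(common, rna_sub.shape, atac_sub.shape)
```

The selection is by gene name, not by random sampling: the two outputs keep their own cell counts but end up with identical, identically ordered gene columns.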
@Yunfan-Li I roughly understand. Could you add this step to the tutorial? The current tutorial is vague about it, and it does not explain that the scRNA-seq data must be a count matrix (it seems to be re-normalized).
Thanks for your advice. We have added the step to the README file.
I still have some questions. How should the two gene matrices be subsampled? Does that mean randomly sampling the same number of genes from each matrix?
I have checked the code. The first dimension of the source and target data is the same, but the second dimension is different. From the code, the first dimension indexes the samples and the second is the feature dimension. So, how does the code process the data? Thanks.
The code does not require the same number of cells, but it does require common genes between the scRNA-seq and scATAC-seq data.
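As an illustration of that requirement, here is a small pre-flight check one could run before training. It is only a sketch: `check_inputs` is a hypothetical helper, and it assumes cells-by-genes NumPy arrays with one gene name per column.

```python
import numpy as np

def check_inputs(source, source_genes, target, target_genes):
    """Raise a descriptive error unless both matrices share the same gene columns."""
    if source.shape[1] != len(source_genes) or target.shape[1] != len(target_genes):
        raise ValueError("each gene name list must match its matrix's column count")
    if list(source_genes) != list(target_genes):
        raise ValueError(
            f"feature mismatch: source has {source.shape[1]} genes, "
            f"target has {target.shape[1]}; select common genes first"
        )
    # Cell counts (rows) are allowed to differ.
    return source.shape[0], target.shape[0], source.shape[1]

genes = ["A", "B", "C"]
# Different numbers of cells, same genes: accepted.
result = check_inputs(np.zeros((5, 3)), genes, np.zeros((2, 3)), genes)
print(result)

# Different gene sets: rejected with a readable message.
try:
    check_inputs(np.zeros((5, 3)), genes, np.zeros((2, 4)), genes + ["D"])
    message = ""
except ValueError as err:
    message = str(err)
print(message)
```

A check like this fails early with a readable message instead of the matrix-shape RuntimeError reported above.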
Yes, but if the feature dimensions of the source and target are different, how should that be handled? Or, how should the data be pre-processed so that the feature dimensions match? Thanks a lot.
Common gene selection could be done by subsampling the scRNA-seq gene count matrix and scATAC-seq gene activity matrix to have the same set of genes.
Does that mean I simply sample features at random from the matrix with the higher feature dimension? For example, if the source features form an N×M matrix and the target features form a K×D matrix with M > D, do we just sample D features from the M? Is that right? Thanks very much.
No. You need to select common genes.
OK, if the source and target have the same number of samples, such as N×M and N×D, do we then need to sample D dimensions from M? Is that correct?
Dear author, I ran into the error below. Is there any parameter that could bypass this error?
Thanks