BiomedicalMachineLearning / stLearn

A novel machine learning pipeline to analyse spatial transcriptomics data
Other
200 stars 26 forks

Issue with Harmonypy #236

Closed RubyLiu-2 closed 10 months ago

RubyLiu-2 commented 1 year ago

I ran into this error while running the ST integration tutorial. The call 'ho = hm.run_harmony(data_mat, meta_data, vars_use='batch')' fails with: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Here is data_mat[:5,:5]:

array([[ 58.106743  ,   6.779583  ,  36.253517  ,  16.418087  ,  31.19908   ],
       [-13.641421  ,  -1.8859543 , -11.811447  ,  -3.3714347 , -10.041128  ],
       [ 24.498816  ,   2.5082722 ,   7.92704   ,   8.897851  ,   6.606593  ],
       [ 50.91102   ,   5.346012  ,  25.25207   ,  15.650258  ,  21.576591  ],
       [  3.7254171 ,   0.6736226 ,   5.484891  ,   0.54030144,   4.6993723 ]],
      dtype=float32)

All the preceding code is the same as in the tutorial.

Many thanks

duypham2108 commented 1 year ago

Can you check whether data_mat contains any NaN or infinity values? If it does, filter out the components (columns) that contain them. You can also use a smaller number of PCA components to avoid the issue.
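A minimal sketch of such a check with NumPy, assuming data_mat is a 2-D array with one row per spot and one column per PCA component. The helper name summarize_nonfinite and the demo array are hypothetical and only illustrate the idea; the first 5x5 block printed above looks finite, so the whole matrix has to be scanned.

```python
import numpy as np

def summarize_nonfinite(mat):
    """Count NaN/inf entries and list the columns that contain any of them."""
    mat = np.asarray(mat)
    finite = np.isfinite(mat)  # False for NaN, +inf and -inf
    bad_cols = np.where(~finite.all(axis=0))[0]
    return {
        "n_nan": int(np.isnan(mat).sum()),
        "n_inf": int(np.isinf(mat).sum()),
        "bad_columns": bad_cols.tolist(),
    }

# Hypothetical stand-in for data_mat: column 1 holds an inf, column 2 a NaN.
demo = np.array([[1.0, np.inf, 3.0],
                 [4.0, 5.0, np.nan]], dtype=np.float32)
report = summarize_nonfinite(demo)
print(report)
```

Calling summarize_nonfinite(data_mat) on the real matrix would show how many components are affected before deciding what to drop.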

RubyLiu-2 commented 1 year ago

NaN or infinity value

Thanks for your reply. I think data_mat contains infinity values. Should I just remove those values, or is there a proper way to handle them?

duypham2108 commented 1 year ago

You should remove the columns that contain infinity values (you can find them with numpy or pandas). In practice this means you run Harmony on fewer components.
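One way to do this with NumPy, sketched under the assumption that data_mat has one column per PCA component. The helper name drop_nonfinite_columns and the demo array are hypothetical and not part of stLearn or harmonypy.

```python
import numpy as np

def drop_nonfinite_columns(mat):
    """Keep only the columns in which every entry is finite."""
    mat = np.asarray(mat)
    keep = np.isfinite(mat).all(axis=0)  # one boolean per column
    return mat[:, keep], keep

# Hypothetical stand-in for data_mat: column 1 contains an inf.
demo = np.array([[1.0, np.inf, 3.0],
                 [4.0, 5.0, 6.0]], dtype=np.float32)
clean, keep = drop_nonfinite_columns(demo)
print(clean.shape, keep)

# With the real matrix, the cleaned array would replace data_mat in the
# call from the original post, e.g.:
# ho = hm.run_harmony(clean, meta_data, vars_use='batch')
```

The returned mask records which components were kept, which is useful if the same selection has to be applied elsewhere.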