Closed alex-l-kong closed 11 months ago
Looks good.
Out of curiosity though, where was this causing an error? From a quick glance at the code, I couldn't figure out where the error would come up.
In
assign_som_clusters
(https://github.com/angelolab/ark-analysis/blob/main/src/ark/phenotyping/cluster_helpers.py#L260-L278), it looks like only the column names that were in the normalization/weights file would be used. Like the functionsnormalize_data
only uses columns from the normalization values, andgenerate_som_clusters
only uses columns from the weights. Ingenerate_som_cluseters
:cluster_labels.append(map_data_to_nodes( self.weights.values.astype(np.float64), external_data.loc[ i:min(i + 99, external_data.shape[0]), weights_cols ].values.astype(np.float64)
Because of the
weight_cols
, it seems to me that even if the 'pixel_som_cluster' column was still in the data file, it would be ignored.
You're right that leaving in the SOM cluster column shouldn't inherently cause problems. Still not not entirely sure why this was an issue with Patricia's run in particular, it's nice to have as an extra check.
After taking another look at the process it seems there's another problem: when the user needs to overwrite the data in pixel_mat_data
, it shouldn't go through an additional round of normalization each time, which is what's actually happening at the moment. We'll need to propagate this further into the actual PixelSOMCluster
and PixieSOMCluster
classes so we don't renormalize if the overwrite
flag is set to True
.
Good catch! Perhaps I'm losing it, but I think that the logic is a bit off.
If
overwrite=True
, then we do not need to re-normalize, sonormalize_data
inassign_som_clusters
should be False right? Right now, since we're passingnormalize_data=overwrite
,normalize_data
would be True.
Good catch, got a bit lost in the horrors of control flow logic negation.
What is the purpose of this PR?
The overwrite flag wasn't correctly dropping the
"pixel_som_cluster"
column, which will cause problems during assignment since it will attempt to use that column as an actual feature when mapping onto the SOM.How did you implement your changes
Pass the
overwrite
flag intorun_pixel_som_assignment
.