Closed. HPiyadasa closed this issue 1 year ago.
@cliu72 we should add better fallback recovery for this to the cell clustering notebook as well. Here's one way we could do it:
- If cluster_counts_size_norm_path already exists, load the file at the beginning instead of calling create_c2pc_data again. Same with weighted_cell_channel_path.
- In .cluster_cells, if cell_pysom.cell_data already has a cell_som_cluster column attached, return the data as is (and don't re-run the clustering step as we currently do). Add an explicit feather.write_dataframe command at the end of this cell.
- Same for cell_consensus_cluster and the cell_meta_cluster column. Add an explicit feather.write_dataframe command at the end of this cell.
Let me know if we're missing anything.
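For illustration, the first two recovery steps proposed above could be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: load_or_compute and cluster_if_needed are hypothetical helper names, compute_fn and cluster_fn are stand-ins for whatever wraps create_c2pc_data and the clustering call, and pandas feather I/O is assumed in place of feather.write_dataframe.

```python
import os

import pandas as pd


def load_or_compute(path, compute_fn,
                    reader=pd.read_feather,
                    writer=pd.DataFrame.to_feather):
    """Return the table saved at `path` if it exists; otherwise compute and save it.

    Hypothetical helper: `compute_fn` stands in for a zero-argument wrapper
    around the expensive step (for example the create_c2pc_data call).
    """
    if os.path.exists(path):
        # A previous session already produced this file, so skip the recompute.
        return reader(path)
    data = compute_fn()
    # Checkpoint right away so a later restart can take the branch above.
    writer(data, path)
    return data


def cluster_if_needed(cell_data, cluster_fn, cluster_col="cell_som_cluster"):
    """Return `cell_data` unchanged when `cluster_col` is already attached.

    Hypothetical helper: `cluster_fn` stands in for the clustering step and is
    only called when the column is missing.
    """
    if cluster_col in cell_data.columns:
        return cell_data
    return cluster_fn(cell_data)
```

The same column check would apply to the consensus step by passing cluster_col="cell_meta_cluster". The reader and writer are parameters only so the helper can be tried with another on-disk format; the feather defaults need pyarrow installed.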
@alex-l-kong This looks good to me!
@alex-l-kong Oh actually one thought - I think it'd be good to add an explicit feather.write_dataframe after create_c2pc_data for the size normalized data (below where we already have that for the unnormalized data). I think in Hadeesha's experience, create_c2pc_data can take a while, so it'd be good to have that file saved before cluster_cells. And then we can overwrite the file at the end of cluster_cells.
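One way to make the "save early, then overwrite later" idea safe against interruptions is to write to a temporary file and rename it over the old checkpoint. The sketch below is only an assumption about how that could look: save_checkpoint is a hypothetical helper, and pandas' to_feather is used here instead of feather.write_dataframe.

```python
import os
import tempfile

import pandas as pd


def save_checkpoint(data, path, writer=pd.DataFrame.to_feather):
    """Write `data` to `path`, replacing any earlier checkpoint in one step.

    Hypothetical helper: the table is first written to a temporary file in the
    same directory, then renamed over `path`, so an interrupted write should
    not leave a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # the writer reopens the file by name
    try:
        writer(data, tmp_path)
        os.replace(tmp_path, path)  # rename within one directory, overwriting `path`
    except BaseException:
        # Clean up the temporary file if the write or rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling it once after the size-normalized data is created and again after the cluster column is attached would give the two save points described above, with the second call overwriting the first file.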
This is Candace on Hadeesha's account again. @cliu72
Is your feature request related to a problem? Please describe.
After the Pixie refactoring, the code was changed such that cluster_counts_size_norm isn't saved to a feather file until the very end of the notebook. This is a problem when users with large datasets are not able to run the entire notebook in one sitting. Because cluster_counts_size_norm isn't saved, there is no way to map cells to cell clusters without going through the entire notebook.
Describe the solution you'd like
Save cluster_counts_size_norm to the feather file after clustering and metaclustering are done (and before the end of the notebook).