pozel closed this 10 months ago
Hi! Thanks for using A-SOiD. The .sav format is used to store internal data, and the GUI has no feature to export it directly. However, the results of the clustering can be exported in the directed discovery step.
If you want to open the .sav file, you can use this code snippet in Python:
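```python
import joblib

# Replace with the full path to your embedding output file
path_to_sav = r"FULL/PATH/EMBEDDING.sav"

with open(path_to_sav, 'rb') as fr:
    # The file stores four objects, in this order
    [umap_embeddings, assignments, soft_assignments, pred_assign] = joblib.load(fr)
```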
Each of these variables is a dictionary with the following structure:
```python
# Example behavior names; the keys match the behaviors in your own project
target_behaviors = ["grooming", "sniffing", "turn", "locomotion"]
umap_embeddings = {key: [] for key in target_behaviors}
assignments = {key: [] for key in target_behaviors}
soft_assignments = {key: [] for key in target_behaviors}
pred_assign = {key: [] for key in target_behaviors}
```
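Because every variable is keyed by behavior name, you can loop over a dictionary to get a quick overview of how many rows landed in each cluster. The sketch below is a hypothetical helper (`cluster_sizes` is not part of A-SOiD); it assumes each value is a 1-D sequence of integer cluster labels, such as the entries of `assignments` or `pred_assign`.

```python
from collections import Counter


def cluster_sizes(labels_dict):
    """Return {behavior: {cluster_label: row_count}} for a label dictionary.

    Hypothetical helper: assumes each value is a 1-D sequence of integer
    cluster labels (e.g. assignments or pred_assign loaded from the .sav file).
    """
    sizes = {}
    for behavior, labels in labels_dict.items():
        # int() turns NumPy integer types into plain ints for readable output
        counts = Counter(int(label) for label in labels)
        # Sort by cluster label so noise (-1), if present, comes first
        sizes[behavior] = dict(sorted(counts.items()))
    return sizes


# Usage with the dictionaries loaded from the .sav file:
# print(cluster_sizes(pred_assign))
```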
So you can access the directed discovery results for each behavior separately by using the behavior name from target_behaviors as a key:
```python
target_behavior = "grooming"
umap_embedd_groom = umap_embeddings[target_behavior]
# Labels come from pred_assign, not from umap_embeddings
pred_assign_groom = pred_assign[target_behavior]
```
The assignments contain one cluster label (0 to n_clusters) per row; points that HDBSCAN considers noise are labeled -1. The embeddings are the multidimensional UMAP embedding of the extracted features. Note that all of your data is concatenated in there, so telling input sessions apart is not possible without tracing back through the feature extraction process.
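Since the GUI has no direct CSV export, one option is to write each behavior's embedding and labels to a CSV file yourself, which a spreadsheet program or statistics package can then open. The sketch below is a hypothetical helper using only the Python standard library; the function name, the column names and the `<behavior>_clusters.csv` file naming are illustrative choices, not something A-SOiD produces. It assumes the embedding for a behavior is a 2-D array-like with one row per label; pass whichever label dictionary you need (`pred_assign` holds the labels shown in the app).

```python
import csv
import os


def export_behavior_to_csv(umap_embeddings, labels_dict, behavior, out_dir="."):
    """Write one behavior's embedding and cluster labels to a CSV file.

    Hypothetical helper: assumes umap_embeddings[behavior] is a 2-D
    array-like (rows x dims) and labels_dict[behavior] has one label per row.
    Returns the path of the written file.
    """
    embeds = umap_embeddings[behavior]
    labels = labels_dict[behavior]
    if len(embeds) != len(labels):
        raise ValueError("number of embedding rows and labels differ")

    # One column per embedding dimension, plus the cluster label
    n_dims = len(embeds[0]) if len(embeds) else 0
    header = [f"umap_dim_{i + 1}" for i in range(n_dims)] + ["cluster"]

    out_path = os.path.join(out_dir, f"{behavior}_clusters.csv")
    with open(out_path, "w", newline="") as fw:
        writer = csv.writer(fw)
        writer.writerow(header)
        for row, label in zip(embeds, labels):
            writer.writerow(list(row) + [label])
    return out_path
```

For example, `for behavior in pred_assign: export_behavior_to_csv(umap_embeddings, pred_assign, behavior)` writes one file per behavior into the current directory.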
In the app, we visualize the first two dimensions of the embedding, colored by the labels from pred_assign. Here is a quick plot that does the same:
```python
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('default')


def plot_hdbscan_embedding_matplotlib(assign, embeds, behav="test"):
    assign = np.asarray(assign)
    embeds = np.asarray(embeds)
    unique_classes = np.unique(assign)
    # One legend entry per cluster
    group_types = ['Group {}'.format(i) for i in unique_classes if i >= 0]
    if -1 in unique_classes:
        # np.unique sorts, so noise (-1) comes first and "Noise" must lead the list
        group_types = ["Noise"] + group_types
    fig, ax = plt.subplots(figsize=(10, 10))
    for num, g in enumerate(unique_classes):
        idx = np.where(assign == g)[0]
        # Plot only the first two embedding dimensions
        ax.scatter(embeds[idx, 0],
                   embeds[idx, 1],
                   label=group_types[num],
                   s=3
                   )
    ax.legend()
    ax.set_title(f'{behav.capitalize()}')
    ax.set_xlabel('UMAP (Dim. 1)')
    ax.set_ylabel('UMAP (Dim. 2)')
    ax.set_aspect('equal', 'datalim')
    # remove ticks
    ax.set_xticks([])
    ax.set_yticks([])
    # remove borders
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    plt.show()
    return fig


# Example call with the grooming data selected above
fig = plot_hdbscan_embedding_matplotlib(pred_assign_groom, umap_embedd_groom, behav=target_behavior)
```
[Figure: example result]
Unfortunately, I don't work with SPSS myself, so I'm not sure whether you can import these files directly. However, after you have trained your active learning algorithm with the new clusters, you can use it to predict the clusters on your data. This produces CSV files in a standard format, split by input session.
Let me know if this helps!
When using the embedding_output.sav file for exploratory analysis of the clusters found by unsupervised learning, I tried to open the file in SPSS. How are others extracting information from this file? Is it possible to export this data to CSV format in the GUI?