embedding_output.sav - GitHub issues

Hi! Thanks for using A-SOiD. The .sav file stores some internal information, and the GUI has no feature to export it directly. However, the results of the clustering can be exported in the directed discovery step.

If you want to open the .sav file, you can use this code snippet in Python:

import joblib
path_to_sav = r"FULL/PATH/EMBEDDING.sav"
with open(path_to_sav, 'rb') as fr:
    [umap_embeddings, assignments, soft_assignments, pred_assign] = joblib.load(fr)

Structure

Each of the four loaded variables is a dictionary with the following structure:

target_behaviors = ["grooming", "sniffing", "turn", "locomotion"]

umap_embeddings = {key: [] for key in target_behaviors}
assignments = {key: [] for key in target_behaviors}
soft_assignments = {key: [] for key in target_behaviors}
pred_assign = {key: [] for key in target_behaviors}

So you can retrieve the directed discovery results for each behavior separately by using the target behavior name as a key.

target_behavior = "grooming"
umap_embedd_groom = umap_embeddings[target_behavior]
pred_assign_groom = pred_assign[target_behavior]
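To check this structure on your own file, here is a minimal sketch that lists the array shape stored under each behavior. The helper name `describe_sav_contents` is hypothetical (it is not part of A-SOiD), and the stand-in arrays below are made up so the example runs without a real .sav file. It assumes each entry can be converted to a regular NumPy array.

```python
import numpy as np

def describe_sav_contents(umap_embeddings, assignments, soft_assignments, pred_assign):
    """Return {behavior: {variable_name: shape}} for the four loaded dictionaries.

    Hypothetical helper for inspection only; not part of A-SOiD.
    """
    named = {
        "umap_embeddings": umap_embeddings,
        "assignments": assignments,
        "soft_assignments": soft_assignments,
        "pred_assign": pred_assign,
    }
    summary = {}
    for name, per_behavior in named.items():
        for behavior, values in per_behavior.items():
            # Assumes each entry is array-like with a regular shape
            summary.setdefault(behavior, {})[name] = np.asarray(values).shape
    return summary

# Made-up stand-in data; replace with the dictionaries loaded from your file
demo_embeddings = {"grooming": np.zeros((5, 3))}
demo_labels = {"grooming": np.zeros(5, dtype=int)}
demo_soft = {"grooming": np.zeros((5, 2))}

summary = describe_sav_contents(demo_embeddings, demo_labels, demo_soft, demo_labels)
for behavior, shapes in summary.items():
    print(behavior, shapes)
```

With the real file, you would pass the four dictionaries returned by `joblib.load` instead of the stand-in data.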

The assignments are one label (0-n_clusters) per row. The embeddings are the multidimensional embedding based on the features. Note that all of your data is concatenated in there, so differentiating between input sessions is not possible without backtracing the feature extraction process.
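Since the labels are one value per row, you can count how many rows fall into each cluster. This is only a sketch: `cluster_sizes` is a hypothetical helper name, the example labels are made up, and it assumes the labels are integers with -1 marking noise points.

```python
import numpy as np

def cluster_sizes(labels):
    """Map each cluster label to the number of rows assigned to it."""
    values, counts = np.unique(np.asarray(labels), return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}

# Made-up labels for illustration; -1 is assumed to mark noise
example_labels = [0, 0, 1, -1, 1, 1]
sizes = cluster_sizes(example_labels)
print(sizes)
```

On real data you would call it per behavior, for example `cluster_sizes(pred_assign["grooming"])`.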

Visualization:

In the app, we visualize the first two dimensions of the embedding and color the points by the labels from pred_assign.

Here is a quick function to reproduce this plot:

import matplotlib.pyplot as plt
import numpy as np

plt.style.use('default')

def plot_hdbscan_embedding_matplotlib(assign, embeds, behav="test"):
    # Accept plain lists as well as arrays
    assign = np.asarray(assign)
    embeds = np.asarray(embeds)

    fig, ax = plt.subplots(figsize=(10, 10))
    # One scatter call per cluster so each gets its own color and legend entry
    for g in np.unique(assign):
        # The label -1 marks noise points
        label = 'Noise' if g == -1 else 'Group {}'.format(g)
        idx = np.where(assign == g)[0]
        # Only the first two embedding dimensions are plotted
        ax.scatter(embeds[idx, 0], embeds[idx, 1], label=label, s=3)
    ax.legend()
    ax.set_title(behav.capitalize())
    ax.set_xlabel('UMAP (Dim. 1)')
    ax.set_ylabel('UMAP (Dim. 2)')
    ax.set_aspect('equal', 'datalim')
    # remove ticks
    ax.set_xticks([])
    ax.set_yticks([])
    # remove borders
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    plt.show()
    return fig

Example Result:

[Image: example plot]

SPSS

Unfortunately, I am not working with SPSS myself, so I am unsure whether you can import these files directly. However, after you have trained your active learning algorithm with the new clusters, you can use it to predict the clusters on your data. This will result in CSV files that are in a standard format and split by input session.
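If you want the embedding itself in a tabular file, one option is to write it to CSV yourself, which statistics packages can generally import. The sketch below is not an A-SOiD feature: the function name `export_clusters_to_csv`, the column names, and the demo values are all made up for illustration, and it assumes a 2-D embedding with one integer label per row.

```python
import csv
import os
import tempfile

import numpy as np

def export_clusters_to_csv(embedding, labels, out_path):
    """Write one row per sample: all embedding dimensions plus the cluster label.

    Hypothetical helper; the column names are an arbitrary choice.
    """
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels).astype(int)
    if embedding.ndim != 2 or len(embedding) != len(labels):
        raise ValueError("expected a 2-D embedding with one label per row")

    header = [f"umap_{i + 1}" for i in range(embedding.shape[1])] + ["cluster"]
    with open(out_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(header)
        for row, label in zip(embedding, labels):
            writer.writerow([*row.tolist(), int(label)])
    return header

# Made-up demo values written to a temporary folder
demo_path = os.path.join(tempfile.mkdtemp(), "demo_clusters.csv")
demo_header = export_clusters_to_csv([[0.1, 0.2], [0.3, 0.4]], [0, -1], demo_path)
print(demo_header)
```

For real data, the call would look like `export_clusters_to_csv(umap_embeddings["grooming"], pred_assign["grooming"], "grooming_clusters.csv")`. Keep in mind that the rows are still concatenated across all input sessions.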

Let me know if this helps!

YttriLab / A-SOID

embedding_output.sav #63
