beringresearch / ivis

Dimensionality reduction in very large datasets using Siamese Networks
https://beringresearch.github.io/ivis/
Apache License 2.0

Reproducibility #89

Closed: Professoroo closed this issue 3 years ago

Professoroo commented 3 years ago

Hello,

How can we get reproducible results by fixing the random seed? Is there an argument, for example initial_state, that we can pass?

Thanks, Regards

Szubie commented 3 years ago

Hi,

Please see this issue for an example of reproducible runs: https://github.com/beringresearch/ivis/issues/85

Here's a snippet from that issue that shows an example of this in action:

import os
os.environ["PYTHONHASHSEED"]="0"

import random

import numpy as np
import tensorflow as tf

# The below is necessary for starting Numpy generated random numbers
# in a well-defined initial state.
np.random.seed(123)

# The below is necessary for starting core Python generated random numbers
# in a well-defined state.
random.seed(123)

# The below set_seed() will make random number generation
# in the TensorFlow backend have a well-defined initial state.
# For further details, see:
# https://www.tensorflow.org/api_docs/python/tf/random/set_seed
tf.random.set_seed(1234)

import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import NearestNeighbors

from ivis import Ivis

iris = load_iris()
data = iris.data
target = iris.target

X = MinMaxScaler().fit_transform(data)

# Here we're creating a fixed NN matrix. For large out-of-memory datasets, you can achieve the same
# with Ivis' Annoy functionality (https://bering-ivis.readthedocs.io/en/latest/api.html#neighbour-retrieval),
# i.e. build the index separately and then pass it into the Ivis constructor.
nbrs = NearestNeighbors(n_neighbors=5).fit(X)
distances, indices = nbrs.kneighbors(X)

model = Ivis(embedding_dims=2, k=5, batch_size=X.shape[0],
             neighbour_matrix=indices,
             n_epochs_without_progress=5, verbose=0)

model.fit(X)

embeddings = model.transform(X)

plt.scatter(embeddings[:, 0], embeddings[:, 1], c=target)
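
The seeding steps at the top of the snippet can be gathered into one helper. The following is a minimal sketch, not part of ivis: `seed_everything` and `draw_sample` are hypothetical names, and using a single seed for all three generators is an assumption (the snippet above uses 123 and 1234). Note also that assigning `PYTHONHASHSEED` via `os.environ` inside a running interpreter does not change hash randomization for that process; it is normally exported in the shell before Python starts.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed every random number generator that is importable here."""
    random.seed(seed)  # core Python RNG

    try:
        import numpy as np
        np.random.seed(seed)  # NumPy's legacy global RNG
    except ImportError:
        pass  # NumPy not installed; nothing to seed

    try:
        import tensorflow as tf
        tf.random.set_seed(seed)  # TensorFlow global seed
    except ImportError:
        pass  # TensorFlow not installed; nothing to seed


def draw_sample(seed: int, n: int = 5) -> list:
    """Reseed, then draw n numbers from the core Python RNG."""
    seed_everything(seed)
    return [random.random() for _ in range(n)]


# Two draws after reseeding with the same value should match exactly.
first = draw_sample(123)
second = draw_sample(123)
print(first == second)
```

Calling `seed_everything(...)` once, before the model is constructed, mirrors the order used in the snippet above. Whether a full ivis run is then bit-for-bit identical also depends on the fixed neighbour matrix and on the hardware, so comparing the embeddings of two runs is still worthwhile.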

Professoroo commented 3 years ago

Thank you a lot! That solved the problem. I would also like to mention another error that the code above resolves.

Error type: "OSError: Unable to open: Invalid argument (22)"

It was resolved by passing the indices returned by NearestNeighbors as neighbour_matrix, as follows:

nbrs = NearestNeighbors(n_neighbors=5).fit(X)
distances, indices = nbrs.kneighbors(X)

model = Ivis(embedding_dims=2, k=5, batch_size=X.shape[0],
             neighbour_matrix=indices,
             n_epochs_without_progress=5, verbose=0)

Thank you so much!!!
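
For readers wondering what the `indices` array passed as `neighbour_matrix` contains: it has one row per sample, holding the row numbers of that sample's k nearest neighbours. Here is a minimal NumPy sketch of the same idea, assuming Euclidean distance. `knn_indices` and `X_demo` are hypothetical names, and this brute-force version is only an illustration, not how scikit-learn or ivis compute neighbours internally.

```python
import numpy as np


def knn_indices(X: np.ndarray, k: int) -> np.ndarray:
    """Return, for every row of X, the indices of its k nearest rows."""
    # Pairwise squared Euclidean distances, shape (n_samples, n_samples).
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # A stable sort breaks ties by index order, so the result is deterministic.
    order = np.argsort(sq_dist, axis=1, kind="stable")
    return order[:, :k]


rng = np.random.default_rng(0)      # fixed seed for a repeatable demo
X_demo = rng.random((20, 4))        # hypothetical stand-in for the scaled data
indices_demo = knn_indices(X_demo, k=5)
print(indices_demo.shape)
```

Because every point is at distance zero from itself, the first column is the sample's own row number, and the array is identical on every call for the same input. This determinism is what makes a precomputed neighbour matrix useful for reproducible runs. The pairwise matrix needs memory quadratic in the number of samples, so this approach only suits small datasets.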