Failed to read the HDF5 file

CHENGHUAN555 commented 1 year ago

Is there an existing issue for this?

[X] I have searched the existing issues

Bug description

When running the following code (test.py), an error is raised at line 30:

continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])

The error occurs when cebra.load_data() is used to read the .h5 file. I do not know how to solve it and would appreciate the authors' help.
---------------------------------------------------------------------------------------------------------------------------------------------
test.py
---------------------------------------------------------------------------------------------------------------------------------------------
# Create a .h5 file, containing a pd.DataFrame
import pandas as pd
import numpy as np
X_continuous = np.random.normal(0,1,(100,3))
X_discrete = np.random.randint(0,10,(100, ))
df = pd.DataFrame(np.array(X_continuous), columns=["continuous1", "continuous2", "continuous3"])
df["discrete"] = X_discrete
df.to_hdf("auxiliary_behavior_data.h5", key="auxiliary_variables")

import cebra
from numpy.random import uniform, randint
from sklearn.model_selection import train_test_split

# 1. Define a CEBRA model
cebra_model = cebra.CEBRA(
    model_architecture = "offset10-model",
    batch_size = 512,
    learning_rate = 1e-4,
    max_iterations = 10, # TODO(user): to change to at least 10'000
    max_adapt_iterations = 10, # TODO(user): to change to ~100-500
    time_offsets = 10,
    output_dimension = 8,
    verbose = False
)

# 2. Load example data
neural_data = cebra.load_data(file="neural_data.npz", key="neural")
new_neural_data = cebra.load_data(file="neural_data.npz", key="new_neural")
continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
discrete_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["discrete"]).flatten()

assert neural_data.shape == (100, 3)
assert new_neural_data.shape == (100, 4)
assert discrete_label.shape == (100, )
assert continuous_label.shape == (100, 3)

# 3. Split data and labels
(
    train_data,
    valid_data,
    train_discrete_label,
    valid_discrete_label,
    train_continuous_label,
    valid_continuous_label,
) = train_test_split(neural_data,
                    discrete_label,
                    continuous_label,
                    test_size=0.3)

# 4. Fit the model
# time contrastive learning
cebra_model.fit(train_data)
# discrete behavior contrastive learning
cebra_model.fit(train_data, train_discrete_label,)
# continuous behavior contrastive learning
cebra_model.fit(train_data, train_continuous_label)
# mixed behavior contrastive learning
cebra_model.fit(train_data, train_discrete_label, train_continuous_label)

# 5. Save the model
cebra_model.save('/tmp/foo.pt')

# 6. Load the model and compute an embedding
cebra_model = cebra.CEBRA.load('/tmp/foo.pt')
train_embedding = cebra_model.transform(train_data)
valid_embedding = cebra_model.transform(valid_data)
assert train_embedding.shape == (70, 8)
assert valid_embedding.shape == (30, 8)

# 7. Evaluate the model performances
goodness_of_fit = cebra.sklearn.metrics.infonce_loss(cebra_model,
                                                     valid_data,
                                                     valid_discrete_label,
                                                     valid_continuous_label,
                                                     num_batches=5)

# 8. Adapt the model to a new session
cebra_model.fit(new_neural_data, adapt = True)

# 9. Decode discrete labels behavior from the embedding
decoder = cebra.KNNDecoder()
decoder.fit(train_embedding, train_discrete_label)
prediction = decoder.predict(valid_embedding)
assert prediction.shape == (30,)

Operating System

windows 10

CEBRA version

cebra version 0.2.0

Device type

gpu

Steps To Reproduce

No response

Relevant log output

Traceback (most recent call last):
  File "E:\crop\injuryrun4\test.py", line 30, in <module>
    continuous_label = cebra.load_data(file="auxiliary_behavior_data.h5", key="auxiliary_variables", columns=["continuous1", "continuous2", "continuous3"])
  File "E:\anaconda\envs\injuryrun4test\lib\site-packages\cebra\data\load.py", line 661, in load
    data = loader.load(file, key=key, columns=columns)
  File "E:\anaconda\envs\injuryrun4test\lib\site-packages\cebra\data\load.py", line 211, in load
    raise ModuleNotFoundError()
ModuleNotFoundError

Anything else?

No response

Code of Conduct

[X] I agree to follow this project's Code of Conduct

MMathisLab commented 1 year ago

@gonlairo can you take a look?

MMathisLab commented 1 year ago

@CHENGHUAN555 did you install the optional datasets dependencies with pip install cebra[datasets]? Otherwise the module is indeed not loaded. I suggest checking out the demos here: https://cebra.ai/docs/demos.html. They use a particular data loader, but you will get the idea. See the installation instructions here: https://cebra.ai/docs/installation.html#id1
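Before reinstalling, it can help to check which optional modules are actually importable in the active environment. The snippet below is a minimal sketch using only the standard library. The module names in `candidates` are an assumption about what the HDF5 loader needs (the traceback does not name the missing module), and `missing_modules` is a hypothetical helper, not part of CEBRA.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: these top-level modules are the likely candidates for reading
# an .h5 file; the thread does not confirm which one was missing.
candidates = ["h5py", "pandas"]

missing = missing_modules(candidates)
if missing:
    print("Not importable:", ", ".join(missing))
else:
    print("All candidate modules are importable.")
```

If a module is reported as not importable, installing the optional dependencies as suggested above and re-running the check should show whether that resolved it.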

EricThomson commented 1 year ago

This solved it for me when I hit the same error while working through the code on the Usage page.

A couple of notes (feel free to ignore :smile:) -- I found the ModuleNotFoundError message a bit hard to interpret, as it didn't say which module was missing, so I wasn't sure how to proceed. Also, the installation page says that the datasets optional dependency is for working with the datasets at Figshare. Hence, when I got the error on the Usage page while working with synthetic data, I didn't consider the correct solution.

Anyway, minor wrinkles -- congrats on the cool package, I'm having fun with it so far!
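To illustrate the point about the bare ModuleNotFoundError, here is a minimal sketch of how an optional import could report which module is missing and how to install it. This is not CEBRA's actual implementation: `require_module` is a hypothetical helper, and the install hint only mirrors the pip install cebra[datasets] command mentioned in this thread.

```python
import importlib


def require_module(name, extra=None):
    """Import `name`, or raise a ModuleNotFoundError that names the module.

    `extra` is the (assumed) pip extra that provides the module; it is only
    used to build the hint in the error message.
    """
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError as err:
        hint = f" Try: pip install 'cebra[{extra}]'" if extra else ""
        raise ModuleNotFoundError(
            f"Optional dependency '{name}' is required to load this file "
            f"but is not installed.{hint}",
            name=name,
        ) from err


# Usage: the failure now says what is missing and how to fix it.
try:
    require_module("surely_not_a_real_module_xyz", extra="datasets")
except ModuleNotFoundError as err:
    print(err)
```

Passing `name=name` keeps the missing module available on the exception for callers that want to inspect it programmatically.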

stes commented 1 year ago

@EricThomson, thanks for flagging. I created a new issue to track these potential improvements here: https://github.com/AdaptiveMotorControlLab/CEBRA/issues/77
