RasmussenLab / MOVE

MOVE (Multi-Omics Variational autoEncoder) for integrating multi-omics data and identifying cross modal associations
https://move-dl.readthedocs.io/
MIT License

Error during __tune_reconstruction: score in calculate_accuracy (metrics.py) cannot be calculated #74

Closed t-soehngen closed 1 year ago

t-soehngen commented 1 year ago

I'm currently training MOVE on proteomics data in combination with a lot of categorical data, some of which has a few missing values. My input data is structured as instructed (one feature per file, missing values encoded as NA).

When MOVE calculates the score during reconstruction tuning, it fails on the missing values: num_features keeps the original length (including masked entries), while y_true and y_pred have length n - n_masked. Excluding all categorical features that contain missing values results in a successful run. What is the correct way to fix this error? The relevant code is in analysis\metrics.py.

The error thrown is below:

Error executing job with overrides: ['task.batch_size=10', 'task.model.num_hidden=[500]', 'task.training_loop.num_epochs=40', 'experiment=mpn__tune_reconstruction']
Traceback (most recent call last):
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\__main__.py", line 38, in main
    move.tasks.tune_model(config)
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\tasks\tune_model.py", line 249, in tune_model
    _tune_reconstruction(task_config)
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\tasks\tune_model.py", line 216, in _tune_reconstruction
    accuracy = calculate_accuracy(cat[mask], cat_recon)
  File "C:\Users\t159g\.conda\envs\moveEnv\lib\site-packages\move\analysis\metrics.py", line 36, in calculate_accuracy
    scores = np.ma.compressed(np.sum(y_true == y_pred, axis=1)) / num_features
ValueError: operands could not be broadcast together with shapes (118,) (131,)

t-soehngen commented 1 year ago

Replacing 2 values in the maize dataset (e.g. maize_fields.tsv) with NA reproduces the error.
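
The shape mismatch in the traceback can also be reproduced outside MOVE with a few lines of NumPy. This is only a minimal sketch on made-up toy data: the `scores` expression is copied from the traceback, while the way `y_true`, `y_pred` and `num_features` are built here (masked arrays and a per-sample count of unmasked entries) is an assumption about what metrics.py does.

```python
import numpy as np

# Hypothetical toy data: 5 samples, a single categorical feature.
# Samples 1 and 3 are missing, so their only entry is masked.
mask = np.array([[False], [True], [False], [True], [False]])
y_true = np.ma.masked_array([[0], [1], [2], [0], [1]], mask=mask)
y_pred = np.ma.masked_array([[0], [1], [1], [0], [1]], mask=mask)

# Assumed definition: number of unmasked entries per sample.
# This keeps one value per sample, including the fully masked ones.
num_features = np.ma.count(y_true, axis=1)

# Summing a fully masked row gives a masked result, and
# np.ma.compressed drops masked results, so this array is shorter.
matches = np.sum(y_true == y_pred, axis=1)
compressed = np.ma.compressed(matches)

print(compressed.shape, num_features.shape)  # (3,) (5,)

try:
    scores = compressed / num_features  # same expression as in the traceback
    error = None
except ValueError as exc:
    error = str(exc)

print(error)
```

With a single feature per file, any sample whose value is NA becomes a fully masked row, which would explain why the two arrays differ in length by the number of samples with missing values.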

ri-heme commented 1 year ago

Hi, thanks for reporting the bug and also adding some details on how to reproduce it! That was super helpful.

It seems we didn't expect single-feature datasets to have NaN values, but I have fixed it, so it shouldn't throw any errors now. If you come across another exception or have any other questions, please feel free to open another issue. 🙂
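
For anyone hitting the same broadcast error in their own code, one possible way to keep the arrays aligned is sketched below. It is not the patch that went into MOVE, whose details are not shown in this thread. The function name `masked_accuracy` is hypothetical, and returning NaN for samples with no observed features is a design assumption.

```python
import numpy as np


def masked_accuracy(y_true, y_pred):
    """Per-sample accuracy over unmasked features (hypothetical helper).

    Samples whose features are all masked get NaN instead of being dropped,
    so the result always has one entry per sample.
    """
    # Count of unmasked entries per sample (0 for fully masked samples).
    num_features = np.ma.count(y_true, axis=1)
    # Matches per sample; fully masked rows are filled with 0 matches.
    matches = np.ma.filled(np.sum(y_true == y_pred, axis=1), 0)

    scores = np.full(num_features.shape, np.nan)
    observed = num_features > 0
    scores[observed] = matches[observed] / num_features[observed]
    return scores


# Toy data (made up): 5 samples, 1 feature, samples 1 and 3 missing.
mask = np.array([[False], [True], [False], [True], [False]])
y_true = np.ma.masked_array([[0], [1], [2], [0], [1]], mask=mask)
y_pred = np.ma.masked_array([[0], [1], [1], [0], [1]], mask=mask)

scores = masked_accuracy(y_true, y_pred)
print(scores)
```

Downstream code would then need a NaN-aware summary such as `np.nanmean(scores)`. Whether samples with no observed features should be skipped, counted as zero, or reported as NaN depends on how the accuracy is used.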

ri-heme commented 1 year ago

The new version of MOVE (1.4.6) is up. ⬆️