DIVA-DIA / DeepDIVA

⛔️ DEPRECATED <Python Framework for Reproducible Deep Learning Experiments>
https://diva-dia.github.io/DeepDIVAweb
GNU Lesser General Public License v3.0

Issues with torch 1.1 #6

Open · ashlaban opened this issue 5 years ago

ashlaban commented 5 years ago

While setting up DeepDIVA on a new machine (without conda, though the problem should apply to conda installs as well, since the torch version is not pinned and the latest conda package is torch 1.1), we ran into a few issues:

1) In util/data/get_a_dataset.py, the dataset attribute access needs to change from

    train_data, train_labels = cifar_train.train_data, cifar_train.train_labels
    test_data, test_labels = cifar_test.test_data, cifar_test.test_labels

to

    train_data, train_labels = cifar_train.data, cifar_train.targets
    test_data, test_labels = cifar_test.data, cifar_test.targets

This applies to all datasets loaded through the same mechanism, since torchvision renamed these attributes to data and targets; see the combined sketch after this list.

2) torch no longer supports indexing 0-dim tensors, so loss.data[0] needs to become loss.data.item() (or simply loss.item()) in template/runner/image_classification/evaluate.py and template/runner/image_classification/train.py. The same change should apply to apply_model etc.

3) Something is fishy with the arrays returned from _load_mean_std_from_file: transforms.Normalize cannot convert the input mean and std to float tensors because the arrays come back with dtype=np.object. See set_up_dataloaders in template/setup.py.
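
As a quick reference, here is a minimal, self-contained sketch of all three workarounds outside the DeepDIVA code. This is my own toy reproduction, not the actual DeepDIVA call sites, and the mean/std values are purely illustrative:

    import numpy as np
    import torch
    import torch.nn.functional as F
    import torchvision
    from torchvision import transforms

    # 1) Newer torchvision renamed the CIFAR attributes:
    #    train_data/train_labels and test_data/test_labels -> data/targets.
    cifar_train = torchvision.datasets.CIFAR10(root='./data', train=True, download=True)
    if hasattr(cifar_train, 'data'):
        train_data, train_labels = cifar_train.data, cifar_train.targets              # new torchvision
    else:
        train_data, train_labels = cifar_train.train_data, cifar_train.train_labels   # old torchvision

    # 2) 0-dim tensors can no longer be indexed; use .item() to get a Python number.
    loss = F.cross_entropy(torch.randn(4, 10), torch.tensor([1, 2, 3, 0]))
    loss_value = loss.item()  # was: loss.data[0]

    # 3) transforms.Normalize chokes on dtype=object arrays; cast them to float32 first.
    mean = np.asarray(np.array([0.49, 0.48, 0.44], dtype=object), dtype=np.float32)
    std = np.asarray(np.array([0.25, 0.24, 0.26], dtype=object), dtype=np.float32)
    normalize = transforms.Normalize(mean=mean.tolist(), std=std.tolist())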

If I find the time, I will also provide a proper PR for these problems. For now, please see the attached patch below.

PATCH

diff --git a/template/runner/image_classification/evaluate.py b/template/runner/image_classification/evaluate.py
index 9205d73..e22f594 100644
--- a/template/runner/image_classification/evaluate.py
+++ b/template/runner/image_classification/evaluate.py
@@ -91,7 +91,7 @@ def _evaluate(data_loader, model, criterion, writer, epoch, logging_label, no_cu

         # Compute and record the loss
         loss = criterion(output, target_var)
-        losses.update(loss.data[0], input.size(0))
+        losses.update(loss.data.item(), input.size(0))

         # Compute and record the accuracy
         acc1 = accuracy(output.data, target, topk=(1,))[0]
@@ -103,10 +103,10 @@ def _evaluate(data_loader, model, criterion, writer, epoch, logging_label, no_cu

         # Add loss and accuracy to Tensorboard
         if multi_run is None:
-            writer.add_scalar(logging_label + '/mb_loss', loss.data[0], epoch * len(data_loader) + batch_idx)
+            writer.add_scalar(logging_label + '/mb_loss', loss.data.item(), epoch * len(data_loader) + batch_idx)
             writer.add_scalar(logging_label + '/mb_accuracy', acc1.cpu().numpy(), epoch * len(data_loader) + batch_idx)
         else:
-            writer.add_scalar(logging_label + '/mb_loss_{}'.format(multi_run), loss.data[0],
+            writer.add_scalar(logging_label + '/mb_loss_{}'.format(multi_run), loss.data.item(),
                               epoch * len(data_loader) + batch_idx)
             writer.add_scalar(logging_label + '/mb_accuracy_{}'.format(multi_run), acc1.cpu().numpy(),
                               epoch * len(data_loader) + batch_idx)
diff --git a/template/runner/image_classification/train.py b/template/runner/image_classification/train.py
index 9cf2324..497eee5 100644
--- a/template/runner/image_classification/train.py
+++ b/template/runner/image_classification/train.py
@@ -72,10 +72,10 @@ def train(train_loader, model, criterion, optimizer, writer, epoch, no_cuda=Fals

         # Add loss and accuracy to Tensorboard
         if multi_run is None:
-            writer.add_scalar('train/mb_loss', loss.data[0], epoch * len(train_loader) + batch_idx)
+            writer.add_scalar('train/mb_loss', loss.data.item(), epoch * len(train_loader) + batch_idx)
             writer.add_scalar('train/mb_accuracy', acc.cpu().numpy(), epoch * len(train_loader) + batch_idx)
         else:
-            writer.add_scalar('train/mb_loss_{}'.format(multi_run), loss.data[0],
+            writer.add_scalar('train/mb_loss_{}'.format(multi_run), loss.data.item(),
                               epoch * len(train_loader) + batch_idx)
             writer.add_scalar('train/mb_accuracy_{}'.format(multi_run), acc.cpu().numpy(),
                               epoch * len(train_loader) + batch_idx)
@@ -141,7 +141,7 @@ def train_one_mini_batch(model, criterion, optimizer, input_var, target_var, los

     # Compute and record the loss
     loss = criterion(output, target_var)
-    loss_meter.update(loss.data[0], len(input_var))
+    loss_meter.update(loss.data.item(), len(input_var))

     # Compute and record the accuracy
     acc = accuracy(output.data, target_var.data, topk=(1,))[0]
diff --git a/template/setup.py b/template/setup.py
index 2a87ccd..77c68a4 100644
--- a/template/setup.py
+++ b/template/setup.py
@@ -275,6 +275,8 @@ def set_up_dataloaders(model_expected_input_size, dataset_folder, batch_size, wo

         # Loads the analytics csv and extract mean and std
         mean, std = _load_mean_std_from_file(dataset_folder, inmem, workers)
+        mean = np.asarray([x for x in mean], dtype=np.float32)
+        std = np.asarray([x for x in std], dtype=np.float32)

         # Set up dataset transforms
         logging.debug('Setting up dataset transforms')
diff --git a/util/data/get_a_dataset.py b/util/data/get_a_dataset.py
index 5b78ef8..ea36794 100644
--- a/util/data/get_a_dataset.py
+++ b/util/data/get_a_dataset.py
@@ -150,9 +151,9 @@ def cifar10(args):
     cifar_test = torchvision.datasets.CIFAR10(root=args.output_folder, train=False, download=True)

     # Load the data into memory
-    train_data, train_labels = cifar_train.train_data, cifar_train.train_labels
+    train_data, train_labels = cifar_train.data, cifar_train.targets

-    test_data, test_labels = cifar_test.test_data, cifar_test.test_labels
+    test_data, test_labels = cifar_test.data, cifar_test.targets

     # Make output folders
     dataset_root = os.path.join(args.output_folder, 'CIFAR10')

ashlaban commented 5 years ago

I almost forgot: there was also an issue with the TensorBoard output, so we had to disable it.
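
For anyone hitting the same problem, one way to disable the logging without touching the call sites is a no-op stub for the writer. This is just an illustrative sketch assuming a tensorboardX-style SummaryWriter, not DeepDIVA's actual writer setup:

    # Fall back to a no-op writer so training keeps running without TensorBoard.
    class NoOpWriter:
        def add_scalar(self, *args, **kwargs):
            pass

        def close(self):
            pass

    try:
        from tensorboardX import SummaryWriter
        writer = SummaryWriter(log_dir='./output/logs')
    except Exception:
        writer = NoOpWriter()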

vinaychandranp commented 5 years ago

Thanks for bringing up the issues (and the patch)! We're currently working on moving the entire codebase to PyTorch 1.1 and Python 3.7, and it should be out soon.

ashlaban commented 5 years ago

Great, looking forward to it then! Feel free to close the issue when the new version is out ;)

MuntahaFHSTP commented 4 years ago

> Thanks for bringing up the issues (and the patch)! We're currently working on moving the entire codebase to PyTorch 1.1 and Python 3.7, and it should be out soon.

Is this available now?