diux-dev / cluster

train on AWS

'BatchTransformDataLoader' object has no attribute 'batch_sampler' #71

Closed: yaroslavvb closed this issue 6 years ago

yaroslavvb commented 6 years ago

One of my runs crashed with this error. Have you seen anything like this, @bearpelican?

Batch size changed: 256
  File "train_imagenet_nv.py", line 460, in <module>
    main()
  File "train_imagenet_nv.py", line 254, in main
    dm.set_epoch(epoch)
  File "train_imagenet_nv.py", line 86, in set_epoch
    if cur_phase: self.set_data(cur_phase)
  File "train_imagenet_nv.py", line 98, in set_data
    self.trn_dl.batch_sampler.batch_size = phase['bs']
'BatchTransformDataLoader' object has no attribute 'batch_sampler'
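The error fits a common wrapper pattern: a custom loader class holds a DataLoader internally but does not itself define `batch_sampler`, so code written against the plain DataLoader breaks. The following is a minimal plain-Python sketch (no PyTorch). All classes in it are hypothetical stand-ins, not the actual classes in train_imagenet_nv.py. It reproduces the error and shows one way a wrapper could avoid it by forwarding unknown attributes via `__getattr__`:

```python
class BatchSampler:
    """Hypothetical stand-in for a sampler that owns the batch size."""

    def __init__(self, batch_size):
        self.batch_size = batch_size


class InnerLoader:
    """Hypothetical stand-in for a DataLoader that exposes .batch_sampler."""

    def __init__(self, batch_size):
        self.batch_sampler = BatchSampler(batch_size)


class WrapperWithoutDelegation:
    """Holds a loader but exposes none of its attributes (reproduces the crash)."""

    def __init__(self, loader):
        self.loader = loader


class DelegatingWrapper:
    """Holds a loader and forwards any attribute it does not define to it."""

    def __init__(self, loader):
        self.loader = loader

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup fails. Read the
        # inner loader via __dict__ so a missing 'loader' cannot recurse forever.
        loader = self.__dict__.get("loader")
        if loader is None:
            raise AttributeError(name)
        return getattr(loader, name)


def set_batch_size(dl, bs):
    # Same assignment as the failing line in set_data.
    dl.batch_sampler.batch_size = bs


fixed = DelegatingWrapper(InnerLoader(128))
set_batch_size(fixed, 256)
print(fixed.batch_sampler.batch_size)  # prints 256

broken = WrapperWithoutDelegation(InnerLoader(128))
try:
    set_batch_size(broken, 256)
except AttributeError as err:
    print("AttributeError:", err)  # same kind of error as in the traceback
```

Delegation is only one option; the alternative is to stop touching `batch_sampler` from outside the wrapper at all.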

bearpelican commented 6 years ago

@yaroslavvb Did an old file somehow get run? I refactored the dataloader so that this line is no longer called: https://github.com/diux-dev/cluster/commit/afb8bfbb9293f7d4d1093a5cbb56b9589dc089ee#diff-75be79d6640e3bb96c7683235830b933L307
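One cheap way to answer "did an old file get run?" is to have the script log its own absolute path and a content hash at startup. This is a minimal sketch under that assumption; `script_fingerprint` is a hypothetical helper, not part of the repo:

```python
import hashlib
from pathlib import Path


def script_fingerprint(path):
    """Return (absolute path, first 12 hex chars of SHA-256) for the file at `path`.

    Logging this at startup makes it obvious when a stale copy of a script is
    the one actually being executed.
    """
    resolved = Path(path).expanduser().resolve()
    digest = hashlib.sha256(resolved.read_bytes()).hexdigest()[:12]
    return str(resolved), digest
```

For example, calling `print(*script_fingerprint(__file__))` at the top of `main()` prints which file ran. Its hash can be compared against the first 12 characters of `sha256sum train_imagenet_nv.py` in the local checkout.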

yaroslavvb commented 6 years ago

Aha, the training directory is baked into the pytorch.v7 AMI, and my task.upload(training) fails silently when the directory is already present, so we end up running the old version.
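The real behavior of `task.upload` isn't shown in this thread. The sketch below is a hypothetical local-filesystem `upload_dir` that illustrates the failure mode described here: an existing destination is replaced rather than silently kept, so a copy baked into an image cannot shadow the new code.

```python
import shutil
from pathlib import Path


def upload_dir(src, dst):
    """Hypothetical local stand-in for a directory upload.

    Instead of skipping when `dst` already exists (which would leave a stale
    copy in place), remove it and copy `src` fresh.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"source directory not found: {src}")
    if dst.exists():
        shutil.rmtree(dst)  # drop the stale copy rather than silently keeping it
    shutil.copytree(src, dst)
    return dst
```

Deleting first also removes files that no longer exist locally. A merge-style copy (for example `shutil.copytree(..., dirs_exist_ok=True)` on Python 3.8+) would leave them behind.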

Going to fix the upload.

bearpelican commented 6 years ago

I'll try to make sure all the training files are deleted from the next AMI version.

yaroslavvb commented 6 years ago

False alarm: the upload works as expected. The issue was that my refactored version ran the script from ~ instead of ~/training.
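A stale script in the home directory is easy to pick up by accident when the launch command relies on the current working directory. The sketch below builds the remote command so that it always `cd`s into the training directory first. The helper name `training_command` and the use of `python` (rather than a specific interpreter or env) are assumptions for illustration.

```python
import posixpath
import shlex


def training_command(script_name, training_dir="training", args=()):
    """Hypothetical helper: shell command that runs `script_name` inside ~/`training_dir`.

    Because the command cds into the directory first (and stops if that fails),
    `python script_name` cannot resolve to a same-named stale copy sitting in ~.
    """
    if posixpath.isabs(training_dir) or training_dir.startswith("~"):
        raise ValueError("pass training_dir relative to the home directory, e.g. 'training'")
    # Leave "~/" unquoted so the remote shell expands it; quote everything else.
    parts = ["cd", "~/" + shlex.quote(training_dir), "&&", "python", shlex.quote(script_name)]
    parts += [shlex.quote(str(a)) for a in args]
    return " ".join(parts)


print(training_command("train_imagenet_nv.py"))
# prints: cd ~/training && python train_imagenet_nv.py
```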