Neuroglycerin / neukrill-net-tools

Tools coded as part of the NDSB competition.
MIT License

pylearn2 dataset iterator method #47

Closed: gngdb closed this issue 9 years ago

gngdb commented 9 years ago

Should create an iterator with standard Python iterator functionality, as described in the documentation.
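
For orientation, the standard Python iterator protocol the comment refers to looks like this. This is a minimal sketch with illustrative names, not the actual pylearn2 iterator interface:

```python
class BatchIterator:
    """Minimal sketch of a dataset iterator following the standard
    Python iterator protocol (__iter__/__next__)."""

    def __init__(self, data, batch_size):
        self.data = data
        self.batch_size = batch_size
        self._pos = 0

    def __iter__(self):
        # an iterator returns itself from __iter__
        return self

    def __next__(self):
        # signal exhaustion with StopIteration, per the protocol
        if self._pos >= len(self.data):
            raise StopIteration
        batch = self.data[self._pos:self._pos + self.batch_size]
        self._pos += self.batch_size
        return batch

batches = list(BatchIterator(list(range(10)), batch_size=4))
# the last batch is smaller: this is exactly the "uneven" case
# that comes up later in this thread
```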

gngdb commented 9 years ago

Finally realised that the TransformerDataset class in Pylearn2 is what we want to look at to build a dataset that acts as an infinite stream of augmented images. Augmented streams like this are reported by the Galaxy challenge winner and by Xudong Cao to improve performance quite a bit. Don't know how Pylearn2 is going to deal with an infinite stream of images, but hopefully it can.
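
The idea behind the transformer-dataset approach, sketched in plain Python rather than the actual pylearn2 API: wrap the raw data stream and apply a stochastic transform to each batch as it is requested, so each pass over the data yields freshly augmented examples and nothing augmented is ever stored. Names and the noise "augmentation" here are illustrative only:

```python
import itertools
import random

def raw_batches(data, batch_size):
    """Endlessly cycle through the raw data in minibatches."""
    stream = itertools.cycle(data)
    while True:
        yield [next(stream) for _ in range(batch_size)]

def augment(batch, rng):
    """Stand-in for a stochastic image transform: here we just
    jitter each example. A real transformer would apply random
    rotations, flips, crops, etc."""
    return [x + rng.uniform(-0.1, 0.1) for x in batch]

def transformed_stream(data, batch_size, seed=42):
    """Infinite stream of augmented batches: every request draws
    fresh random parameters, so the stream never repeats exactly."""
    rng = random.Random(seed)
    for batch in raw_batches(data, batch_size):
        yield augment(batch, rng)

stream = transformed_stream([0.0, 1.0, 2.0, 3.0], batch_size=2)
first, second = next(stream), next(stream)
```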

gngdb commented 9 years ago

This depends on having a stochastic wrapper for image augmentation, i.e. issue 114.
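
The sort of stochastic wrapper this refers to (names hypothetical; the real one is what issue 114 tracks) just closes over a deterministic transform and draws fresh random parameters on every call:

```python
import random

def rotate(image, angle):
    """Stand-in deterministic transform; a real version would
    actually rotate the image array by `angle` degrees."""
    return {"image": image, "angle": angle}

def make_stochastic(transform, rng=None):
    """Wrap a deterministic transform so each call samples new
    random parameters, which is what lets the same function be
    reapplied to batches for endless augmentation."""
    rng = rng or random.Random()

    def wrapped(image):
        return transform(image, angle=rng.uniform(-30, 30))

    return wrapped

augmenter = make_stochastic(rotate, rng=random.Random(0))
out = augmenter("img0")
```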

gngdb commented 9 years ago

Turns out this is a lot more difficult than I was expecting. I had to go into the Pylearn2 code and hack some things to get it to run the first time, and am now adapting their code so we can run things without doing that. Unfortunately, so far there's been no improvement over previous methods. I suspect it's either a problem with normalisation or with the learning rate schedule, due to the changed epoch sizes (they are now 8 times smaller).
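
On the learning-rate suspicion: if the schedule decays per epoch and epochs are now 8 times smaller, the decay runs 8 times faster in terms of examples seen. A back-of-envelope sketch with illustrative numbers:

```python
def lr_at(epoch, base_lr=0.1, decay=0.95):
    """Simple exponential per-epoch learning rate decay."""
    return base_lr * decay ** epoch

# With epochs 8x smaller, one old-size epoch's worth of data now
# spans eight "epochs", so the learning rate has decayed much
# further after seeing the same number of examples:
old = lr_at(1)  # after one full-size epoch
new = lr_at(8)  # same data volume, eight small epochs
```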

gngdb commented 9 years ago

Specifically, the changes I had to make are as follows:

diff --git a/pylearn2/datasets/transformer_dataset.py b/pylearn2/datasets/transformer_dataset.py                  
index b5bae8a..b76cbd1 100644                                                                                     
--- a/pylearn2/datasets/transformer_dataset.py                                                                    
+++ b/pylearn2/datasets/transformer_dataset.py                                                                    
@@ -202,7 +202,7 @@ class TransformerIterator(Iterator):                                                          
         self.raw_iterator = raw_iterator                                                                         
         self.transformer_dataset = transformer_dataset                                                           
         self.stochastic = raw_iterator.stochastic                                                                
-        self.uneven = raw_iterator.uneven                                                                        
+        #self.uneven = raw_iterator.uneven                                                                       
         self.data_specs = data_specs                                                                             

     def __iter__(self):                                                                                          
@@ -258,3 +258,7 @@ class TransformerIterator(Iterator):                                                          
             WRITEME                                                                                              
         """                                                                                                      
         return self.raw_iterator.num_examples                                                                    
+                                                                                                                 
+    @property                                                                                                    
+    def uneven(self):                                                                                            
+        return self.raw_iterator.uneven                                                                          
diff --git a/pylearn2/utils/iteration.py b/pylearn2/utils/iteration.py                                            
index 3cb49c3..de0e0e6 100644                                                                                     
--- a/pylearn2/utils/iteration.py                                                                                 
+++ b/pylearn2/utils/iteration.py                                                                                 
@@ -172,7 +172,8 @@ class SubsetIterator(object):                                                                 
             `True` if returned batches may be of differing sizes,                                                
             `False` otherwise.                                                                                   
         """                                                                                                      
-        raise NotImplementedError()                                                                              
+        #raise NotImplementedError()                                                                             
+        return False   
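
In isolation, the first change above is the standard delegating-property pattern: instead of copying `uneven` off the raw iterator at construction time (when a wrapped iterator may not provide it), look it up lazily on each access. A minimal standalone sketch, stripped of the pylearn2 context:

```python
class RawIterator:
    # the wrapped iterator owns this flag
    uneven = True

class TransformerIteratorSketch:
    """Sketch of the patched TransformerIterator: `uneven` is
    delegated to the raw iterator rather than copied in __init__."""

    def __init__(self, raw_iterator):
        self.raw_iterator = raw_iterator

    @property
    def uneven(self):
        # looked up lazily on each access, so it always reflects
        # the raw iterator's current state
        return self.raw_iterator.uneven

it = TransformerIteratorSketch(RawIterator())
```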

gngdb commented 9 years ago

Those hacks are no longer necessary with the newest version of dense_dataset.py.

gngdb commented 9 years ago

We now have ways of applying augmentation to a dense dataset using a Transformer and Block, and also a dataset class in image_directory_dataset.py that acts enough like a Pylearn2 dataset to run. It can be passed an arbitrary stochastic processing function, which then augments batches as required. However, it makes epochs run twice as slow, so it could use some speed fixes.
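
The image_directory_dataset idea boils down to: hold a list of examples (filenames, in the real class), and build each batch lazily by pushing it through the supplied processing function only when the training loop asks for it. A minimal sketch with hypothetical names; the real class also has to satisfy pylearn2's dataset interface, and one plausible speed fix would be running the processing function in a background worker so augmentation overlaps with training:

```python
class ListDataset:
    """Sketch: batches are built lazily and pushed through an
    arbitrary per-batch processing function."""

    def __init__(self, examples, processing=None, batch_size=2):
        self.examples = examples
        # default to the identity if no processing function given
        self.processing = processing or (lambda batch: batch)
        self.batch_size = batch_size

    def iterator(self):
        for i in range(0, len(self.examples), self.batch_size):
            raw = self.examples[i:i + self.batch_size]
            # processing runs per batch on the critical path of
            # training, which is why epochs slow down
            yield self.processing(raw)

dataset = ListDataset([1, 2, 3, 4, 5],
                      processing=lambda b: [x * 10 for x in b])
batches = list(dataset.iterator())
```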