Basic network training with batches

Blitzman commented 6 years ago

[x] Implement basic GCN network with pooling and batch training.
[x] Test and verify that the training routine works even if accuracy is bad.
[x] Implement basic train/test splitting.
[x] Implement basic train/test accuracy evaluation.
[x] Implement visualization routines for evaluation (confusion matrix, learning plots).
[x] Document the results of this first iteration.
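
The evaluation items in the checklist above (accuracy and a confusion matrix) can be sketched in a few lines. This is a minimal, dependency-free illustration and not the code from the repository; the function names and the sample labels are hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_matrix(y_true, y_pred, num_classes=2):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in range(num_classes)]
            for t in range(num_classes)]


# Hypothetical labels and predictions, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
acc = accuracy(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)
```

Rows of the matrix are true classes and columns are predicted classes, so the diagonal holds the correctly classified samples.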

Blitzman commented 5 years ago

Implemented a basic GCN network with Graclus pooling and batch training in 5c840e3.

For the moment, training does not appear to happen. Everything runs without errors, but training accuracy is always stuck at the same value. Changing the learning rate had no effect. Increasing the batch size gives non-deterministic results: the output size after pooling looks randomized! It should always be [BATCH, 2], reflecting the number of graphs and their pre-softmax logits for both classes. However, the first dimension appears to take random values...

SOLVED: The errors and the training stall were caused by a bug somewhere in the NVIDIA/CUDA/PyTorch stack, presumably related to CUDA scatter. Updating the driver to NV410 and CUDA to 9.2, then reinstalling PyTorch and recompiling all the PyTorch Geometric packages against the new CUDA driver and version, fixed it.

Blitzman commented 5 years ago

The dataset combines both palmside and palmdown grasps and splits them randomly into two balanced, disjoint sets: 80% of the samples for training and 20% for testing.
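
A split like the one described above could look roughly as follows. This is a sketch, not the repository's implementation: it assumes "balanced" means each class keeps the same proportion in both sets, and the function name, seed, and sample labels are hypothetical.

```python
import random
from collections import defaultdict


def balanced_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx): disjoint index lists, split per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)  # random assignment within each class
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 50 samples for each of the two classes.
labels = [0] * 50 + [1] * 50
train_idx, test_idx = balanced_split(labels)
```

Splitting per class rather than over the whole dataset keeps the class ratio identical in both sets, which matters when accuracy is the only reported metric.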

Each sample (graph) is composed of 24 nodes, one per taxel, with 3 features each: the sensor readings for index, middle, and thumb. Taxel positions and edges (connections) are specified manually as follows:

self.m_taxels_x = [-3.0, -2.0, -4.0, -2.5, -1.5, -4.0, -2.5, -0.5, -2.0, -2.5, 3.0, 2.0, 4.0, 2.5, 1.5, 4.0, 2.5, 0.5, 2.0, 2.5, 0.0, -1.0, 1.0, 0.0]
self.m_taxels_y = [5.0, 4.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0, -5.0, -6.0, 5.0, 4.0, 1.0, 0.0, -1.0, -2.0, -3.0, -4.0, -5.0, -6.0, 3.0, 2.0, 2.0, 0.0]
self.m_edge_origins = [0, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 10, 11, 11, 12, 13, 13, 13, 14, 14, 14, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 19, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23]
self.m_edge_ends = [1, 0, 20, 3, 2, 4, 21, 23, 3, 6, 7, 23, 6, 5, 4, 7, 8, 4, 6, 8, 17, 6, 7, 9, 8, 11, 10, 20, 13, 12, 23, 14, 13, 16, 17, 16, 15, 14, 17, 18, 14, 16, 18, 7, 17, 16, 19, 18, 1, 11, 21, 22, 3, 20, 22, 23, 13, 20, 21, 23, 21, 22, 3, 13, 4, 14]
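
The two edge lists above are parallel: entry i of the origins pairs with entry i of the ends to form one directed edge. Presumably they end up as the two rows of the [2, E] `edge_index` tensor that PyTorch Geometric expects, although the thread does not show that step. The following sketch only pairs and validates the lists in plain Python; the helper names are hypothetical.

```python
# Edge lists copied from the thread above.
edge_origins = [0, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 6, 7, 7, 7, 7,
                8, 8, 8, 9, 10, 11, 11, 12, 13, 13, 13, 14, 14, 14, 15, 16, 16,
                16, 16, 17, 17, 17, 17, 18, 18, 18, 19, 20, 20, 20, 20, 21, 21,
                21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23]
edge_ends = [1, 0, 20, 3, 2, 4, 21, 23, 3, 6, 7, 23, 6, 5, 4, 7, 8, 4, 6, 8, 17,
             6, 7, 9, 8, 11, 10, 20, 13, 12, 23, 14, 13, 16, 17, 16, 15, 14,
             17, 18, 14, 16, 18, 7, 17, 16, 19, 18, 1, 11, 21, 22, 3, 20, 22,
             23, 13, 20, 21, 23, 21, 22, 3, 13, 4, 14]
NUM_NODES = 24  # one node per taxel


def build_edges(origins, ends, num_nodes):
    """Pair the two lists into directed (origin, end) edges and validate them."""
    if len(origins) != len(ends):
        raise ValueError("origin and end lists must have the same length")
    edges = list(zip(origins, ends))
    for u, v in edges:
        if not (0 <= u < num_nodes and 0 <= v < num_nodes):
            raise ValueError(f"edge ({u}, {v}) is outside 0..{num_nodes - 1}")
    return edges


def one_way_edges(edges):
    """Directed edges whose reverse (end, origin) is not in the list."""
    edge_set = set(edges)
    return sorted((u, v) for u, v in edge_set if (v, u) not in edge_set)


edges = build_edges(edge_origins, edge_ends, NUM_NODES)
unpaired = one_way_edges(edges)
```

With the lists as posted, there are 66 directed edges, and `unpaired` contains (22, 13) and (23, 14): every other edge has its reverse listed. The thread does not say whether those two one-way connections are intentional.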

This first iteration has the following network:

INFO:__main__:Net(
  (conv1): GCNConv(3, 32)
  (conv2): GCNConv(32, 64)
  (fc1): Linear(in_features=64, out_features=2, bias=True)
)

Global mean pooling with Graclus is applied after the second convolutional layer to reduce each graph to a single node (presumably) before feeding it into the fully connected layer. The result is a [B, 2] tensor, where B is the batch size.
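
To make the expected [B, 2] shape concrete, here is a plain-Python stand-in for a global mean pooling readout. It is only a sketch of the idea (PyTorch Geometric provides `global_mean_pool(x, batch)` for tensors); the list-based signature and the sample data below are assumptions made for illustration.

```python
def global_mean_pool(node_features, batch, num_graphs):
    """Average node feature vectors per graph.

    node_features: list of N feature vectors, each of length F.
    batch: list of N graph indices; batch[i] is the graph node i belongs to.
    Returns num_graphs vectors of length F, i.e. a [B, F] result.
    """
    if len(node_features) != len(batch):
        raise ValueError("need exactly one batch index per node")
    num_features = len(node_features[0])
    sums = [[0.0] * num_features for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feats, graph_id in zip(node_features, batch):
        counts[graph_id] += 1
        for j, value in enumerate(feats):
            sums[graph_id][j] += value
    # max(count, 1) avoids dividing by zero for a graph with no nodes.
    return [[s / max(count, 1) for s in row]
            for row, count in zip(sums, counts)]


# Hypothetical batch: two graphs with 3 and 2 nodes, 2 features per node.
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [10.0, 20.0], [30.0, 40.0]]
batch = [0, 0, 0, 1, 1]
pooled = global_mean_pool(x, batch, num_graphs=2)
```

The first dimension of the result depends only on the number of graphs in the batch, never on how many nodes survive earlier pooling, so asserting on that shape is a cheap sanity check during training.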

Adam was used as the optimizer with the following parameters:

INFO:__main__:Adam (
Parameter Group 0
    amsgrad: False
    betas: (0.9, 0.999)
    eps: 1e-08
    lr: 0.0001
    weight_decay: 0.0005
)
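
For readers unfamiliar with these parameters, the following scalar sketch shows how they enter a single Adam update. It follows the standard Adam formulation with weight decay added to the gradient as an L2 term, which is how `torch.optim.Adam` treats `weight_decay`; the function itself is a hypothetical illustration, not code from the repository.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=1e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=5e-4):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    beta1, beta2 = betas
    grad = grad + weight_decay * param          # L2 penalty folded into the gradient
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for early steps
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from a zero parameter with a unit gradient.
p, m, v = adam_step(param=0.0, grad=1.0, m=0.0, v=0.0, t=1)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.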

Training reached a top accuracy of around 80% on the test set. Training for 32 epochs with a batch size of 1 took nearly 782 seconds.

[Image: figure_1]

Blitzman commented 5 years ago

The log has been uploaded to logs/baseline.log. The implementation used is the one from commit https://github.com/3dperceptionlab/tactile-gcn/commit/fcdf490c873e55f7cdb00c56e8cacf5dacf65932.

3dperceptionlab / tactile-gcn

Basic network training with batches #4