diux-dev / cluster

train on AWS

[WIP] Refactor imagenet #69

Closed bearpelican closed 6 years ago

bearpelican commented 6 years ago

@yaroslavvb initial refactor

Created a logger class and a meter class. I wanted to keep as much of the metric calculation and logging as possible out of the training script. This way it's a bit easier to see the training tricks we are using.
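
For illustration, here is a minimal sketch of how a meter and a logger can pull metric bookkeeping out of a training loop. The names `AverageMeter` and `TrainLogger`, and their methods, are assumptions made for this example and are not necessarily the classes in this PR.

```python
import logging


class AverageMeter:
    """Tracks the latest value and the running average of one metric."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0.0    # most recent value
        self.sum = 0.0    # weighted sum of all values seen
        self.count = 0    # total number of samples seen
        self.avg = 0.0    # running average over all samples

    def update(self, val, n=1):
        # val is the mean over a batch of n samples
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


class TrainLogger:
    """Hypothetical logger that owns the meters so the training script does not."""

    def __init__(self, name="train"):
        self.log = logging.getLogger(name)
        self.meters = {}

    def update(self, metric, val, n=1):
        # Create the meter on first use, then fold in the new batch value
        self.meters.setdefault(metric, AverageMeter()).update(val, n)

    def summary(self):
        # One "name=average" entry per metric, in alphabetical order
        return " ".join(f"{k}={m.avg:.4f}" for k, m in sorted(self.meters.items()))

    def report(self, epoch):
        self.log.info("epoch %d %s", epoch, self.summary())


# Usage: the training loop only reports raw batch values
logger = TrainLogger()
logger.update("loss", 2.0, n=32)
logger.update("loss", 1.0, n=32)
print(logger.summary())  # prints "loss=1.5000"
```

With this split, the training script only calls `update` and `report`, and the averaging logic lives in one place.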

Still a bit more to clean up. Feel free to add any suggestions.

bearpelican commented 6 years ago

@yaroslavvb I put all the server files into the directory 'training'. https://github.com/diux-dev/cluster/pull/69/commits/e9162693936b5574d487aed71a753850436e2872#diff-2e1c1e714faa017fce6b7925c724322eR453

We are uploading the whole directory now and retaining its structure, so it should work nicely with sync.py. It was a bit easier to organize this way, but I can revert this commit if it's a problem!
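
To make "retaining the structure" concrete, here is a rough sketch of mapping every file under a local directory to the same relative path under a remote root. This is not the actual upload code or sync.py; the function `plan_upload` and the remote root used below are hypothetical.

```python
import os
import posixpath


def plan_upload(local_dir, remote_root):
    """Return (local_path, remote_path) pairs that mirror local_dir under remote_root."""
    pairs = []
    for dirpath, _dirnames, filenames in os.walk(local_dir):
        # Path of the current directory relative to the upload root
        rel_dir = os.path.relpath(dirpath, local_dir)
        for name in filenames:
            rel = name if rel_dir == "." else os.path.join(rel_dir, name)
            # Remote paths always use forward slashes, whatever the local OS
            remote = posixpath.join(remote_root, *rel.split(os.sep))
            pairs.append((os.path.join(dirpath, name), remote))
    # Sort by remote path so the plan is deterministic
    return sorted(pairs, key=lambda p: p[1])


if __name__ == "__main__":
    # Hypothetical remote root; adjust for the actual instance layout
    for local, remote in plan_upload("training", "/home/ubuntu/training"):
        print(local, "->", remote)
```

Each pair can then be handed to whatever transfer mechanism is in use, so subdirectories arrive with the same layout they have locally.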

yaroslavvb commented 6 years ago

Looks good to me! Yes, sync.py syncs the whole directory.

yaroslavvb commented 6 years ago

Does it work?

bearpelican commented 6 years ago

On a single p3. I'll try 4 in Virginia right now.