diux-dev/cluster
train on AWS
75 stars, 15 forks
issues
imagenet-sz.tar data not accessible
#75
atunick
opened
4 years ago
0
add documentation for file structure
#74
dreamflasher
closed
5 years ago
5
Dataset URL Access Denied
#73
ymjiang
closed
5 years ago
4
Syntax errors and undefined names
#72
cclauss
closed
6 years ago
1
'BatchTransformDataLoader' object has no attribute 'batch_sampler'
#71
yaroslavvb
closed
6 years ago
4
[TESTING] C10d
#70
bearpelican
opened
6 years ago
0
[WIP] Refactor imagenet
#69
bearpelican
closed
6 years ago
4
make most recent AMI available in regions
#68
yaroslavvb
opened
6 years ago
5
make imagenet18 snapshots available
#67
yaroslavvb
closed
6 years ago
9
modify /etc/hosts to resolve job public IPs locally
#66
yaroslavvb
opened
6 years ago
2
nexus -> ncluster rename
#65
yaroslavvb
closed
6 years ago
8
nexus: give spot requests names and treat them same as stopped instances
#64
yaroslavvb
opened
6 years ago
1
ncluster examples
#63
yaroslavvb
closed
6 years ago
1
Turn cluster into standalone ncluster package
#62
yaroslavvb
closed
6 years ago
1
nexus: add debug mode with logs showing file:line number
#61
yaroslavvb
closed
6 years ago
1
nexus: create tags at the moment of instance creation
#60
yaroslavvb
opened
6 years ago
0
nexus: use mosh for tmux connections?
#59
yaroslavvb
opened
6 years ago
1
nexus: make file upload more user-friendly
#58
yaroslavvb
opened
6 years ago
1
nexus: remove boto3 profile, use env vars instead
#57
yaroslavvb
closed
6 years ago
1
nexus: better support for single instance runs
#56
yaroslavvb
closed
6 years ago
2
ImageNet: release configurations
#55
yaroslavvb
opened
6 years ago
3
ImageNet: scale_lr is confusing
#54
yaroslavvb
closed
6 years ago
2
ImageNet: GPU_0... etc logs don't seem to be saved properly
#53
yaroslavvb
opened
6 years ago
1
util.get_name should crash when client/global env regions are different
#52
yaroslavvb
opened
6 years ago
0
ImageNet: batch-size doesn't get logged after refactor
#51
yaroslavvb
closed
6 years ago
2
reducing OOM on large batch sizes
#50
yaroslavvb
closed
6 years ago
7
Scheduler refactor
#49
bearpelican
closed
6 years ago
1
fields like logdir should be accessible on run/job or task
#48
yaroslavvb
opened
6 years ago
0
pytorch cifar example doesn't quit gracefully
#47
yaroslavvb
opened
6 years ago
0
ImageNet: step latency spikes
#46
yaroslavvb
opened
6 years ago
0
ImageNet: number of epochs and lr schedule shouldn't interact
#45
yaroslavvb
closed
6 years ago
1
add validation to userdata
#44
yaroslavvb
closed
6 years ago
1
ImageNet: spikes in training loss once per epoch
#43
yaroslavvb
closed
6 years ago
1
ImageNet: spikes in data_time for 16-machine version
#42
yaroslavvb
opened
6 years ago
0
ImageNet: changing batch-size affects point when lr switches
#41
yaroslavvb
closed
6 years ago
1
move ~/data to /data
#40
yaroslavvb
opened
6 years ago
0
rename availability_zone to zone
#39
yaroslavvb
closed
6 years ago
1
async_join should crash if any of worker threads crash
#38
yaroslavvb
closed
6 years ago
0
change step to scale with images rather than step size
#37
yaroslavvb
closed
6 years ago
1
track down why crash in "source activate" doesn't crash launcher
#36
yaroslavvb
closed
6 years ago
1
improve EFS copying speed
#35
yaroslavvb
closed
6 years ago
0
show "Terminating" instances under aws_tool.py
#34
yaroslavvb
closed
6 years ago
0
use incremental numbering for runs instead of datetime
#33
yaroslavvb
closed
6 years ago
0
add ability to run command remotely and get its output
#32
yaroslavvb
closed
6 years ago
0
increase tmux history window
#31
yaroslavvb
closed
6 years ago
0
ImageNet logging checklist
#30
yaroslavvb
opened
6 years ago
0
do logging onto EFS instead of EBS
#29
yaroslavvb
closed
6 years ago
1
number of machines + learning rate should be specified in the same place
#28
yaroslavvb
closed
6 years ago
0
Change Spot Interruption strategy to Stop
#27
yaroslavvb
opened
6 years ago
1
cost optimization: tool to move replicate EBS volumes to different AZ
#26
yaroslavvb
closed
6 years ago
1