Oh, one thing @matt-gardner: I'm not sure how file permissions work with the mounted drive, but I've found I have to use a bash script as the entrypoint that modifies the file permissions first. I'm guessing you didn't?
No, I didn't need to modify permissions; generally, I've tried to make everything group writable under `/net/efs/aristo/dlfa/`, which should fix the permissions issues. I think there's some flag you can set to make subdirectories inherit the group writable flag, but I haven't bothered with that yet...
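For reference, a minimal sketch of what that would look like with the setgid bit (and, assuming the EFS mount supports them, default ACLs); the path is just the shared directory above, and none of this is wired up in the repo:

```bash
# Make existing files and directories under the shared drive group writable.
chmod -R g+w /net/efs/aristo/dlfa/

# The inheritance "flag": the setgid bit on a directory makes new files and
# subdirectories keep that directory's group.
find /net/efs/aristo/dlfa/ -type d -exec chmod g+s {} +

# Optionally, a default ACL so new entries are created group writable too.
# This assumes the filesystem supports POSIX ACLs; if the EFS mount doesn't,
# the setgid bit plus a group-friendly umask (e.g. umask 002) is the fallback.
setfacl -R -d -m g::rwx /net/efs/aristo/dlfa/
```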
You need to have bintray credentials set up, have docker working, have cloned `allenai/aristo`, be on the VPN, and whatnot, but if you've done all of these things, you can use `./scripts/run_on_aws.sh [name] [param_file]` instead of `python scripts/run_model.py [param_file]` to run the training routine on a GPU machine in EC2, without locking up the machine you're working on. It's pretty amazing.
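For example, with a hypothetical experiment name and parameter file (use whatever you'd normally pass to `run_model.py`):

```bash
# Kick off training on a GPU machine in EC2 from the repo root.
./scripts/run_on_aws.sh my_experiment param_files/my_model.json

# The equivalent local run, which ties up the machine you're working on.
python scripts/run_model.py param_files/my_model.json
```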
Note that all paths in the parameter file, both for loading data and for saving models, need to be under `/net/efs/aristo/dlfa/` for this to work correctly. We're transitioning from the original S2 EFS drive (mounted as `/efs`) to the EFS drive provided by techops (mounted as `/net/efs`). This is what `bin/aristo` supports. I copied all of our data over from the S2 drive to the new place, so things should just work if you change the paths from `/efs/data/dlfa/` to `/net/efs/aristo/dlfa/`.
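If you have existing parameter files still pointing at the old mount, something like this updates them in bulk (the `param_files/*.json` glob is just an example of wherever yours live):

```bash
# Rewrite old S2-drive paths to the new techops EFS mount in place (GNU sed;
# on macOS use `sed -i ''` instead).
sed -i 's|/efs/data/dlfa/|/net/efs/aristo/dlfa/|g' param_files/*.json
```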
Note that you'll probably also need to change the path to aristo in the `run_on_aws.sh` script to where you have the aristo repo cloned.

FYI @ColinArenz @pdasigi @nelson-liu @DeNeutoy @bsharpataz