Open Rob174 opened 3 years ago
#!/usr/bin/env bash
source /usr/local/insa/anaconda/bin/activate IA-GPU
python3 "$@"
Note: this script automatically activates the conda environment in which tensorflow-gpu is installed, then runs python3 with the arguments it receives.
Installing a library locally lets you install Python packages without sudo: with the --user option, pip installs the packages in your home directory (in a hidden folder).
source /usr/local/insa/anaconda/bin/activate IA-GPU
# For several ais
pip3 install opencv-python pillow argparse cairosvg graphviz --user
# For ENET
ENET_FOLDER=................/enet/ # no space after "="; keep this folder name so the imports stay the same
git clone https://github.com/Rob174/enet-keras.git "$ENET_FOLDER"
cd "$ENET_FOLDER"
git checkout adaptation_tensorflow
python -m pip install -e "$ENET_FOLDER" # Install a Python repo whose sources are present locally as an editable package
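To see which hidden folder the --user option targets, Python can report its own per-user directories. This is a minimal sketch, assuming python3 is on the PATH of the activated environment; the variable names user_base and user_site are only illustrative, and the exact paths depend on the Python version and platform.

```shell
# Per-user base directory (typically ~/.local on Linux)
user_base="$(python3 -c 'import site; print(site.getuserbase())')"
# Per-user site-packages directory, where `pip3 install --user` places packages
user_site="$(python3 -c 'import site; print(site.getusersitepackages())')"

echo "user base:          $user_base"
echo "user site-packages: $user_site"
```

If an import fails after installation, comparing this directory with the output of `pip3 show <package>` is a quick way to check that the package landed where the interpreter looks.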
Very important: it is possible to open TensorBoard on the GEI server with Firefox through the -X option of the ssh command, but it is very slow.
Only TensorBoard is required on the personal machine, not the entire TensorFlow installation.
Command to install TensorBoard on the local machine:
pip3 install tensorboard
Note: the Python interpreter and pip3 (or pip, in which case adapt the previous command) must be recognized by the local console (cmd, bash, ...).
After that, once some trainings have run, you can view the curves by launching the following command and following the instructions it prints:
tensorboard --logdir directory/of/folders/with/tfevents
But don't forget to synchronize your local data output folder first.
If the data has changed while TensorBoard was already running, you need at least to refresh the localhost web page, or to kill TensorBoard and relaunch it with the previous command.
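Before launching TensorBoard, it can help to check that the local folder actually contains event files. This is a minimal sketch, assuming the event files follow TensorBoard's usual events.out.tfevents.* naming; the LOGDIR variable and the list_event_files helper are hypothetical names introduced for illustration.

```shell
# Folder passed to `tensorboard --logdir`; override with the LOGDIR environment variable
logdir="${LOGDIR:-directory/of/folders/with/tfevents}"

# Lists every TensorBoard event file found under the given folder
list_event_files() {
    find "$1" -type f -name '*tfevents*' 2>/dev/null
}

# Count the files (tr strips the padding some wc implementations add)
count="$(list_event_files "$logdir" | wc -l | tr -d ' ')"
if [ "$count" -eq 0 ]; then
    echo "No tfevents files under $logdir: synchronize the output folder first"
else
    echo "Found $count tfevents file(s) under $logdir"
fi
```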
Synchronization is sometimes automatic and sometimes not. Uploading code changes generally works (on file save; check the File transfer window), but since no software detects changes on the server, the output data has to be downloaded manually.
To force the synchronization, use the following menu:
This method lets you disconnect from the terminal and the VPN without killing the training process:
nohup bash -c "python3 monscript.py -arg1=... .................... &> mylogfile" & disown
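To confirm that a detached job keeps running and writing its log, the same pattern can be tried with a short stand-in command. This is a minimal sketch, assuming bash; train_cmd, log_file and pid_file are hypothetical names, and the echo/sleep command only stands in for the real python3 call.

```shell
# Stand-in for the training command (hypothetical): replace with the real python3 call
train_cmd='echo "epoch 1"; sleep 1; echo "epoch 2"'
log_file="$(mktemp)"          # plays the role of mylogfile
pid_file="${log_file}.pid"

# Launch in the background, sending stdout and stderr to the log
nohup bash -c "$train_cmd" > "$log_file" 2>&1 &
echo "$!" > "$pid_file"       # remember the process id for later checks
disown                        # bash builtin: remove the job from the shell's job table

# Later, possibly from a new SSH session: is it still running, and what did it print?
if kill -0 "$(cat "$pid_file")" 2>/dev/null; then
    echo "training still running"
else
    echo "training finished (or was killed)"
fi
tail -n 5 "$log_file"
```

Saving the process id is a design choice rather than a requirement: it avoids searching the process list by name when several trainings run at once.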
Advice:
Make a file to_run, for example listing all the commands launched or to be launched together with their status, to keep track of the trainings.
Wait a couple of minutes, then view the mylogfile file (with vim for instance, where typing 50% jumps to the middle of the file).
Search for a line of this type: dot -Tsvg /home/...../..../data/enet/2021-05-14_13h32min54s_/2021-05-14_13h32min54s_model.dot -o /home/...../..../data/enet/2021-05-14_13h32min54s_/2021-05-14_13h32min54s_model.svg
Extract the id of the training, here 2021-05-14_13h32min54s, and write it down in the to_run file.
This lets you quickly find the desired training by looking in your to_run file and copying its id into the TensorBoard search bar.
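The search for the training id can also be scripted. This is a minimal sketch, assuming the id always has the form shown in the example line (date, then hours, minutes and seconds); the extract_training_id helper, the LOG_FILE and TRACK_FILE variables, and the layout of the line appended to to_run are assumptions made for illustration.

```shell
# Prints the first training id (format YYYY-MM-DD_HHhMMminSSs) found in a log file
extract_training_id() {
    grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}h[0-9]{2}min[0-9]{2}s' "$1" | head -n 1
}

log_file="${LOG_FILE:-mylogfile}"    # log written by the nohup command
track_file="${TRACK_FILE:-to_run}"   # tracking file described in the advice

# Only append an entry when the log exists and already contains an id
if [ -f "$log_file" ]; then
    training_id="$(extract_training_id "$log_file")"
    if [ -n "$training_id" ]; then
        echo "$training_id  running  (log: $log_file)" >> "$track_file"
    fi
fi
```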
Summary