Rob174 / PIR


Environment setup #26

Open Rob174 opened 3 years ago

Rob174 commented 3 years ago

Summary

  1. General organization
  2. Setup remote python environment
  3. Setup remote ssh
  4. Install local python libraries required to run the code on the GEI server
  5. Synchronize a local folder with the remote folder
  6. Launching long duration training script
Rob174 commented 3 years ago

Recommended environment setup

Required

General organization of the environment

[Image: Asset 60@4x]

Rob174 commented 3 years ago

Setup remote python environment

Conda activate environment script

#!/usr/bin/env bash
source /usr/local/insa/anaconda/bin/activate IA-GPU
python3 "$@"

Note: this script automatically activates the conda environment in which tensorflow-gpu is installed, then runs python3 with the arguments it was given.
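The quoted "$@" on the last line of the script is what lets the wrapper behave like python3 itself: every argument is forwarded unchanged, including arguments that contain spaces. A minimal sketch of that behaviour, using a hypothetical stand-in function show_args that prints its arguments instead of calling python3 (so it runs without the conda environment):

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the wrapper: prints each argument it receives
# in brackets, one per line, instead of passing them to python3.
show_args() {
    printf '[%s]\n' "$@"
}

# "my run" stays a single argument because "$@" is quoted.
show_args train.py --name "my run"
```

With an unquoted $@, the last argument would be split into two words.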

Rob174 commented 3 years ago

Setup remote ssh

[Image: Asset 59@4x]

Rob174 commented 3 years ago

Install local python libraries required to run the code on the GEI server

Installing a library locally makes it possible to install Python packages without sudo: the packages are installed in your home directory (in a hidden folder).


source /usr/local/insa/anaconda/bin/activate IA-GPU

# For several AIs

pip3 install opencv-python pillow argparse cairosvg graphviz --user

# For ENET

ENET_FOLDER=................/enet/ # to keep the same imports
git clone https://github.com/Rob174/enet-keras.git "$ENET_FOLDER"
cd "$ENET_FOLDER"
git checkout adaptation_tensorflow
python -m pip install -e "$ENET_FOLDER" # Install a Python repo whose sources are present locally as a package
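To see where the --user option actually puts the packages, Python's standard site module can print the per-user directories. A minimal sketch (on Linux the base is usually ~/.local, which is the hidden folder mentioned above; the function names are only illustrative):

```shell
#!/usr/bin/env bash
# Print the per-user base directory used by `pip install --user`.
user_base_dir() {
    python3 -m site --user-base
}

# Print the per-user site-packages directory, where the packages themselves go.
user_site_dir() {
    python3 -m site --user-site
}

echo "user base: $(user_base_dir)"
echo "user site: $(user_site_dir)"
```

Run it after activating the IA-GPU environment to get the paths for the interpreter that is actually used.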
Rob174 commented 3 years ago

Install TensorBoard on your local machine

Very important: it is possible to open TensorBoard on the GEI server with Firefox and the -X option of the ssh command, but it is very slow.

Only TensorBoard is required on the personal machine, not the entire TensorFlow installation.

Command to install TensorBoard on the local machine:

pip3 install tensorboard

Note: the Python interpreter and pip3 (or pip, in which case adapt the previous command) must be available in the local console (cmd, bash, ...).

After that, once some trainings have run, you can view the curves by launching the following command and following the instructions it prints:

tensorboard --logdir directory/of/folders/with/tfevents

But don't forget to synchronize your local data output folder first.

If the data has changed while TensorBoard was already running, you need at least to refresh the localhost web page, or to kill TensorBoard and relaunch it with the previous command.
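A quick way to check that the synchronized folder really contains something for TensorBoard to display is to look for event files, whose names contain tfevents. A minimal sketch, where count_event_files is a hypothetical helper and the directory is the same placeholder as in the tensorboard command:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count TensorBoard event files under a directory.
count_event_files() {
    find "$1" -type f -name '*tfevents*' | wc -l | tr -d ' '
}

logdir="directory/of/folders/with/tfevents"   # placeholder path from the text
if [ -d "$logdir" ]; then
    echo "$(count_event_files "$logdir") event file(s) found in $logdir"
else
    echo "$logdir does not exist yet: synchronize the output folder first"
fi
```

If the count does not grow after a new training, the output folder has probably not been downloaded yet.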

Rob174 commented 3 years ago

Synchronize a local folder with the remote folder

Sometimes the synchronization is automatic, sometimes it is not. Uploading code changes generally works when the file is saved (check the File transfer window to confirm). However, as no software detects changes on the server, output data has to be downloaded manually.

To force the synchronization, use the following menu:

Rob174 commented 3 years ago

Launching long duration training script

This method lets you disconnect from the terminal and the VPN without killing the training process.

nohup bash -c "python3 monscript.py -arg1=... .................... &> mylogfile" & disown
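The same idea can be wrapped in a small helper so that the log file name and the command are passed as arguments. This is only a sketch: launch_detached is a hypothetical name, and since disown is a bash builtin, the call is skipped silently in shells that lack it (nohup alone already protects the process from the hangup signal).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command in the background, immune to hangup,
# with stdout and stderr written to a log file. Prints the PID of the job.
launch_detached() {
    logfile="$1"
    shift
    nohup "$@" > "$logfile" 2>&1 &
    echo "$!"
    # Remove the job from the shell's job table (bash only).
    disown 2>/dev/null || true
}

# Example with a harmless stand-in command instead of a real training script.
pid="$(launch_detached demo.log sh -c 'echo training started')"
echo "launched with PID $pid, output in demo.log"
```

The printed PID can be noted next to the command, which makes it easy to check later whether the training is still running.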

Advice:

  1. Keep a file, for example to_run, listing all commands launched or to be launched along with their status, to keep track of the trainings.
  2. Wait a couple of minutes and open the mylogfile file (with vim for instance; you can type 50% to jump to the middle of the file).
  3. Search for a line of this type: dot -Tsvg /home/...../..../data/enet/2021-05-14_13h32min54s_/2021-05-14_13h32min54s_model.dot -o /home/...../..../data/enet/2021-05-14_13h32min54s_/2021-05-14_13h32min54s_model.svg
  4. Extract the id of the training, here 2021-05-14_13h32min54s, and write it down in the to_run file.

You can then quickly find the training you want by looking in your to_run file and copying its id into the TensorBoard search bar.
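Extracting the id can also be done with grep rather than by eye. A minimal sketch, assuming the ids always follow the pattern of the example (YYYY-MM-DD_HHhMMminSSs) and that the log is the mylogfile from the launch command; extract_training_id is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first training id found in a log file.
# Assumes ids look like 2021-05-14_13h32min54s.
extract_training_id() {
    grep -oE '[0-9]{4}-[0-9]{2}-[0-9]{2}_[0-9]{2}h[0-9]{2}min[0-9]{2}s' "$1" | head -n 1
}

# Append the id to the to_run file once the log exists.
if [ -f mylogfile ]; then
    echo "id: $(extract_training_id mylogfile)" >> to_run
fi
```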