Closed JonathanFL closed 1 year ago
Hi there,
I am not familiar with Windows, but the error "Cannot create a file when that file already exists: './exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5/fold1/models'"
means you need to either remove the old experiment directory or change the experiment path specified in https://github.com/YuanGongND/ast/blob/9e3bd9942210680b833b08c39d09f2284ddc4d1d/egs/esc50/run_esc.sh#L48
The reason I don't check whether the folder exists before creating it is that I want an error to be raised, so that a new run cannot overwrite an old one.
-Yuan
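For anyone reading along, the fail-fast behaviour described above can be sketched in Python as follows. This is a minimal illustration, not the repository's actual code: the helper name `create_models_dir` is hypothetical, and it assumes the models folder lives directly under the experiment directory.

```python
import os
import tempfile


def create_models_dir(exp_dir):
    """Create <exp_dir>/models (hypothetical helper, not from the repo).

    os.makedirs also creates any missing parent directories, and with
    exist_ok=False it raises FileExistsError if the folder is already
    there, so an earlier run is never silently overwritten.
    """
    models_dir = os.path.join(exp_dir, "models")
    os.makedirs(models_dir, exist_ok=False)
    return models_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        exp_dir = os.path.join(root, "exp", "fold1")
        print("created:", create_models_dir(exp_dir))
        try:
            create_models_dir(exp_dir)  # second run on the same path
        except FileExistsError as err:
            print("refused to overwrite:", err)
```

Because `os.makedirs` builds the intermediate folders itself, the same call behaves identically on Windows and Linux without relying on the shell script to create the parent directory first.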
Hi Yuan,
I don't know why, but first, I had to create the 'exp' folder from the shell script and, afterwards, the 'models' folder. I guess the behaviour of makedirs is different between Windows and your OS.
models_dir = os.path.join(args.exp_dir,"models")
print("Creating models directory: %s" % models_dir)
if not os.path.exists(models_dir):
    os.mkdir(models_dir)
This still works, because of this part in the shell script:
if [ -d $base_exp_dir ]; then
  echo 'exp exist'
  exit
fi
But I guess it would not work if not run from the shell script. Anyway, just letting you or anybody else know. Also, I had to modify the shell script for Windows paths, e.g., the python venv path.
On another note, I was unable to train, even with num_worker=0 and batch_size=2, because of the same error as https://github.com/YuanGongND/ast/issues/54#issuecomment-1073397042.
Could an alternative be to set it up in Google Colab for training with own data? The modified shell script looks like this now:
#!/bin/bash
#SBATCH -p gpu
#SBATCH -x sls-titan-[0-2]
#SBATCH --gres=gpu:4
#SBATCH -c 4
#SBATCH -n 1
#SBATCH --mem=48000
#SBATCH --job-name="ast-esc50"
#SBATCH --output=./log_%j.txt
set -x
# comment this line if not running on sls cluster
#. /data/sls/scratch/share-201907/slstoolchainrc
source ../../venvast/Scripts/activate
export TORCH_HOME=../../pretrained_models
model=ast
dataset=esc50
imagenetpretrain=True
audiosetpretrain=True
bal=none
if [ $audiosetpretrain == True ]
then
  lr=1e-5
else
  lr=1e-4
fi
freqm=24
timem=96
mixup=0
epoch=25
batch_size=2
fstride=16
tstride=16
base_exp_dir=.\\exp\\test-${dataset}-f$fstride-t$tstride-imp$imagenetpretrain-asp$audiosetpretrain-b$batch_size-lr${lr}
python .\\prep_esc50.py
if [ -d $base_exp_dir ]; then
  echo 'exp exist'
  exit
fi
mkdir -p $base_exp_dir
for((fold=1;fold<=5;fold++));
do
  echo 'now process fold'${fold}
  exp_dir=${base_exp_dir}\\fold${fold}
  echo 'creating exp dir: '${exp_dir}
  mkdir -p $exp_dir
  tr_data=.\\data/datafiles\\esc_train_data_${fold}.json
  te_data=.\\data/datafiles\\esc_eval_data_${fold}.json
  CUDA_CACHE_DISABLE=1 python -W ignore ..\\..\\src\\run.py --model ${model} --dataset ${dataset} \
  --data-train ${tr_data} --data-val ${te_data} --exp-dir $exp_dir \
  --label-csv ./data/esc_class_labels_indices.csv --n_class 50 \
  --lr $lr --n-epochs ${epoch} --batch-size $batch_size --save_model False \
  --freqm $freqm --timem $timem --mixup ${mixup} --bal ${bal} \
  --tstride $tstride --fstride $fstride --imagenet_pretrain $imagenetpretrain --audioset_pretrain $audiosetpretrain
done
python .\\get_esc_result.py --exp_path ${base_exp_dir}
This "works", but again, I might have too few GPU resources on my laptop(?)
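The hand-edited backslash paths in the script above could be avoided by building the experiment paths in Python with pathlib, which picks the right separator for the current OS. The sketch below is only an illustration: the function `build_exp_dir` and its parameter names are hypothetical, and the folder naming simply mirrors the `base_exp_dir` pattern from the script.

```python
from pathlib import Path


def build_exp_dir(dataset, fstride, tstride, imagenet_pretrain,
                  audioset_pretrain, batch_size, lr, root="exp"):
    """Return the experiment directory as a Path (hypothetical helper).

    The name follows the base_exp_dir pattern used in the shell script;
    Path joins the parts with the separator of the running OS.
    """
    name = (
        f"test-{dataset}-f{fstride}-t{tstride}"
        f"-imp{imagenet_pretrain}-asp{audioset_pretrain}"
        f"-b{batch_size}-lr{lr}"
    )
    return Path(root) / name


if __name__ == "__main__":
    base = build_exp_dir("esc50", 16, 16, True, True, 2, "1e-5")
    for fold in range(1, 6):
        # One sub-folder per cross-validation fold, as in the shell loop.
        print(base / f"fold{fold}")
```

The lr value is passed as a string here so that the folder name keeps the same `1e-5` spelling the shell script produces.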
> On another note, I was unable to train, even with num_worker=0 and batch_size=2, because of the same error as https://github.com/YuanGongND/ast/issues/54#issuecomment-1073397042.
The model itself takes some GPU memory, so it is possible that the code cannot run even with batch_size=2. You can use the nvidia-smi command to check the GPU memory usage.
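If it helps to log the memory usage from a script rather than watching the terminal, nvidia-smi can be queried programmatically. The following is a rough sketch under a few assumptions: the helper names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration, nvidia-smi is expected on the PATH, and the driver is assumed to report numeric values (some GPUs report "[N/A]", which this parser does not handle).

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (in MiB) into a list of (used, total) ints."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def query_gpu_memory():
    """Return per-GPU (used, total) MiB, or None if nvidia-smi is not found."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    print(query_gpu_memory())
```

Calling `query_gpu_memory()` once before the model is loaded and once after gives a rough idea of how much memory the model alone occupies.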
> Could an alternative be to set it up in Google Colab for training with own data?
It is totally possible; in another, unrelated project, I have a Colab training script. Nonetheless, I don't plan to build one for this project, as Colab can only train on small datasets.
Alright. Thank you very much.
Hi Yuan,
I am trying to make the ESC-50 recipe work on my laptop with one GPU. The output when executing run_esc50.sh:
Also, the run method uses os.uname()[1], which does not exist on my Windows PC, so I have changed it to import platform and platform.uname(). Can you help with this?
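On the os.uname() point: os.uname is only available on Unix-like systems, while the platform module works on Windows as well. A small portable sketch is shown below, assuming the value is only needed as a host name (which is what index 1 of os.uname() returns); the function name `get_hostname` is hypothetical.

```python
import platform
import socket


def get_hostname():
    """Return the machine's network name on Windows, Linux, or macOS.

    platform.node() is the cross-platform counterpart of os.uname()[1];
    it can return an empty string if the name cannot be determined, in
    which case socket.gethostname() is used as a fallback.
    """
    return platform.node() or socket.gethostname()


if __name__ == "__main__":
    print("running on:", get_hostname())
```

Note that platform.uname() on its own returns the whole tuple of system information; indexing it with [1] or using its `.node` attribute gives the host name only.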