YuanGongND / ast

Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BSD 3-Clause "New" or "Revised" License

Cannot create a file when that file already exists: './exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5/fold1/models' #89

Closed: JonathanFL closed this issue 1 year ago

JonathanFL commented 1 year ago

Hi Yuan,

I am trying to get the ESC-50 recipe working on my laptop, which has one GPU. Here is the output of run_esc.sh:

$ ./run_esc.sh 
+ source ../../venvast/Scripts/activate
++ deactivate nondestructive
++ '[' -n '' ']'
++ '[' -n '' ']'
++ '[' -n /bin/bash -o -n '' ']'
++ hash -r
++ '[' -n '' ']'
++ unset VIRTUAL_ENV
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV='C:\Users\jonat\source\repos\ast-master\venvast'
++ export VIRTUAL_ENV
++ _OLD_VIRTUAL_PATH='C:\Users\jonat\source\repos\ast-master\venvast/Scripts:/c/Users/jonat/bin:/mingw64/bin:/usr/local/bin:/usr/bin:/bin:/mingw64/bin:/usr/bin:/c/Users/jonat/bin:/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7/bin:/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7/libnvvp:/c/Program Files/Microsoft/jdk-11.0.12.7-hotspot/bin:/c/Windows/system32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0:/c/Windows/System32/OpenSSH:/c/Program Files/dotnet:/c/Program Files/Microsoft SQL Server/150/Tools/Binn:/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn:/cmd:/c/Program Files/nodejs:/c/Program Files/NVIDIA Corporation/Nsight Compute 2022.2.1:/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/c/WINDOWS/system32:/c/WINDOWS:/c/WINDOWS/System32/Wbem:/c/WINDOWS/System32/WindowsPowerShell/v1.0:/c/WINDOWS/System32/OpenSSH:/c/Users/jonat/AppData/Local/Programs/Python/Python39/Scripts:/c/Users/jonat/AppData/Local/Programs/Python/Python39:/c/Users/jonat/AppData/Local/Microsoft/WindowsApps:/c/Users/jonat/AppData/Local/Programs/Microsoft VS Code/bin:/c/Users/jonat/AppData/Roaming/npm:/c/Users/jonat/AppData/Roaming/ffmpeg/bin:/c/Program Files (x86)/sox-14-4-2:/c/Users/jonat/.dotnet/tools:/c/Program Files/mosquitto:/usr/bin/vendor_perl:/usr/bin/core_perl'
++ PATH='C:\Users\jonat\source\repos\ast-master\venvast/Scripts:C:\Users\jonat\source\repos\ast-master\venvast/Scripts:/c/Users/jonat/bin:/mingw64/bin:/usr/local/bin:/usr/bin:/bin:/mingw64/bin:/usr/bin:/c/Users/jonat/bin:/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7/bin:/c/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.7/libnvvp:/c/Program Files/Microsoft/jdk-11.0.12.7-hotspot/bin:/c/Windows/system32:/c/Windows:/c/Windows/System32/Wbem:/c/Windows/System32/WindowsPowerShell/v1.0:/c/Windows/System32/OpenSSH:/c/Program Files/dotnet:/c/Program Files/Microsoft SQL Server/150/Tools/Binn:/c/Program Files/Microsoft SQL Server/Client SDK/ODBC/170/Tools/Binn:/cmd:/c/Program Files/nodejs:/c/Program Files/NVIDIA Corporation/Nsight Compute 2022.2.1:/c/Program Files/NVIDIA Corporation/NVIDIA NvDLISR:/c/WINDOWS/system32:/c/WINDOWS:/c/WINDOWS/System32/Wbem:/c/WINDOWS/System32/WindowsPowerShell/v1.0:/c/WINDOWS/System32/OpenSSH:/c/Users/jonat/AppData/Local/Programs/Python/Python39/Scripts:/c/Users/jonat/AppData/Local/Programs/Python/Python39:/c/Users/jonat/AppData/Local/Microsoft/WindowsApps:/c/Users/jonat/AppData/Local/Programs/Microsoft VS Code/bin:/c/Users/jonat/AppData/Roaming/npm:/c/Users/jonat/AppData/Roaming/ffmpeg/bin:/c/Program Files (x86)/sox-14-4-2:/c/Users/jonat/.dotnet/tools:/c/Program Files/mosquitto:/usr/bin/vendor_perl:/usr/bin/core_perl'
++ export PATH
++ '[' -n '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(venvast) '
++ export PS1
++ '[' -n /bin/bash -o -n '' ']'
++ hash -r
+ export TORCH_HOME=../../pretrained_models
+ TORCH_HOME=../../pretrained_models
+ model=ast
+ dataset=esc50
+ imagenetpretrain=True
+ audiosetpretrain=True
+ bal=none
+ '[' True == True ']'
+ lr=1e-5
+ freqm=24
+ timem=96
+ mixup=0
+ epoch=25
+ batch_size=48
+ fstride=10
+ tstride=10
+ base_exp_dir=./exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5
+ python ./prep_esc50.py
{'dog': '0', 'rooster': '1', 'pig': '2', 'cow': '3', 'frog': '4', 'cat': '5', 'hen': '6', 'insects': '7', 'sheep': '8', 'crow': '9', 'rain': '10', 'sea_waves': '11', 'crackling_fire': '12', 'crickets': '13', 'chirping_birds': '14', 'water_drops': '15', 'wind': '16', 'pouring_water': '17', 'toilet_flush': '18', 'thunderstorm': '19', 'crying_baby': '20', 'sneezing': '21', 'clapping': '22', 'breathing': '23', 'coughing': '24', 'footsteps': '25', 'laughing': '26', 'brushing_teeth': '27', 'snoring': '28', 'drinking_sipping': '29', 'door_wood_knock': '30', 'mouse_click': '31', 'keyboard_typing': '32', 'door_wood_creaks': '33', 'can_opening': '34', 'washing_machine': '35', 'vacuum_cleaner': '36', 'clock_alarm': '37', 'clock_tick': '38', 'glass_breaking': '39', 'helicopter': '40', 'chainsaw': '41', 'siren': '42', 'car_horn': '43', 'engine': '44', 'train': '45', 'church_bells': '46', 'airplane': '47', 'fireworks': '48', 'hand_saw': '49'}
fold 1: 1600 training samples, 400 test samples
fold 2: 1600 training samples, 400 test samples
fold 3: 1600 training samples, 400 test samples
fold 4: 1600 training samples, 400 test samples
fold 5: 1600 training samples, 400 test samples
Finished ESC-50 Preparation
+ '[' -d ./exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5 ']'
+ mkdir -p ./exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5
+ (( fold=1 ))
+ (( fold<=5 ))
+ echo 'now process fold1'
now process fold1
+ exp_dir=./exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5/fold1
+ tr_data=./data/datafiles/esc_train_data_1.json
+ te_data=./data/datafiles/esc_eval_data_1.json
+ CUDA_CACHE_DISABLE=1
+ python -W ignore ../../src/run.py --model ast --dataset esc50 --data-train ./data/datafiles/esc_train_data_1.json --data-val ./data/datafiles/esc_eval_data_1.json --exp-dir ./exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5/fold1 --label-csv ./data/esc_class_labels_indices.csv --n_class 50 --lr 1e-5 --n-epochs 25 --batch-size 48 --save_model False --freqm 24 --timem 96 --mixup 0 --bal none --tstride 10 --fstride 10 --imagenet_pretrain True --audioset_pretrain True
I am process 23348, running on uname_result(system='Windows', node='jonathanspc', release='10', version='10.0.22621', machine='AMD64'): starting (Sun Dec 18 17:21:14 2022)
now train a audio spectrogram transformer model
balanced sampler is not used
---------------the train dataloader---------------
now using following mask: 24 freq, 96 time
now using mix-up with rate 0.000000
now process esc50
use dataset mean -6.627 and std 5.358 to normalize the input.
number of classes is 50
---------------the evaluation dataloader---------------
now using following mask: 0 freq, 0 time
now using mix-up with rate 0.000000
now process esc50
use dataset mean -6.627 and std 5.358 to normalize the input.
number of classes is 50
---------------AST Model Summary---------------
ImageNet pretraining: True, AudioSet pretraining: True
frequncey stride=10, time stride=10
number of patches=600

Creating experiment directory: ./exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5/fold1
Now starting training for 25 epochs
running on cpu
Total parameter number is : 87.295 million
Total trainable parameter number is : 87.295 million
scheduler for esc-50 is used
now training with esc50, main metrics: acc, loss function: CrossEntropyLoss(), learning rate scheduler: <torch.optim.lr_scheduler.MultiStepLR object at 0x0000016B7FF43B20>
current #steps=0, #epochs=1
start training...
---------------
2022-12-18 17:21:16.337781
current #epochs=1, #steps=0
I am process 18304, running on uname_result(system='Windows', node='jonathanspc', release='10', version='10.0.22621', machine='AMD64'): starting (Sun Dec 18 17:21:18 2022)
now train a audio spectrogram transformer model
balanced sampler is not used
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python39\lib\multiprocessing\spawn.py", line 287, in _fixup_main_from_path 
    main_content = runpy.run_path(main_path,
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 288, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\jonat\source\repos\ast-master\src\run.py", line 95, in <module>
    os.makedirs("%s/models" % args.exp_dir)
  File "C:\Users\jonat\AppData\Local\Programs\Python\Python39\lib\os.py", line 225, in makedirs
    mkdir(name, mode)
FileExistsError: [WinError 183] Cannot create a file when that file already exists: './exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5/fold1/models'

Also, run.py uses os.uname()[1], which does not exist on my Windows PC, so I changed it to use platform.uname() (after import platform).
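A minimal portable sketch of that change, assuming only the host name is needed (which is what os.uname()[1] returns on Unix). platform.uname() works on every platform, and its node field (index 1) is the host name; printing the whole platform.uname() result is why the log above shows the full uname_result(...) tuple. host_name() is a hypothetical helper, not code from the repository.

```python
import os
import platform
import socket


def host_name() -> str:
    """Return the machine's network name on Linux, macOS and Windows."""
    # os.uname() only exists on Unix-like systems; on Windows the
    # attribute is missing, which is why the original call fails there.
    if hasattr(os, "uname"):
        return os.uname()[1]
    # platform.uname() is cross-platform; field 1 (.node) is the host name.
    node = platform.uname().node
    # Fall back to the socket module if platform cannot determine it.
    return node or socket.gethostname()


if __name__ == "__main__":
    print(f"I am process {os.getpid()}, running on {host_name()}")
```

Indexing the result (or using .node) keeps the log line as short as it is on Linux.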

Can you help with this?

YuanGongND commented 1 year ago

Hi there,

I am not familiar with Windows, but "Cannot create a file when that file already exists: './exp/test-esc50-f10-t10-impTrue-aspTrue-b48-lr1e-5/fold1/models'" means you need to either remove the old experiment directory or change the experiment path specified in https://github.com/YuanGongND/ast/blob/9e3bd9942210680b833b08c39d09f2284ddc4d1d/egs/esc50/run_esc.sh#L48

The reason I don't place a check before creating the folder is that I intend to raise an error to prevent a new run from overwriting an old one.
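To illustrate the behavior described above with the standard library alone (a sketch, not the repository's code; create_models_dir is a hypothetical helper): os.makedirs raises FileExistsError when the target already exists unless exist_ok=True is passed, so leaving exist_ok at its default of False is what turns an accidental rerun into a hard error instead of silently reusing the old directory. The sketch only adds a more descriptive message.

```python
import os
import tempfile


def create_models_dir(exp_dir: str) -> str:
    """Create <exp_dir>/models, refusing to reuse an existing experiment."""
    models_dir = os.path.join(exp_dir, "models")
    try:
        # exist_ok defaults to False, so an existing directory raises.
        os.makedirs(models_dir)
    except FileExistsError as err:
        # Re-raise with a hint instead of the bare OS error message.
        raise FileExistsError(
            f"{models_dir} already exists; remove the old experiment "
            "directory or choose a new --exp-dir"
        ) from err
    return models_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        exp_dir = os.path.join(root, "fold1")
        print(create_models_dir(exp_dir))  # first run: directory created
        try:
            create_models_dir(exp_dir)  # second run: refused
        except FileExistsError as err:
            print("refused:", err)
```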

-Yuan

JonathanFL commented 1 year ago

Hi Yuan,

I don't know why, but I had to create the 'exp' folder in the shell script first, and then the 'models' folder in run.py. I guess the behaviour of makedirs is different between Windows and your OS.

import os

# args comes from run.py's argument parser; create <exp_dir>/models only if missing
models_dir = os.path.join(args.exp_dir, "models")
print("Creating models directory: %s" % models_dir)
if not os.path.exists(models_dir):
    os.mkdir(models_dir)

This still works (old experiments are not overwritten) because of this part of the shell script:

if [ -d $base_exp_dir ]; then
  echo 'exp exist'
  exit
fi

However, I suspect this would not protect old runs if run.py were launched without the shell script. Anyway, just letting you or anybody else know. I also had to modify the shell script for Windows paths, e.g., the Python venv path.
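A possible explanation, judging from the log above: the "I am process ..." line is printed twice with different PIDs (23348 and 18304), and the traceback passes through multiprocessing/spawn.py before reaching run.py line 95. On Windows, multiprocessing (and therefore DataLoader workers with num_workers > 0) can only use the "spawn" start method, which re-imports the main script in every worker, so top-level code such as os.makedirs runs again there and finds the directory the parent just created. The usual remedy is an if __name__ == "__main__": guard, which would let the strict os.makedirs call stay as is without relying on the shell script's check. The sketch below shows only that structure; main() and the EXP_DIR command-line argument are hypothetical stand-ins for run.py's real logic.

```python
import os
import sys


def main(exp_dir: str) -> str:
    """Stand-in for the top-level work in run.py (hypothetical structure)."""
    print(f"I am process {os.getpid()}: setting up {exp_dir}")
    models_dir = os.path.join(exp_dir, "models")
    # Strict creation, as in the original run.py: an existing directory
    # raises FileExistsError, so an old experiment is never overwritten.
    os.makedirs(models_dir)
    # ... build the datasets and DataLoaders (num_workers > 0), then train ...
    return models_dir


# Under "spawn", each worker re-imports this file with the module name
# "__mp_main__", so the block below is skipped there and the directory
# is created exactly once, by the parent process.
if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print(f"usage: python {sys.argv[0]} EXP_DIR")
```

Setting num_workers=0 avoids the worker processes altogether, at the cost of slower data loading.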

On another note, I was unable to train, even with num_worker=0 and batch_size=2, because of the same error as https://github.com/YuanGongND/ast/issues/54#issuecomment-1073397042.

Could an alternative be to set it up in Google Colab for training with my own data? The modified shell script now looks like this:

#!/bin/bash
#SBATCH -p gpu
#SBATCH -x sls-titan-[0-2]
#SBATCH --gres=gpu:4
#SBATCH -c 4
#SBATCH -n 1
#SBATCH --mem=48000
#SBATCH --job-name="ast-esc50"
#SBATCH --output=./log_%j.txt

set -x
# comment this line if not running on sls cluster
#. /data/sls/scratch/share-201907/slstoolchainrc
source ../../venvast/Scripts/activate
export TORCH_HOME=../../pretrained_models

model=ast
dataset=esc50
imagenetpretrain=True
audiosetpretrain=True
bal=none
if [ $audiosetpretrain == True ]
then
  lr=1e-5
else
  lr=1e-4
fi
freqm=24
timem=96
mixup=0
epoch=25
batch_size=2
fstride=16
tstride=16
base_exp_dir=.\\exp\\test-${dataset}-f$fstride-t$tstride-imp$imagenetpretrain-asp$audiosetpretrain-b$batch_size-lr${lr}

python .\\prep_esc50.py

if [ -d $base_exp_dir ]; then
  echo 'exp exist'
  exit
fi
mkdir -p $base_exp_dir

for((fold=1;fold<=5;fold++));
do
  echo 'now process fold'${fold}

  exp_dir=${base_exp_dir}\\fold${fold}

  echo 'creating exp dir: '${exp_dir}
  mkdir -p $exp_dir

  tr_data=.\\data/datafiles\\esc_train_data_${fold}.json
  te_data=.\\data/datafiles\\esc_eval_data_${fold}.json

  CUDA_CACHE_DISABLE=1 python -W ignore ..\\..\\src\\run.py --model ${model} --dataset ${dataset} \
  --data-train ${tr_data} --data-val ${te_data} --exp-dir $exp_dir \
  --label-csv ./data/esc_class_labels_indices.csv --n_class 50 \
  --lr $lr --n-epochs ${epoch} --batch-size $batch_size --save_model False \
  --freqm $freqm --timem $timem --mixup ${mixup} --bal ${bal} \
  --tstride $tstride --fstride $fstride --imagenet_pretrain $imagenetpretrain --audioset_pretrain $audiosetpretrain
done

python .\\get_esc_result.py --exp_path ${base_exp_dir}

This "works", but again, my laptop may not have enough GPU resources.
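On the Windows path changes: Python on Windows generally accepts forward slashes as well, so the \\ separators should only matter where a Windows-only program requires them. As a sketch (exp_dir_name and fold_dirs are hypothetical helpers that mirror base_exp_dir and the per-fold exp_dir from the script above), pathlib builds the same directory names and lets the OS choose the separator:

```python
from pathlib import Path


def exp_dir_name(dataset: str, fstride: int, tstride: int,
                 imagenet_pretrain: bool, audioset_pretrain: bool,
                 batch_size: int, lr: str) -> str:
    """Same naming scheme as base_exp_dir in run_esc.sh.

    lr is a string so that "1e-5" is kept verbatim (a float would
    format as "1e-05").
    """
    return (f"test-{dataset}-f{fstride}-t{tstride}-imp{imagenet_pretrain}"
            f"-asp{audioset_pretrain}-b{batch_size}-lr{lr}")


def fold_dirs(root: str, name: str, n_folds: int = 5) -> list:
    """Per-fold experiment directories; Path inserts the right separator."""
    base = Path(root) / name
    return [base / f"fold{fold}" for fold in range(1, n_folds + 1)]


if __name__ == "__main__":
    name = exp_dir_name("esc50", 10, 10, True, True, 48, "1e-5")
    for d in fold_dirs("exp", name):
        print(d)  # backslashes on Windows, forward slashes elsewhere
```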

YuanGongND commented 1 year ago

> On another note, I was unable to train, even with num_worker=0 and batch_size=2, because of the same error as https://github.com/YuanGongND/ast/issues/54#issuecomment-1073397042.

The model itself takes a substantial amount of GPU memory, so the code may not run even with batch_size=2. You can use the nvidia-smi command to check the GPU memory usage.
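A sketch of scripting that check from Python, using nvidia-smi's CSV query mode (--query-gpu=memory.used,memory.total --format=csv,noheader,nounits reports MiB per GPU, one line each). parse_gpu_memory and gpu_memory_mib are hypothetical helpers. Since the log in the first post also prints "running on cpu", it may be worth confirming first that torch.cuda.is_available() returns True inside the venv.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text: str) -> list:
    """Parse 'used, total' lines (MiB) into (used, total) int tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        # int() tolerates the space nvidia-smi puts after each comma.
        used, total = (int(field) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpu_memory_mib() -> list:
    """Return (used, total) MiB per GPU, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []
    return parse_gpu_memory(out)


if __name__ == "__main__":
    for i, (used, total) in enumerate(gpu_memory_mib()):
        print(f"GPU {i}: {used} / {total} MiB used")
```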

> Could an alternative be to set it up in Google Colab for training with own data?

It is totally possible; for another, unrelated project, I have a Colab training script. Nonetheless, I don't plan to build one for this project, as Colab can only train on small datasets.

JonathanFL commented 1 year ago

Alright. Thank you very much.