Value of Regex and log.txt not matching.

brainie commented 3 years ago

Looking through the code, I found this:

metric1 = {
    'name': 'accuracy',
    'regex': re.compile(r'\* accuracy: ([\.\deE+-]+)%')
}

metric2 = {
    'name': 'error',
    'regex': re.compile(r'\* error: ([\.\deE+-]+)%')
}

It turns out that inside parse_function, the regex never matches anything in log.txt, so the output variable in the code below is never populated.

for metric in metrics:
    match = metric['regex'].search(line)
    if match and good_to_go:
        print("good_to_go")
        if 'file' not in output:
            output['file'] = fpath
        num = float(match.group(1))
        name = metric['name']
        output[name] = num

What am I doing wrong?
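
One way to narrow this down is to test the two patterns in isolation against lines in the format they expect. The sketch below reuses the regexes from the snippet above; the sample lines and the `extract` helper are illustrative assumptions, not part of the repository. If the patterns match here but not on the real log.txt, the log probably does not contain result lines at all.

```python
import re

# Patterns copied from the snippet in this comment.
accuracy_re = re.compile(r'\* accuracy: ([\.\deE+-]+)%')
error_re = re.compile(r'\* error: ([\.\deE+-]+)%')

# Hypothetical sample lines: two in the expected result format,
# one ordinary log line that should not match.
sample_lines = [
    "* accuracy: 78.37%",
    "* error: 21.63%",
    "Initializing summary writer for tensorboard",
]


def extract(pattern, lines):
    """Return the first captured number as a float, or None if no line matches."""
    for line in lines:
        match = pattern.search(line)
        if match:
            return float(match.group(1))
    return None


accuracy = extract(accuracy_re, sample_lines)
error = extract(error_re, sample_lines)
no_match = extract(accuracy_re, sample_lines[2:])
print(accuracy, error, no_match)
```

A `None` result for a whole log file means the file has no line of the form `* accuracy: <number>%`, which points at the log rather than at the regex.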

KaiyangZhou commented 3 years ago

What does your log file look like? It's difficult to understand what went wrong without more details.

brainie commented 3 years ago

Here is the log.txt file for seed1:

** Arguments **
***************
backbone: 
config_file: configs/trainers/StyleMatch/ssdg_pacs_v1.yaml
dataset_config_file: configs/datasets/ssdg_pacs.yaml
eval_only: False
head: 
load_epoch: None
model_dir: 
no_train: False
opts: ['MODEL.BACKBONE.NAME', 'resnet18', 'DATASET.NUM_LABELED', '210']
output_dir: output/ssdg_pacs/nlab_210/StyleMatch/resnet18/v1/art_painting/seed1
resume: 
root: /home/johnsonibironke/kaiyang/data
seed: 1
source_domains: ['cartoon', 'photo', 'sketch']
target_domains: ['art_painting']
trainer: StyleMatch
transforms: None
************
** Config **
************
DATALOADER:
  K_TRANSFORMS: 1
  NUM_WORKERS: 4
  RETURN_IMG0: True
  TEST:
    BATCH_SIZE: 100
    SAMPLER: SequentialSampler
  TRAIN_U:
    BATCH_SIZE: 32
    N_DOMAIN: 0
    N_INS: 16
    SAME_AS_X: True
    SAMPLER: RandomSampler
  TRAIN_X:
    BATCH_SIZE: 48
    N_DOMAIN: 0
    N_INS: 16
    SAMPLER: SeqDomainSampler
DATASET:
  ALL_AS_UNLABELED: False
  CIFAR_C_LEVEL: 1
  CIFAR_C_TYPE: 
  NAME: SSDGPACS
  NUM_LABELED: 210
  NUM_SHOTS: -1
  ROOT: /home/johnsonibironke/kaiyang/data
  SOURCE_DOMAINS: ['cartoon', 'photo', 'sketch']
  STL10_FOLD: -1
  TARGET_DOMAINS: ['art_painting']
  VAL_PERCENT: 0.1
INPUT:
  COLORJITTER_B: 0.4
  COLORJITTER_C: 0.4
  COLORJITTER_H: 0.1
  COLORJITTER_S: 0.4
  CROP_PADDING: 4
  CUTOUT_LEN: 16
  CUTOUT_N: 1
  GB_K: 21
  GB_P: 0.5
  GN_MEAN: 0.0
  GN_STD: 0.15
  INTERPOLATION: bilinear
  NO_TRANSFORM: False
  PIXEL_MEAN: [0.485, 0.456, 0.406]
  PIXEL_STD: [0.229, 0.224, 0.225]
  RANDAUGMENT_M: 10
  RANDAUGMENT_N: 2
  RGS_P: 0.2
  SIZE: (224, 224)
  TRANSFORMS: ('random_flip', 'random_translation', 'normalize')
MODEL:
  BACKBONE:
    NAME: resnet18
    PRETRAINED: True
  HEAD:
    ACTIVATION: relu
    BN: True
    DROPOUT: 0.0
    HIDDEN_LAYERS: ()
    NAME: 
  INIT_WEIGHTS: 
OPTIM:
  ADAM_BETA1: 0.9
  ADAM_BETA2: 0.999
  BASE_LR_MULT: 0.1
  GAMMA: 0.1
  LR: 0.003
  LR_SCHEDULER: cosine
  MAX_EPOCH: 40
  MOMENTUM: 0.9
  NAME: sgd
  NEW_LAYERS: ()
  RMSPROP_ALPHA: 0.99
  SGD_DAMPNING: 0
  SGD_NESTEROV: False
  STAGED_LR: False
  STEPSIZE: (-1,)
  WARMUP_CONS_LR: 1e-05
  WARMUP_EPOCH: -1
  WARMUP_MIN_LR: 1e-05
  WARMUP_RECOUNT: True
  WARMUP_TYPE: linear
  WEIGHT_DECAY: 0.0005
OUTPUT_DIR: output/ssdg_pacs/nlab_210/StyleMatch/resnet18/v1/art_painting/seed1
RESUME: 
SEED: 1
TEST:
  COMPUTE_CMAT: False
  EVALUATOR: Classification
  FINAL_MODEL: last_step
  NO_TEST: False
  PER_CLASS_RESULT: False
  SPLIT: test
TRAIN:
  CHECKPOINT_FREQ: 0
  COUNT_ITER: train_u
  PRINT_FREQ: 10
TRAINER:
  CG:
    ALPHA_D: 0.5
    ALPHA_F: 0.5
    EPS_D: 1.0
    EPS_F: 1.0
  DAEL:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 0.5
  DDAIG:
    ALPHA: 0.5
    CLAMP: False
    CLAMP_MAX: 1.0
    CLAMP_MIN: -1.0
    G_ARCH: 
    LMDA: 0.3
    WARMUP: 0
  ENTMIN:
    LMDA: 0.001
  FIXMATCH:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 1.0
  M3SDA:
    LMDA: 0.5
    N_STEP_F: 4
  MCD:
    N_STEP_F: 4
  MEANTEA:
    EMA_ALPHA: 0.999
    RAMPUP: 5
    WEIGHT_U: 1.0
  MIXMATCH:
    MIXUP_BETA: 0.75
    RAMPUP: 20000
    TEMP: 2.0
    WEIGHT_U: 100.0
  MME:
    LMDA: 0.1
  NAME: StyleMatch
  SE:
    CONF_THRE: 0.95
    EMA_ALPHA: 0.999
    RAMPUP: 300
  STYLEMATCH:
    ADAIN_DECODER: weights/decoder.pth
    ADAIN_VGG: weights/vgg_normalised.pth
    APPLY_AUG: True
    APPLY_STY: True
    CLASSIFIER: stochastic
    CONF_THRE: 0.95
    C_OPTIM:
      ADAM_BETA1: 0.9
      ADAM_BETA2: 0.999
      BASE_LR_MULT: 0.1
      GAMMA: 0.1
      LR: 0.01
      LR_SCHEDULER: cosine
      MAX_EPOCH: 40
      MOMENTUM: 0.9
      NAME: sgd
      NEW_LAYERS: ()
      RMSPROP_ALPHA: 0.99
      SGD_DAMPNING: 0
      SGD_NESTEROV: False
      STAGED_LR: False
      STEPSIZE: (-1,)
      WARMUP_CONS_LR: 1e-05
      WARMUP_EPOCH: -1
      WARMUP_MIN_LR: 1e-05
      WARMUP_RECOUNT: True
      WARMUP_TYPE: linear
      WEIGHT_DECAY: 0.0005
    INFERENCE_MODE: deterministic
    N_ENSEMBLE: 10
    SAVE_SIGMA: False
    STRONG_TRANSFORMS: ('random_flip', 'randaugment_fixmatch', 'normalize', 'cutout')
USE_CUDA: True
VERBOSE: True
VERSION: 1
Collecting env info ...
** System info **
PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.7 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.8.1
[pip3] torchvision==0.9.1
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               10.1.243             h6bb024c_0  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.3.0           h06a4308_520  
[conda] mkl-service               2.4.0            py37h7f8727e_0  
[conda] mkl_fft                   1.3.0            py37h42c9631_2  
[conda] mkl_random                1.2.2            py37h51133e4_0  
[conda] numpy                     1.21.2                   pypi_0    pypi
[conda] numpy-base                1.20.3           py37h74d4b33_0  
[conda] pytorch                   1.8.1           py3.7_cuda10.1_cudnn7.6.3_0    pytorch
[conda] torchvision               0.9.1                py37_cu101    pytorch
        Pillow (8.3.1)

Loading trainer: StyleMatch
Building transform_train
+ resize to 224x224
+ random flip
+ random translation
+ to torch tensor of range [0, 1]
+ normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Building transform_train
+ resize to 224x224
+ random flip
+ randaugment_fixmatch (n=2)
+ to torch tensor of range [0, 1]
+ cutout (n_holes=1, length=16)
+ normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
Loading dataset: SSDGPACS
Reading split from "/home/johnsonibironke/kaiyang/data/pacs/splits_ssdg/art_painting_nlab210_seed1.json"
* Using custom transform for training
Building transform_test
+ resize to 224x224
+ to torch tensor of range [0, 1]
+ normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
***** Dataset statistics *****
  Dataset: SSDGPACS
  Source domains: ['cartoon', 'photo', 'sketch']
  Target domains: ['art_painting']
  # classes: 7
  # train_x: 210
  # train_u: 6,926
  # val: 806
  # test: 2,048
Building G
Backbone: resnet18
# params: 11,176,512
Building C
# params: 7,168
Loading evaluator: Classification
Building vgg and decoder for style transfer
Loading decoder weights from weights/decoder.pth
Loading vgg weights from weights/vgg_normalised.pth
No checkpoint found, train from scratch
Initializing summary writer for tensorboard with log_dir=output/ssdg_pacs/nlab_210/StyleMatch/resnet18/v1/art_painting/seed1

There are 4 other seeds. Here is the attached log.txt

KaiyangZhou commented 3 years ago

what does the directory structure look like?

what's the command you used?

what's the error message you got?

how can I reproduce your error?

brainie commented 3 years ago

This is what the directory structure looks like for seed1 (where the log.txt file is located):

home/johnsonibironke/Documents/Github_files/ssdg-benchmark/output/ssdg_pacs/nlab_210/StyleMatch/resnet18/v1/art_painting/seed1

This is the command I used:

python parse_test_res.py output/ssdg_pacs/nlab_210/StyleMatch/resnet18/v1 --multi-exp

This is the error message:

Parsing files in output/ssdg_pacs/nlab_210/StyleMatch/resnet18/v1/art_painting
Traceback (most recent call last):
  File "parse_test_res.py", line 188, in <module>
    main(args, end_signal)
  File "parse_test_res.py", line 147, in main
    end_signal=end_signal
  File "parse_test_res.py", line 97, in parse_function
    assert len(outputs) > 0, f'Nothing found in {directory}'
AssertionError: Nothing found in output/ssdg_pacs/nlab_210/StyleMatch/resnet18/v1/art_painting

The error did not reproduce.

Thank you very much

KaiyangZhou commented 3 years ago

Your log file doesn't contain any results for the code to extract, since it ends with Initializing summary writer.

The code waits until it sees Finished training, then looks for the keywords accuracy and error and extracts their values.
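
As a rough illustration of that behaviour, here is a minimal sketch. It is not the actual parse_test_res.py: the `parse_log_lines` name, the exact-match test on the end signal, and the sample logs are assumptions made for the example. The regexes and the `good_to_go` flag follow the snippets quoted earlier in this thread.

```python
import re

METRICS = [
    {'name': 'accuracy', 'regex': re.compile(r'\* accuracy: ([\.\deE+-]+)%')},
    {'name': 'error', 'regex': re.compile(r'\* error: ([\.\deE+-]+)%')},
]


def parse_log_lines(lines, end_signal='Finished training'):
    """Collect metric values that appear after the end signal.

    Returns an empty dict when the end signal never shows up.
    """
    output = {}
    good_to_go = False
    for line in lines:
        line = line.strip()
        if line == end_signal:
            # Only lines after this point are searched for metrics.
            good_to_go = True
            continue
        if not good_to_go:
            continue
        for metric in METRICS:
            match = metric['regex'].search(line)
            if match:
                output[metric['name']] = float(match.group(1))
    return output


# A log that finished training and printed results.
complete_log = [
    "Finished training",
    "Do evaluation on test set",
    "=> result",
    "* total: 2,048",
    "* correct: 1,605",
    "* accuracy: 78.37%",
    "* error: 21.63%",
]

# A log that stops before training, like the one posted above.
truncated_log = [
    "No checkpoint found, train from scratch",
    "Initializing summary writer for tensorboard",
]

complete_result = parse_log_lines(complete_log)
truncated_result = parse_log_lines(truncated_log)
print(complete_result)
print(truncated_result)
```

With a truncated log the result is empty, which is the situation that trips the `assert len(outputs) > 0` check in the traceback above.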

brainie commented 3 years ago

I noticed that too.

So I tried running tools/train.py from the dassl.pytorch repo to train on the data from scratch, but this was the result:

***************
** Arguments **
***************
backbone: 
config_file: 
dataset_config_file: 
eval_only: False
head: 
load_epoch: None
model_dir: 
no_train: False
opts: []
output_dir: 
resume: 
root: 
seed: -1
source_domains: None
target_domains: None
trainer: 
transforms: None
************
** Config **
************
DATALOADER:
  K_TRANSFORMS: 1
  NUM_WORKERS: 4
  RETURN_IMG0: False
  TEST:
    BATCH_SIZE: 32
    SAMPLER: SequentialSampler
  TRAIN_U:
    BATCH_SIZE: 32
    N_DOMAIN: 0
    N_INS: 16
    SAME_AS_X: True
    SAMPLER: RandomSampler
  TRAIN_X:
    BATCH_SIZE: 32
    N_DOMAIN: 0
    N_INS: 16
    SAMPLER: RandomSampler
DATASET:
  ALL_AS_UNLABELED: False
  CIFAR_C_LEVEL: 1
  CIFAR_C_TYPE: 
  NAME: 
  NUM_LABELED: -1
  NUM_SHOTS: -1
  ROOT: 
  SOURCE_DOMAINS: ()
  STL10_FOLD: -1
  TARGET_DOMAINS: ()
  VAL_PERCENT: 0.1
INPUT:
  COLORJITTER_B: 0.4
  COLORJITTER_C: 0.4
  COLORJITTER_H: 0.1
  COLORJITTER_S: 0.4
  CROP_PADDING: 4
  CUTOUT_LEN: 16
  CUTOUT_N: 1
  GB_K: 21
  GB_P: 0.5
  GN_MEAN: 0.0
  GN_STD: 0.15
  INTERPOLATION: bilinear
  NO_TRANSFORM: False
  PIXEL_MEAN: [0.485, 0.456, 0.406]
  PIXEL_STD: [0.229, 0.224, 0.225]
  RANDAUGMENT_M: 10
  RANDAUGMENT_N: 2
  RGS_P: 0.2
  SIZE: (224, 224)
  TRANSFORMS: ()
MODEL:
  BACKBONE:
    NAME: 
    PRETRAINED: True
  HEAD:
    ACTIVATION: relu
    BN: True
    DROPOUT: 0.0
    HIDDEN_LAYERS: ()
    NAME: 
  INIT_WEIGHTS: 
OPTIM:
  ADAM_BETA1: 0.9
  ADAM_BETA2: 0.999
  BASE_LR_MULT: 0.1
  GAMMA: 0.1
  LR: 0.0003
  LR_SCHEDULER: single_step
  MAX_EPOCH: 10
  MOMENTUM: 0.9
  NAME: adam
  NEW_LAYERS: ()
  RMSPROP_ALPHA: 0.99
  SGD_DAMPNING: 0
  SGD_NESTEROV: False
  STAGED_LR: False
  STEPSIZE: (-1,)
  WARMUP_CONS_LR: 1e-05
  WARMUP_EPOCH: -1
  WARMUP_MIN_LR: 1e-05
  WARMUP_RECOUNT: True
  WARMUP_TYPE: linear
  WEIGHT_DECAY: 0.0005
OUTPUT_DIR: ./output
RESUME: 
SEED: -1
TEST:
  COMPUTE_CMAT: False
  EVALUATOR: Classification
  FINAL_MODEL: last_step
  NO_TEST: False
  PER_CLASS_RESULT: False
  SPLIT: test
TRAIN:
  CHECKPOINT_FREQ: 0
  COUNT_ITER: train_x
  PRINT_FREQ: 10
TRAINER:
  CG:
    ALPHA_D: 0.5
    ALPHA_F: 0.5
    EPS_D: 1.0
    EPS_F: 1.0
  DAEL:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 0.5
  DDAIG:
    ALPHA: 0.5
    CLAMP: False
    CLAMP_MAX: 1.0
    CLAMP_MIN: -1.0
    G_ARCH: 
    LMDA: 0.3
    WARMUP: 0
  ENTMIN:
    LMDA: 0.001
  FIXMATCH:
    CONF_THRE: 0.95
    STRONG_TRANSFORMS: ()
    WEIGHT_U: 1.0
  M3SDA:
    LMDA: 0.5
    N_STEP_F: 4
  MCD:
    N_STEP_F: 4
  MEANTEA:
    EMA_ALPHA: 0.999
    RAMPUP: 5
    WEIGHT_U: 1.0
  MIXMATCH:
    MIXUP_BETA: 0.75
    RAMPUP: 20000
    TEMP: 2.0
    WEIGHT_U: 100.0
  MME:
    LMDA: 0.1
  NAME: 
  SE:
    CONF_THRE: 0.95
    EMA_ALPHA: 0.999
    RAMPUP: 300
USE_CUDA: True
VERBOSE: True
VERSION: 1
Collecting env info ...
** System info **
PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.3 LTS (x86_64)
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.7 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: No CUDA
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.21.2
[pip3] torch==1.8.1
[pip3] torchvision==0.9.1
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               10.1.243             h6bb024c_0  
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.3.0           h06a4308_520  
[conda] mkl-service               2.4.0            py37h7f8727e_0  
[conda] mkl_fft                   1.3.0            py37h42c9631_2  
[conda] mkl_random                1.2.2            py37h51133e4_0  
[conda] numpy                     1.21.2                   pypi_0    pypi
[conda] numpy-base                1.20.3           py37h74d4b33_0  
[conda] pytorch                   1.8.1           py3.7_cuda10.1_cudnn7.6.3_0    pytorch
[conda] torchvision               0.9.1                py37_cu101    pytorch
        Pillow (8.3.1)

Traceback (most recent call last):
  File "train.py", line 190, in <module>
    main(args)
  File "train.py", line 106, in main
    trainer = build_trainer(cfg)
  File "/home/johnsonibironke/Documents/Github_files/Dassl.pytorch/dassl/engine/build.py", line 8, in build_trainer
    check_availability(cfg.TRAINER.NAME, avai_trainers)
  File "/home/johnsonibironke/Documents/Github_files/Dassl.pytorch/dassl/utils/tools.py", line 178, in check_availability
    '(do you mean [{}]?)'.format(available, requested, psb_ans)
ValueError: The requested one is expected to belong to ['MCD', 'MME', 'ADDA', 'DAEL', 'DANN', 'AdaBN', 'M3SDA', 'SourceOnly', 'SelfEnsembling', 'DDAIG', 'DAELDG', 'Vanilla', 'CrossGrad', 'EntMin', 'FixMatch', 'MixMatch', 'MeanTeacher', 'SupBaseline'], but got [] (do you mean [SupBaseline]?)

Do you think this can affect the training of the ssdg-benchmark? And what can I do to complete the training?
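
For what it's worth, the Arguments block in the output above shows empty values for trainer, config_file and dataset_config_file, which suggests train.py was launched without any options, so the trainer name was an empty string when the availability check ran. The list of trainers in the traceback also does not include StyleMatch. A minimal sketch of that kind of membership check, using a hypothetical `check_trainer_name` helper and an abbreviated trainer list (this is not Dassl's actual code):

```python
def check_trainer_name(requested, available):
    """Raise ValueError if `requested` is not one of the `available` names."""
    if requested not in available:
        raise ValueError(
            'The requested one is expected to belong to {}, '
            'but got [{}]'.format(available, requested)
        )


# Abbreviated subset of the names listed in the traceback above.
available_trainers = ['FixMatch', 'MixMatch', 'MeanTeacher', 'SupBaseline']

try:
    # An empty name stands in for running the script with no trainer set.
    check_trainer_name('', available_trainers)
    empty_name_rejected = False
except ValueError as exc:
    empty_name_rejected = True
    print(exc)
```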

KaiyangZhou commented 3 years ago

Did you follow exactly the instructions outlined at https://github.com/KaiyangZhou/ssdg-benchmark#how-to-run-stylematch?

it would be easier for me to identify your problem if you could provide more details on how you reached that error, like any changes you made to the code or what steps you followed (after installation).

brainie commented 3 years ago

Yes, I did follow the instructions from https://github.com/KaiyangZhou/ssdg-benchmark#how-to-run-stylematch. I made no changes to the code.

But I am guessing the problem came from not running train.py from the dassl.pytorch repo (which is supposed to train on the data from scratch) when I first downloaded the dataset.

However, when I tried running it, it gave me the error I posted in my comment immediately above yours.

KaiyangZhou commented 3 years ago

I've checked the code and found no issue with running it.

what do you see if you run

conda activate dassl
cd ssdg-benchmark/scripts/StyleMatch
bash run_ssdg.sh ssdg_pacs 210 v1

brainie commented 3 years ago

It runs for about 20 iterations and gives an output folder.

Here is the folder: output.zip

KaiyangZhou commented 3 years ago

Do you see the results in the log files?

Like this (at the very end):

...
Finished training
Do evaluation on test set
=> result
* total: 2,048
* correct: 1,605
* accuracy: 78.37%
* error: 21.63%

If so, parse_test_res.py should work (it will extract the result values, e.g. 78.37 for accuracy).
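
To check every seed at once rather than opening each log by hand, something like the following sketch could help. The `find_incomplete_logs` helper, the log.txt file name default, and the temporary directory layout are assumptions for illustration; the only detail taken from this thread is that a finished run contains a Finished training line.

```python
import os
import tempfile


def find_incomplete_logs(directory, end_signal='Finished training', log_name='log.txt'):
    """Return paths of log files under `directory` that lack the end signal."""
    incomplete = []
    for root, _dirs, files in os.walk(directory):
        if log_name not in files:
            continue
        fpath = os.path.join(root, log_name)
        with open(fpath, 'r') as f:
            finished = any(line.strip() == end_signal for line in f)
        if not finished:
            incomplete.append(fpath)
    return sorted(incomplete)


# Demo on a throwaway directory with one truncated and one finished log.
with tempfile.TemporaryDirectory() as tmp:
    demo_logs = [
        ('seed1', ['Initializing summary writer']),
        ('seed2', ['Finished training', '* accuracy: 78.37%']),
    ]
    for seed, lines in demo_logs:
        os.makedirs(os.path.join(tmp, seed))
        with open(os.path.join(tmp, seed, 'log.txt'), 'w') as f:
            f.write('\n'.join(lines) + '\n')
    # Report paths relative to the demo root for readability.
    incomplete = [os.path.relpath(p, tmp) for p in find_incomplete_logs(tmp)]

print(incomplete)
```

Any seed directory reported here would need its training run to finish before the parsing script can find results in it.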

KaiyangZhou / ssdg-benchmark

Value of Regex and log.txt not matching. #5