Open alishan2040 opened 2 years ago
Hello, I was trying to run DeepLog on the HDFS dataset but ended up with the following error.
Command I used:
!python main_run.py --folder=bgl/ --log_file=HDFS.log --dataset_name=hdfs --model_name=deeplog --window_type=sliding \
  --sample=sliding_window --is_logkey --train_size=0.8 --train_ratio=1 --valid_ratio=0.1 --test_ratio=1 --max_epoch=100 \
  --n_warm_up_epoch=0 --n_epochs_stop=10 --batch_size=1024 --num_candidates=150 --history_size=10 --lr=0.001 \
  --accumulation_step=5 --session_level=hour --window_size=60 --step_size=60 --output_dir=experimental_results/demo/random/ --is_process
Are we supposed to run other scripts first to generate such files (for example, data_loader.py or synthesize.py)? Can we re-run the code with other publicly available formats of the HDFS dataset? Thanks.
Did you solve this problem? How can we get this structured.csv file?
You can use logparser (available on GitHub) to preprocess the HDFS dataset; it can generate HDFS.log_structured.csv.
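In case it helps, here is a minimal sketch of that preprocessing step. It is not taken from this repository: the helper names `structured_csv_path` and `parse_with_drain` are hypothetical, the Drain settings for HDFS are the ones I recall from logparser's own demo (please verify them against your installed version), and it assumes the structured CSV should sit in the same folder as the raw log.

```python
import os


def structured_csv_path(folder: str, log_file: str) -> str:
    """Path of the structured CSV that logparser writes for a given raw log."""
    return os.path.join(folder, log_file + "_structured.csv")


def parse_with_drain(folder: str, log_file: str) -> str:
    """Create <log_file>_structured.csv with logparser's Drain if it is missing."""
    out_path = structured_csv_path(folder, log_file)
    if os.path.exists(out_path):
        return out_path  # already preprocessed, nothing to do

    # Imported lazily so the path helper works without logparser installed.
    # Assumption: the logpai/logparser package layout (logparser.Drain.LogParser).
    from logparser.Drain import LogParser

    # Assumed HDFS settings, as in logparser's Drain demo.
    log_format = "<Date> <Time> <Pid> <Level> <Component>: <Content>"
    regex = [
        r"blk_(|-)[0-9]+",                              # block ids
        r"(/|)([0-9]+\.){3}[0-9]+(:[0-9]+|)(:|)",       # IP addresses
    ]
    parser = LogParser(log_format, indir=folder, outdir=folder,
                       depth=4, st=0.5, rex=regex)
    parser.parse(log_file)
    return out_path
```

Calling `parse_with_drain("bgl/", "HDFS.log")` before `main_run.py` should leave `bgl/HDFS.log_structured.csv` in place, if the folder assumption above holds for your setup.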