Missing HDFS.log_structured.csv file

alishan2040 commented 2 years ago

Hello, I was trying to run DeepLog on the HDFS dataset but ended up with an error about a missing HDFS.log_structured.csv file.

Command I used:

!python main_run.py --folder=bgl/ --log_file=HDFS.log --dataset_name=hdfs --model_name=deeplog --window_type=sliding \
    --sample=sliding_window --is_logkey --train_size=0.8 --train_ratio=1 --valid_ratio=0.1 --test_ratio=1 --max_epoch=100 \
    --n_warm_up_epoch=0 --n_epochs_stop=10 --batch_size=1024 --num_candidates=150 --history_size=10 --lr=0.001 \
    --accumulation_step=5 --session_level=hour --window_size=60 --step_size=60 --output_dir=experimental_results/demo/random/ --is_process

Are we supposed to run other scripts first to generate such files (for example data_loader.py or synthesize.py)? Can we re-run the code with other publicly available formats of the HDFS dataset? Thanks.

X-zhihao commented 2 years ago

Did you solve this problem? How can we get this structured.csv file?

wangwenjing1999 commented 2 years ago

You can use logparser (available on GitHub) to preprocess the HDFS dataset; it generates HDFS.log_structured.csv.
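For anyone who wants to see roughly what such a preprocessing step produces, here is a minimal stdlib-only sketch. It is not logparser and does not implement Drain: it only splits each raw HDFS line into header fields and masks obvious variables (block IDs, IP addresses, bare numbers) with `<*>` to form a crude template. The column names and the 8-character MD5 `EventId` are assumptions meant to resemble logparser's output, and the helper names (`parse_line`, `write_structured`) are hypothetical. Whether main_run.py accepts a file built this way has not been verified, so running logparser itself is the safer route.

```python
import csv
import hashlib
import re

# Assumed header layout of a raw HDFS line: date, time, pid, level, component, message.
LINE_RE = re.compile(
    r"^(?P<Date>\d{6}) (?P<Time>\d{6}) (?P<Pid>\d+) "
    r"(?P<Level>\w+) (?P<Component>[^:]+): (?P<Content>.*)$"
)

# Crude variable masking, applied in order. This is NOT the Drain algorithm.
MASKS = [
    re.compile(r"blk_-?\d+"),                # block IDs
    re.compile(r"(\d+\.){3}\d+(:\d+)?"),     # IPv4 addresses with optional port
    re.compile(r"(?<![\w.])\d+(?![\w.])"),   # standalone integers
]

# Assumed column names, chosen to resemble logparser's structured output.
COLUMNS = ["LineId", "Date", "Time", "Pid", "Level", "Component",
           "Content", "EventId", "EventTemplate"]


def to_template(content):
    """Replace variable parts of a message with the <*> placeholder."""
    for pattern in MASKS:
        content = pattern.sub("<*>", content)
    return content


def parse_line(line_id, line):
    """Return one structured row as a dict, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    row = match.groupdict()
    template = to_template(row["Content"])
    row["LineId"] = line_id
    row["EventTemplate"] = template
    # Short hash of the template, so identical templates share an ID.
    row["EventId"] = hashlib.md5(template.encode("utf-8")).hexdigest()[:8]
    return row


def write_structured(log_path, csv_path):
    """Write every parseable line of log_path to csv_path; return the row count."""
    written = 0
    with open(log_path, encoding="utf-8", errors="replace") as src, \
         open(csv_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=COLUMNS)
        writer.writeheader()
        for line_id, line in enumerate(src, start=1):
            row = parse_line(line_id, line)
            if row is not None:
                writer.writerow(row)
                written += 1
    return written
```

Calling `write_structured("HDFS.log", "HDFS.log_structured.csv")` would write one row per parseable line; lines that do not match the assumed header layout are skipped.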

LogIntelligence / LogADEmpirical, issue #9