Open 2718455213wcx opened 1 year ago
I'm getting a ValueError when running the Thunderbird dataset in Brain:

ValueError: Length of values (2000000) does not match length of index (2000)

Traceback (most recent call last):
  File "E:\Brain-main\Code\evaluate.py", line 53, in
Hello! Thanks for your interest in our work. You may split your large dataset into multiple small chunks to solve the memory issue. Since I have used Brain as a preprocessing tool for large log file compression, I'm confident that Brain can perform well with this method.
The reported issue isn't very clear. Could you please provide more context?
After investigating the relevant code sections carefully, I understand why it went wrong. You are using the code of this repository. This issue is caused by the design of Brain.parse(): it does `df_example = df_input` and then `df_example['EventTemplate'] = template_`. Here `template_` is the parsed result (length 2,000,000), while `df_input` is the ground truth (length 2,000). You need to create a new dataframe to store the parsed results (i.e., template & EventID) to resolve this error. I designed it this way to make it easier to obtain accuracy results on the 2K benchmark dataset, and I apologize for any errors this may cause 😂😂
You may refer to Brain in LOGPAI, where self.df_log is a dataframe that saves the parsed results.
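The suggested fix could look roughly like the sketch below: instead of assigning the parsed templates onto the ground-truth dataframe (whose length differs), build a fresh dataframe sized to the parser's output. The function name and the `templates`/`event_ids` arguments are illustrative assumptions, not Brain's actual API:

```python
import pandas as pd

def build_result_frame(templates, event_ids):
    """Store parsed results (template & EventID) in their own dataframe,
    sized to the parsed data rather than to the ground-truth index."""
    # One row per parsed log line, so the length can differ from the
    # 2K ground-truth benchmark without raising a ValueError.
    return pd.DataFrame({
        "EventId": event_ids,
        "EventTemplate": templates,
    })
```

The result frame can then be written out (e.g. with `to_csv`) and compared against the 2K ground truth separately when computing accuracy.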
Thanks, I'll modify the code according to your suggestion.
I'm running out of memory when running the Thunderbird dataset (29.8 GB) with LOGPAI's Brain. Is there a way to solve this?
Traceback (most recent call last):
  File "..\logparser\Brain.py", line 189, in tuple_generate
    result = number.most_common()
  File "C:\Users\3730\.conda\envs\logbert\lib\collections\__init__.py", line 610, in most_common