Closed ChangNamAn closed 3 years ago
Hi,
I found the correct usage when I checked the original Drain_demo.py (https://github.com/logpai/logparser/blob/master/demo/Drain_demo.py):
    # Regular expression list for optional preprocessing (default: [])
    regex = [
        r'blk_(|-)[0-9]+',  # block id
        r'(/|)([0-9]+\.){3}[0-9]+(:[0-9]+|)(:|)',  # IP
        r'(?<=[^A-Za-z0-9])(\-?\+?\d+)(?=[^A-Za-z0-9])|[0-9]+$',  # Numbers
    ]
    st = 0.5  # Similarity threshold
    depth = 4  # Depth of all leaf nodes

    parser = Drain.LogParser(log_format, indir=input_dir, outdir=output_dir, depth=depth, st=st, rex=regex)
    parser.parse(log_file)
With --regex="$REGEX1 $REGEX2 $REGEX3 $REGEX4", the value is passed as a list with only one element.
Please check it.
Thank you for letting me know. I used nargs='*' in argparse to receive multiple inputs as a list (check this for usage). But in shell scripts I should use --regex $REGEX1 $REGEX2 $REGEX3 instead of --regex="$REGEX1 $REGEX2 $REGEX3".
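The difference between the two invocations can be reproduced without a shell by handing argparse the argument lists directly. This is a minimal sketch, not the repository's code: the parser below is a stand-in with only the --regex option, and default=[] is chosen here for illustration.

```python
import argparse

# Stand-in parser with only the option discussed in this thread (illustrative).
parser = argparse.ArgumentParser()
parser.add_argument("--regex", nargs="*", default=[], help="regex to clean log messages")

# --regex="$REGEX1 $REGEX2 $REGEX3": the shell delivers ONE argument.
joined = parser.parse_args([r"--regex=(0x)[0-9a-fA-F]+ \d+.\d+.\d+.\d+ \d+"])

# --regex "$REGEX1" "$REGEX2" "$REGEX3": the shell delivers THREE arguments.
separate = parser.parse_args(
    ["--regex", r"(0x)[0-9a-fA-F]+", r"\d+.\d+.\d+.\d+", r"\d+"]
)

print(len(joined.regex))    # 1
print(len(separate.regex))  # 3
```

With the joined form, nargs='*' still returns a list, but it holds a single string containing all the patterns.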
That does not cover a regex that contains spaces, for example:
    REGEX3='(?<=Warning: we failed to resolve data source name )[\w\s]+'
The unquoted variable is split on whitespace, so argparse ends up with:
    ['(0x)[0-9a-fA-F]+', '\d+.\d+.\d+.\d+', '(?<=Warning:', 'we', 'failed', 'to', 'resolve', 'data', 'source', 'name', ')[\w\s]+', '\d+']
Try this.
In process_tbird.sh:
    --regex "$REGEX1" "$REGEX2" "$REGEX3" "$REGEX4"
In data_process.py:
    parser.add_argument("--regex", nargs='*', help="regex to clean log messages", default='')
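Why the quotes matter for a pattern with spaces can be sketched in Python. The shell's word splitting of an unquoted variable is only approximated here with str.split() (real shells also apply IFS and glob expansion), and the parser is again an illustrative stand-in rather than the one in data_process.py.

```python
import argparse
import re

REGEX1 = r"(0x)[0-9a-fA-F]+"
REGEX3 = r"(?<=Warning: we failed to resolve data source name )[\w\s]+"

parser = argparse.ArgumentParser()
parser.add_argument("--regex", nargs="*", default=[])

# Unquoted $REGEX1 $REGEX3: each value is word-split on whitespace
# (approximated with str.split()).
unquoted = parser.parse_args(["--regex"] + REGEX1.split() + REGEX3.split()).regex

# Quoted "$REGEX1" "$REGEX3": each value arrives as exactly one argument.
quoted = parser.parse_args(["--regex", REGEX1, REGEX3]).regex

print(len(unquoted))  # 10
print(len(quoted))    # 2

# The fragments left by splitting are not valid patterns on their own.
try:
    re.compile(unquoted[1])  # '(?<=Warning:'
    fragment_compiles = True
except re.error:
    fragment_compiles = False
print(fragment_compiles)  # False
```

Quoting each variable keeps one pattern per list element, whether or not the pattern contains spaces.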
Hi,
The regular expressions are set in process_bgl.sh as below.
    REGEX1='(0x)[0-9a-fA-F]+'
    REGEX2='\d+.\d+.\d+.\d+'
    REGEX3='(/[-\w]+)+'
    REGEX4='\d+'
They are passed to preprocess() in Drain.py with this option:
    --regex="$REGEX1 $REGEX2 $REGEX3 $REGEX4" \
The regex value is passed to self.rex in Drain.py, and when I checked it, the self.rex list had only one element.
When I tested with the unmodified preprocess(), I got the result below.
    TP: 3, TN: 46, FP: 3, FN: 3
    Precision: 50.00%, Recall: 50.00%, F1-measure: 50.00%
Below is part of the structured log file, where I can see a log message that was not parsed.
And I modified preprocess() as below.
    def preprocess(self, line):
        rex_list = self.rex[0].split(' ')
        for currentRex in rex_list:
            line = re.sub(currentRex, '<*>', line)
        return line
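The effect of the one-element list can be checked in isolation. The sketch below uses a free-function version of preprocess() written for illustration (the real method lives on the parser class in Drain.py) and applies the four process_bgl.sh patterns to the sample BGL message from this thread.

```python
import re

def preprocess(line, rex):
    """Replace every match of each pattern in rex with the <*> wildcard."""
    for current_rex in rex:
        line = re.sub(current_rex, "<*>", line)
    return line

line = "ciod: Message code 0 is not 51 or 4294967295"

# The four patterns joined by spaces, as delivered by --regex="$REGEX1 ... $REGEX4".
joined = r"(0x)[0-9a-fA-F]+ \d+.\d+.\d+.\d+ (/[-\w]+)+ \d+"

as_one_pattern = preprocess(line, [joined])          # joined string used as a single regex
as_four_patterns = preprocess(line, joined.split(" "))  # one regex per pattern

print(as_one_pattern)    # ciod: Message code 0 is not 51 or 4294967295
print(as_four_patterns)  # ciod: Message code <*> is not <*> or <*>
```

Splitting on a space only works while no pattern contains a space itself, so passing each pattern as its own list element is the more general fix.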
The test result is better than before.
    TP: 6, TN: 44, FP: 1, FN: 0
    Precision: 85.71%, Recall: 100.00%, F1-measure: 92.31%
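The reported percentages follow from the confusion counts under the standard definitions of precision, recall, and F1. The helper name prf below is hypothetical and not part of the repository.

```python
def prf(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

before = prf(tp=3, fp=3, fn=3)  # counts reported with the one-element regex list
after = prf(tp=6, fp=1, fn=0)   # counts reported after splitting the patterns

print(", ".join(f"{v:.2%}" for v in before))  # 50.00%, 50.00%, 50.00%
print(", ".join(f"{v:.2%}" for v in after))   # 85.71%, 100.00%, 92.31%
```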
Log parsing is better too, I think.
    ciod: Message code 0 is not 51 or 4294967295,e872bbe9,ciod: Message code <*> is not <*> or <*>
What is your intention for preprocess()? Is my modification correct?