Closed edoardottt closed 1 year ago
Thanks for the advice :)) appreciated!!
Adding another two cents from my usage of cariddi recently:
I came across huge matches like:
[
{
"name":"PHP error",
"match":"PHP error"
},
{
"name":"MySQL error",
"match":"warning_forbid_default_priv<MORE THAN 20000 LINES HERE>"
}
]
which completely destroy my terminal 😄
So we might think about either:
We could end up with a JSON format like:
[
{
"name": "MySQL Error",
"results": [
{
"type": "Regex",
"details": {"match": "Warning: ...<truncated_output>mysqli error: need new cache refresh... <truncated_output>", "regex": "(?i)Warning.*?mysqli?", "location": "line 42", "source": "body"}
}
]
}
]
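To make the idea concrete, here is a rough Python sketch of how truncation and the nested format above could fit together. It is not cariddi's actual code (cariddi is written in Go); the helper names `truncate_match` and `build_finding` and the 80-character default are assumptions made only for illustration.

```python
import json
import re

# Marker text taken from the JSON example above.
TRUNCATION_MARKER = "<truncated_output>"


def truncate_match(text: str, max_len: int = 80) -> str:
    """Keep the head and tail of an oversized match, with a marker in between.

    max_len is a hypothetical default, not a cariddi setting.
    """
    if len(text) <= max_len:
        return text
    keep = max_len // 2
    return text[:keep] + TRUNCATION_MARKER + text[len(text) - keep:]


def build_finding(name: str, pattern: str, body: str, max_len: int = 80) -> dict:
    """Collect every regex match for one rule under a single named entry."""
    results = []
    for m in re.finditer(pattern, body):
        # 1-based line number of the start of the match.
        line_no = body.count("\n", 0, m.start()) + 1
        results.append({
            "type": "Regex",
            "details": {
                "match": truncate_match(m.group(0), max_len),
                "regex": pattern,
                "location": f"line {line_no}",
                "source": "body",
            },
        })
    return {"name": name, "results": results}


if __name__ == "__main__":
    # Simulate a huge match similar to the one that flooded the terminal.
    body = "ok\nWarning: " + "x" * 20000 + " mysqli\n"
    finding = build_finding("MySQL Error", r"(?i)Warning.*?mysqli?", body)
    print(json.dumps([finding], indent=2))
```

Keeping both the head and the tail of the match means the output stays readable while still showing where the match started and ended.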
Additionally, regexes have their limits. Ideally we want to go one step further and create some kind of pattern-recognition algorithm, or even use ML for this kind of task. It could be a good evolution for cariddi ;) The type key would be useful in that case to differentiate those matches from regex matches:
[
{
"type": "PatternFinder",
"details": {"match": "Warning: ...<truncated_output>mysqli error: need new cache refresh... <truncated_output>", "matcher": "error-finder", "version": "2.0.1"}
},
{
"type": "ML",
"details": {"model_name": "my-awesome-ml-model", "version": "0.0.1"}
}
]
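As a small illustration of why the type key helps, a consumer of the output could split results by matcher kind with something like the Python sketch below. The function name `group_by_type` and the "Unknown" fallback are assumptions for this example only, not part of cariddi.

```python
from collections import defaultdict


def group_by_type(results: list[dict]) -> dict[str, list[dict]]:
    """Group the details of each result under its "type" value."""
    grouped = defaultdict(list)
    for result in results:
        # Entries without a "type" key land in a hypothetical "Unknown" bucket.
        grouped[result.get("type", "Unknown")].append(result.get("details", {}))
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        {"type": "PatternFinder", "details": {"matcher": "error-finder", "version": "2.0.1"}},
        {"type": "ML", "details": {"model_name": "my-awesome-ml-model", "version": "0.0.1"}},
        {"type": "Regex", "details": {"regex": "(?i)Warning.*?mysqli?"}},
    ]
    for kind, details in group_by_type(sample).items():
        print(kind, len(details))
```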
There is also room to improve the findings by filtering which ones are important and which are not. For instance, licensing@<domain> or sales@<domain> is very common and not very sensitive. Those "rules" could first be hardcoded by us on a case-by-case basis and then learned by ML as well at some point, and a severity field could be set for each finding.
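A first hardcoded version of such rules could look roughly like this Python sketch. The two local parts come from the examples above; the severity labels ("low", "medium"), the default level, and the function names are assumptions I made up for illustration.

```python
# Local parts considered common and not very sensitive (from the examples above).
LOW_SEVERITY_LOCAL_PARTS = {"licensing", "sales"}


def email_severity(address: str) -> str:
    """Return a hypothetical severity label for an email finding."""
    local_part, _, _domain = address.partition("@")
    if local_part.lower() in LOW_SEVERITY_LOCAL_PARTS:
        return "low"
    # Assumed default for anything not covered by a rule.
    return "medium"


def with_severity(finding: dict) -> dict:
    """Return a copy of the finding with a "severity" field added."""
    return {**finding, "severity": email_severity(finding["match"])}


if __name__ == "__main__":
    for match in ("sales@example.com", "j.doe@example.com"):
        print(with_severity({"name": "Email", "match": match}))
```

Starting from a plain set like this keeps the rules easy to review, and the same severity field could later be filled in by a learned model instead.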
There might be a need to create separate issues for some of those points since they're not directly linked to the JSON lines aggregation. Feel free to copy-paste some of my comments there.
Hi @ocervell .
I've thought a lil bit before commenting on this. Imo the best thing to do is this:
This PR closes #115.
@ocervell what do u think?
This is a comparison test with the one shown in the issue: