EmersonElectricCo / fsf

File Scanning Framework
Apache License 2.0

Keys with '.' causing error on indexing of JSON scan report in Elasticsearch #16

Closed akniffe1 closed 8 years ago

akniffe1 commented 8 years ago

While using Elasticsearch as a database for FSF scan reports, I noticed an indexing problem caused by the key structure in META_PE, shown below:

         "Imports": {
                "version.dll": [...]

It appears that this will affect any database that is storing the raw report as 'flat' JSON, though I've only tested on Elasticsearch directly.

akniffe1 commented 8 years ago

After discussing with jxb5151, we think there may be an opportunity to globally sanitize the final scan report before it's written locally, either in the client that submits the scan report to the database or as a cleanup function within FSF. The objective is that module authors probably shouldn't have to account for this.
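
To make the idea concrete, here is a minimal sketch of what such a global cleanup step might look like. It is not existing FSF code: the name `sanitize_keys`, the `_` replacement character, and the sample import name are all assumptions chosen for illustration.

```python
def sanitize_keys(obj, replacement="_"):
    """Return a copy of obj with '.' replaced in every dict key, at any depth.

    Hypothetical helper, not part of FSF. Values are left untouched;
    only dictionary keys are rewritten.
    """
    if isinstance(obj, dict):
        cleaned = {}
        for key, value in obj.items():
            # JSON keys are strings; leave anything else alone.
            if isinstance(key, str):
                key = key.replace(".", replacement)
            cleaned[key] = sanitize_keys(value, replacement)
        return cleaned
    if isinstance(obj, list):
        # Lists can hold nested dicts, so recurse into each item.
        return [sanitize_keys(item, replacement) for item in obj]
    return obj


# Illustrative report fragment; the function name in the list is made up
# for the example, since the original report elides it.
report = {"META_PE": {"Imports": {"version.dll": ["GetFileVersionInfoW"]}}}
clean = sanitize_keys(report)
```

One caveat with this approach: if a report contains both `a.b` and `a_b` at the same level, the two keys collapse into one after replacement, so the replacement character would need to be chosen with that in mind.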

akniffe1 commented 8 years ago

Currently testing the following architecture:

Filebeat --> Logstash (with de_dot filter and JSON codec) --> Elasticsearch with Kibana visualization

Will submit pull request with configs shortly

jxb5151 commented 8 years ago

A little more digging on this issue uncovered that de_dot does not process anything below the top level. You can specify subfields manually, but I don't think that scales well. The only alternative I have seen is what someone posted in this discussion:

https://discuss.elastic.co/t/field-name-cannot-contain/33251/43

That approach uses Ruby to process '.' recursively. However, I suspect the performance impact would be unmanageable for those doing this at a larger scale. I have not tested this independently; that is just what my gut says.

Third-party data sources that people integrate with at some level (like VirusTotal) may have '.' in key names in the same way META_PE does in its current form. While we can't control how others choose to represent their data, we can modify META_PE at very little cost to us and eliminate this issue for those using the modules included in this build.
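
One possible shape for that META_PE change, sketched under assumptions: instead of using the DLL name as a key, the name moves into a value, so no key ever contains a '.'. The function name `imports_as_list` and the field names `lib` and `functions` are hypothetical and not necessarily what the module would end up using.

```python
def imports_as_list(imports):
    """Convert {dll_name: [functions]} into a list of records.

    Hypothetical restructuring: the DLL name becomes a value under 'lib',
    so dotted names such as 'version.dll' never appear as keys.
    """
    return [
        {"lib": lib, "functions": list(functions)}
        for lib, functions in sorted(imports.items())
    ]


# Illustrative input; the function names are made up for the example.
old_style = {
    "version.dll": ["GetFileVersionInfoW"],
    "kernel32.dll": ["CreateFileW", "CloseHandle"],
}
new_style = imports_as_list(old_style)
```

The trade-off is that lookups by DLL name become a filter over the list rather than a direct key access, which seems acceptable for a report that is mainly indexed and searched.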