certtools / intelmq

IntelMQ is a solution for IT security teams for collecting and processing security feeds using a message queuing protocol.
https://docs.intelmq.org/latest/
GNU Affero General Public License v3.0

unit tests fail on debian 9 #1301

Closed aaronkaplan closed 6 years ago

aaronkaplan commented 6 years ago

Install according to the developers guide, then run a unit test immediately:


root@do-portal-test:/opt/dev_intelmq# python3 -m unittest  discover
E..INFO - Reading /opt/dev_intelmq/intelmq/etc/harmonization.conf file
.......sss.....x....................ssss..(-2, 'Name or service not known')
(-2, 'Name or service not known')
...........................sss...........ss......ssss..ss...........................................................................................................................................................................................................................................................................................................................................
======================================================================
ERROR: test_output (intelmq.tests.bin.test_gen_harm_docs.TestGenHarmDocs)
Compare output to cached one.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/dev_intelmq/intelmq/tests/bin/test_gen_harm_docs.py", line 22, in test_output
    expected = handle.read()
  File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1703: ordinal not in range(128)

----------------------------------------------------------------------
Ran 433 tests in 23.826s

FAILED (errors=1, skipped=18, expected failures=1)
aaronkaplan commented 6 years ago

The cause is some non-ASCII (UTF-8) characters in ./docs/Harmonization-fields.md

aaronkaplan commented 6 years ago

And of course the expected output of the unit test contains those characters as well. But a default python3 installation (set up according to the INSTALL file) will throw that error.
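For readers trying to reproduce this, here is a minimal sketch of the failure mode. It assumes the reporter's shell uses an ASCII-only locale (for example `C`/`POSIX`), which the thread does not confirm; in that situation Python 3.5 picks ASCII as the default encoding for `open()`. The file name `sample.md` is made up for the example, and ASCII is forced explicitly so the sketch behaves the same on any machine.

```python
import os
import tempfile

# Text with the same curly quotes that appear in the docs file.
text = "avoid \u201ctype explosion\u201d"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.md")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

    # Forcing ASCII mimics what open() would pick by default under an
    # ASCII-only locale (an assumption about the reporter's system).
    try:
        with open(path, encoding="ascii") as handle:
            handle.read()
        ascii_error = None
    except UnicodeDecodeError as exc:
        ascii_error = exc

    # Reading the same file with an explicit UTF-8 encoding succeeds.
    with open(path, encoding="utf-8") as handle:
        utf8_text = handle.read()

print(ascii_error)
```

Running `python3 -c "import locale; print(locale.getpreferredencoding(False))"` on the affected machine should show which encoding `open()` falls back to there.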

aaronkaplan commented 6 years ago

here is how to patch it:

diff --git a/intelmq/etc/harmonization.conf b/intelmq/etc/harmonization.conf
index 0fb52dc0..092f0f52 100644
--- a/intelmq/etc/harmonization.conf
+++ b/intelmq/etc/harmonization.conf
@@ -10,7 +10,7 @@
             "type": "LowercaseString"
         },
         "classification.type": {
-            "description": "The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid \u201ctype explosion\u201d, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.",
+            "description": "The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid type explosion, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.",
             "type": "ClassificationType"
         },
         "comment": {
aaronkaplan commented 6 years ago

After this patch it worked: the unit tests ran successfully.

root@do-portal-test:/opt/dev_intelmq# python3 -m unittest discover
...INFO - Reading /opt/dev_intelmq/intelmq/etc/harmonization.conf file
.......sss.....x....................ssss..(-2, 'Name or service not known')
(-2, 'Name or service not known')
...........................sss...........ss......ssss..ss...........................................................................................................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 433 tests in 8.870s

OK (skipped=18, expected failures=1)
ghost commented 6 years ago

Can't reproduce this (it also works on Travis and on the build service; the latter has no UTF encoding).

The characters you are removing in the patch are quotation marks; see the implied changes in the docs:

diff --git a/docs/Harmonization-fields.md b/docs/Harmonization-fields.md
index 95bd515c9..c49176583 100644
--- a/docs/Harmonization-fields.md
+++ b/docs/Harmonization-fields.md
@@ -6,7 +6,7 @@ Harmonization field names
 |:------|:---|:---|:----------|
 |Classification|classification.identifier|[String](#string)|The lowercase identifier defines the actual software or service (e.g. 'heartbleed' or 'ntp_version') or standardized malware name (e.g. 'zeus'). Note that you MAY overwrite this field during processing for your individual setup. This field is not standardized across IntelMQ setups/users.|
 |Classification|classification.taxonomy|[LowercaseString](#lowercasestring)|We recognize the need for the CSIRT teams to apply a static (incident) taxonomy to abuse data. With this goal in mind the type IOC will serve as a basis for this activity. Each value of the dynamic type mapping translates to a an element in the static taxonomy. The European CSIRT teams for example have decided to apply the eCSIRT.net incident classification. The value of the taxonomy key is thus a derivative of the dynamic type above. For more information about check [ENISA taxonomies](http://www.enisa.europa.eu/activities/cert/support/incident-management/browsable/incident-handling-process/incident-taxonomy/existing-taxonomies).|
-|Classification|classification.type|[ClassificationType](#classificationtype)|The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid “type explosion”, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.|
+|Classification|classification.type|[ClassificationType](#classificationtype)|The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid type explosion, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.|
 | |comment|[String](#string)|Free text commentary about the abuse event inserted by an analyst.|
 |Destination|destination.abuse_contact|[LowercaseString](#lowercasestring)|Abuse contact for destination address. A comma separated list.|
 |Destination|destination.account|[String](#string)|An account name or email address, which has been identified to relate to the destination of an abuse event.|
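The link between the two diffs can be illustrated with a short sketch: the `\u201c` and `\u201d` escapes in harmonization.conf are plain ASCII inside the JSON file, but decode to curly quotation marks, which then become multi-byte UTF-8 in the generated Markdown. The JSON fragment below is a shortened stand-in for the real description, not the actual file content.

```python
import json

# Shortened stand-in for the "description" value in harmonization.conf.
fragment = '{"description": "avoid \\u201ctype explosion\\u201d, which"}'

# The JSON source itself is pure ASCII...
source_is_ascii = all(ord(char) < 128 for char in fragment)

# ...but the decoded string contains U+201C and U+201D.
description = json.loads(fragment)["description"]

# Encoded as UTF-8, the opening quote starts with byte 0xe2,
# the byte named in the traceback.
encoded = description.encode("utf-8")

print(description)
print(hex(encoded[6]))
```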
aaronkaplan commented 6 years ago

Use a default Debian stable.

ghost commented 6 years ago

which version?

ghost commented 6 years ago

I can't reproduce it with a fresh Debian 9

aaronkaplan commented 6 years ago

I'll show you on mine...

ghost commented 6 years ago

I just noticed that this file

...
ERROR: test_output (intelmq.tests.bin.test_gen_harm_docs.TestGenHarmDocs)
...
  File "/opt/dev_intelmq/intelmq/tests/bin/test_gen_harm_docs.py", line 22, in test_output
...

does not exist in the 1.1.x branch, only in 1.0.x. But I still cannot reproduce it, even with the successor test.

I'll push a patch that sets the encoding directly in the new test.
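A sketch of what setting the encoding directly could look like in a test, assuming the test reads a cached expected-output file. The helper `read_expected`, the test class, and the file name are hypothetical and not taken from the IntelMQ code base; the actual patch may differ.

```python
import os
import tempfile
import unittest


def read_expected(path):
    """Read a cached expected-output file as UTF-8, independent of the locale."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


class ReadExpectedTest(unittest.TestCase):
    def test_reads_curly_quotes(self):
        text = "avoid \u201ctype explosion\u201d"
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "expected.md")  # hypothetical file name
            # Write raw UTF-8 bytes so the fixture does not depend on the locale.
            with open(path, "wb") as handle:
                handle.write(text.encode("utf-8"))
            self.assertEqual(read_expected(path), text)
```

With the encoding passed explicitly, the result no longer depends on the locale of the machine running `python3 -m unittest discover`.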