Closed aaronkaplan closed 6 years ago
The cause is a few non-ASCII (UTF-8) characters in ./docs/Harmonization-fields.md, which the unit test's expected output also contains. With the default python3 installation (set up according to the INSTALL file), the test throws that error.
Here is how to patch it:
diff --git a/intelmq/etc/harmonization.conf b/intelmq/etc/harmonization.conf
index 0fb52dc0..092f0f52 100644
--- a/intelmq/etc/harmonization.conf
+++ b/intelmq/etc/harmonization.conf
@@ -10,7 +10,7 @@
"type": "LowercaseString"
},
"classification.type": {
- "description": "The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid \u201ctype explosion\u201d, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.",
+ "description": "The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid type explosion, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.",
"type": "ClassificationType"
},
"comment": {
After this patch it worked; the unit tests ran successfully:
root@do-portal-test:/opt/dev_intelmq# python3 -m unittest discover
...INFO - Reading /opt/dev_intelmq/intelmq/etc/harmonization.conf file
.......sss.....x....................ssss..(-2, 'Name or service not known')
(-2, 'Name or service not known')
...........................sss...........ss......ssss..ss...........................................................................................................................................................................................................................................................................................................................................
----------------------------------------------------------------------
Ran 433 tests in 8.870s
OK (skipped=18, expected failures=1)
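The `\u201c` and `\u201d` escapes removed by the patch can be illustrated with a minimal sketch. It assumes a hypothetical one-field excerpt shaped like the entry in harmonization.conf (not the real file) and only shows that the JSON text is pure ASCII on disk, while the string Python decodes from it contains curly quotes that an ASCII codec cannot encode:

```python
import json

# Hypothetical one-field excerpt, shaped like the harmonization.conf entry.
raw = r'{"description": "avoid \u201ctype explosion\u201d, which dilutes"}'

description = json.loads(raw)["description"]

# The JSON text itself is pure ASCII; the decoded string is not.
raw_is_ascii = raw.isascii()
decoded_is_ascii = description.isascii()

# Encoding with an ASCII codec (as under a non-UTF-8 locale) fails.
try:
    description.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding as UTF-8 round-trips without loss.
utf8_bytes = description.encode("utf-8")
```

If this matches what happens in the failing test, the error would only appear where the decoded text is written or compared using a non-UTF-8 default encoding, which may explain why it does not reproduce everywhere.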
Can't reproduce this (it also works on Travis and on the build service, and the latter has no UTF encoding).
The characters you are removing in the patch are quotes; see the implied changes in the docs:
diff --git a/docs/Harmonization-fields.md b/docs/Harmonization-fields.md
index 95bd515c9..c49176583 100644
--- a/docs/Harmonization-fields.md
+++ b/docs/Harmonization-fields.md
@@ -6,7 +6,7 @@ Harmonization field names
|:------|:---|:---|:----------|
|Classification|classification.identifier|[String](#string)|The lowercase identifier defines the actual software or service (e.g. 'heartbleed' or 'ntp_version') or standardized malware name (e.g. 'zeus'). Note that you MAY overwrite this field during processing for your individual setup. This field is not standardized across IntelMQ setups/users.|
|Classification|classification.taxonomy|[LowercaseString](#lowercasestring)|We recognize the need for the CSIRT teams to apply a static (incident) taxonomy to abuse data. With this goal in mind the type IOC will serve as a basis for this activity. Each value of the dynamic type mapping translates to a an element in the static taxonomy. The European CSIRT teams for example have decided to apply the eCSIRT.net incident classification. The value of the taxonomy key is thus a derivative of the dynamic type above. For more information about check [ENISA taxonomies](http://www.enisa.europa.eu/activities/cert/support/incident-management/browsable/incident-handling-process/incident-taxonomy/existing-taxonomies).|
-|Classification|classification.type|[ClassificationType](#classificationtype)|The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid “type explosion”, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.|
+|Classification|classification.type|[ClassificationType](#classificationtype)|The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid type explosion, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.|
| |comment|[String](#string)|Free text commentary about the abuse event inserted by an analyst.|
|Destination|destination.abuse_contact|[LowercaseString](#lowercasestring)|Abuse contact for destination address. A comma separated list.|
|Destination|destination.account|[String](#string)|An account name or email address, which has been identified to relate to the destination of an abuse event.|
Use a default Debian stable.
Which version?
I can't reproduce it with a fresh Debian 9
I'll show you on mine...
I just noticed that this file
... ERROR: test_output (intelmq.tests.bin.test_gen_harm_docs.TestGenHarmDocs) ... File "/opt/dev_intelmq/intelmq/tests/bin/test_gen_harm_docs.py", line 22, in test_output ...
does not exist in the 1.1.x branch, only in 1.0.x. But I still cannot reproduce it, even with the successor test.
I'll push a patch that sets the encoding directly in the new test.
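For illustration, here is a minimal sketch of what setting the encoding explicitly can look like. It is not the actual patch: the file name is a hypothetical stand-in written to a temporary directory, and `encoding="ascii"` is used to simulate a system whose default encoding is not UTF-8.

```python
import os
import tempfile

text = "avoid \u201ctype explosion\u201d\n"

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the generated docs file.
    path = os.path.join(tmp, "Harmonization-fields.md")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)

    # Simulate a non-UTF-8 default encoding: reading as ASCII fails.
    try:
        with open(path, encoding="ascii") as handle:
            handle.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

    # Passing the encoding explicitly works regardless of the locale.
    with open(path, encoding="utf-8") as handle:
        read_back = handle.read()
```

When `encoding` is omitted, `open()` falls back to a locale-dependent default, so the same test can pass on one machine and fail on another; naming the encoding removes that dependency.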
Install according to the developers guide, then run a unit test immediately: