OpenCTI-Platform / client-python

OpenCTI Python Client
https://www.opencti.io
Apache License 2.0

pycti library has problem with non-ASCII characters #723

Open Lhorus6 opened 3 weeks ago

Lhorus6 commented 3 weeks ago

Description

In a stream connector, we use the "self.helper.listen_stream()" function to listen to a stream. The problem is that the retrieved data is truncated when it contains a non-ASCII character.

The use case is as follows:

Seen in the stream

image

Retrieved by my connector

Screenshot 2024-08-29 220415

Environment

OCTI 6.2.16

Reproducible Steps

Steps to create the smallest reproducible scenario:

  1. Create a live stream with the filters "Entity type: File AND label: test-bug".
  2. Create a File with only an MD5 hash (no author, no marking, etc., to avoid noise in the stream).
  3. Run a stream connector in debug mode, listening to your stream, with a breakpoint at the point where it processes the retrieved data.
  4. Add in the "name" field of the File: 2020년 연구 전문원 및 수자원분야 경력사원 선발 모집요강.hwp
  5. Add the label "test-bug" to the File to send it to the stream.
  6. Look at the data retrieved on the connector side. -> truncated data
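
For step 6, a small check in the connector callback can make the truncation visible without a debugger. This is only a sketch under assumptions: it assumes the message passed to the callback exposes the raw event as a JSON string in a `data` attribute, and that the object name sits under `data.name` in that JSON. The function name `check_stream_message` and the short test name used below are hypothetical and stand in for the real values.

```python
import json
from types import SimpleNamespace


def check_stream_message(msg, expected_name):
    """Return a short diagnostic string for one stream message.

    Assumption: msg.data is a JSON string whose "data" object
    carries the entity "name" (layout not confirmed by the issue).
    """
    try:
        payload = json.loads(msg.data)
    except json.JSONDecodeError as exc:
        # A payload cut off mid-string typically fails to parse.
        return f"payload is not valid JSON (possibly truncated): {exc.msg} at char {exc.pos}"

    name = payload.get("data", {}).get("name")
    if name == expected_name:
        return "ok"
    got_len = len(name) if isinstance(name, str) else 0
    return (
        f"name mismatch: got {name!r} ({got_len} chars), "
        f"expected {expected_name!r} ({len(expected_name)} chars)"
    )


if __name__ == "__main__":
    # Stand-in message objects; a real connector would receive them
    # from the stream instead of building them by hand.
    expected = "테스트.hwp"  # hypothetical short name, not the one from step 4
    full = json.dumps({"data": {"name": expected}}, ensure_ascii=False)
    print(check_stream_message(SimpleNamespace(data=full), expected))
    print(check_stream_message(SimpleNamespace(data=full[:20]), expected))
```

In a real connector, the same function could be called from the callback given to "self.helper.listen_stream()", with the name entered in step 4 as `expected_name`.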

Expected Output

The connector should receive the complete data, identical to what I see in my stream.

romain-filigran commented 2 weeks ago

@richard-julien: Are you aware of this? I'll try to reproduce it.

richard-julien commented 2 weeks ago

I remember one case that we were not able to reproduce. If we have a good repro case, we need to fix that :)

Lhorus6 commented 2 weeks ago

Ping me for the repro case if needed @romain-filigran