CESNET / Nemea-Framework

The Nemea framework is the heart of the Nemea system. It contains implementations of common communication interfaces, the UniRec data format, and useful data structures and algorithms.

pytrap: apply replacement on invalid utf-8 strings implicitly #194

Closed cejkato2 closed 1 year ago

cejkato2 commented 2 years ago

Creating a Python string fails with a decoding error when non-UTF-8 bytes occur in the field value. This patch replaces the direct use of PyUnicode_FromStringAndSize() with PyUnicode_DecodeUTF8(), which can be set to replace invalid bytes.

As a result, invalid (non-UTF-8) byte sequences are replaced by the U+FFFD replacement character (0xFFFD, decimal 65533).
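The effect can be reproduced at the Python level, without pytrap, because `bytes.decode()` with `errors="replace"` uses the same UTF-8 codec and error handler. The following is only a sketch of the behaviour described above, not the patch itself; the field value `raw` and the helper names are made up for illustration.

```python
# Hypothetical field value: valid ASCII followed by 0xFF,
# a byte that can never appear in well-formed UTF-8.
raw = b"host\xff"


def decode_strict(data: bytes) -> str:
    """Strict decoding: raises UnicodeDecodeError on invalid input."""
    return data.decode("utf-8")


def decode_replacing(data: bytes) -> str:
    """Lenient decoding: invalid bytes become U+FFFD."""
    return data.decode("utf-8", errors="replace")


try:
    decode_strict(raw)
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = decode_replacing(raw)

print("strict decoding failed:", strict_failed)
print("code point of last character:", ord(replaced[-1]))  # 65533 == 0xFFFD
```

The valid prefix is preserved unchanged; only the offending byte is substituted.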

Note that the replacement process itself affects performance quite significantly. Valid UTF-8 strings, on the other hand, perform about the same as before this PR.
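One rough way to get a feeling for this cost is to time Python's own lenient decoder on valid and invalid input of the same length. This is a sketch under stated assumptions: it measures `bytes.decode()` rather than pytrap, the payloads and the repeat count are arbitrary, and the numbers will vary by machine and Python version.

```python
import timeit

# Hypothetical payloads of equal length: one well-formed, one made of invalid bytes only.
valid = ("example.org/" * 64).encode("utf-8")
invalid = b"\xff\xfe" * (len(valid) // 2)


def decode(data: bytes) -> str:
    """Decode with invalid bytes replaced by U+FFFD."""
    return data.decode("utf-8", errors="replace")


REPEATS = 2000  # arbitrary; raise it for more stable timings

t_valid = timeit.timeit(lambda: decode(valid), number=REPEATS)
t_invalid = timeit.timeit(lambda: decode(invalid), number=REPEATS)

print(f"valid input:   {t_valid:.4f} s for {REPEATS} decodes")
print(f"invalid input: {t_invalid:.4f} s for {REPEATS} decodes")
```

A payload consisting entirely of invalid bytes is a worst case; real traffic with occasional bad bytes would fall somewhere in between.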

codecov-commenter commented 2 years ago

Codecov Report

Base: 80.00% // Head: 80.00% // No change to project coverage :thumbsup:

Coverage data is based on head (d5bc3e0) compared to base (64f1e6d). Patch has no changes to coverable lines.

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##           master     #194   +/-   ##
=======================================
  Coverage   80.00%   80.00%
=======================================
  Files           2        2
  Lines          10       10
=======================================
  Hits            8        8
  Misses          2        2
```

| Flag | Coverage Δ | |
|---|---|---|
| tests | `80.00% <ø> (ø)` | |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=CESNET#carryforward-flags-in-the-pull-request-comment) to find out more.
