FraunhoferISST / TREND

Traceability Enforcement of Datatransfers (TREND)
https://fraunhoferisst.github.io/TREND/

Use Control Char / Tag as Watermarking Start #18

Closed: mhellmeier closed this issue 3 weeks ago

mhellmeier commented 2 months ago

🚀 Feature Request

Current Problem

The watermarker library can add a watermark either with or without compression. When analyzing a watermarked text or file, however, the library needs to know which type of watermark was used (compressed, uncompressed, a specific format, etc.), and it currently has no way to determine this. In addition, some use cases may have specific requirements regarding the style, compression, linting, or format of the watermark.

Proposed Solution

Every watermark should start with a 2-digit control character (e.g., a number) that identifies the type of watermark. Using a 2-digit control character instead of a 1-digit one allows for a larger namespace for future formats (with decimal digits, 100 values from 00 to 99 instead of 10).

Example: Instead of watermarking "Test" directly, "00Test" is embedded if the watermark is uncompressed and "01Test" if it is compressed. The control characters themselves must be inserted without compression; otherwise the reader could not decode them before knowing which format the rest of the watermark uses.
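A minimal Python sketch of the proposed layout, assuming the control tag is two ASCII digits and that zlib stands in for the unspecified compression technique. The names build_watermark, read_watermark, TAG_UNCOMPRESSED, and TAG_COMPRESSED are hypothetical and not part of the TREND watermarker API. The sketch shows why the tag has to stay uncompressed: the reader inspects the first two bytes before it knows how to decode the rest.

```python
import zlib

# Hypothetical tag values taken from the issue's example; zlib is only a
# stand-in for the unspecified "X compression technique".
TAG_UNCOMPRESSED = "00"
TAG_COMPRESSED = "01"


def build_watermark(text: str, compress: bool = False) -> bytes:
    """Prefix the payload with an uncompressed 2-digit control tag."""
    payload = text.encode("utf-8")
    if compress:
        # Only the payload is compressed; the tag stays readable as-is.
        return TAG_COMPRESSED.encode("ascii") + zlib.compress(payload)
    return TAG_UNCOMPRESSED.encode("ascii") + payload


def read_watermark(data: bytes) -> tuple[str, str]:
    """Return (tag, decoded text) by inspecting the first two bytes."""
    if len(data) < 2:
        raise ValueError("watermark too short to contain a control tag")
    # UnicodeDecodeError is a subclass of ValueError, so garbage tags
    # surface as ValueError as well.
    tag = data[:2].decode("ascii")
    body = data[2:]
    if tag == TAG_UNCOMPRESSED:
        return tag, body.decode("utf-8")
    if tag == TAG_COMPRESSED:
        return tag, zlib.decompress(body).decode("utf-8")
    raise ValueError(f"unknown control tag: {tag!r}")


print(build_watermark("Test"))  # b'00Test'
print(read_watermark(build_watermark("Test", compress=True)))  # ('01', 'Test')
```

Because the tag is read before any decompression happens, adding a new format only requires a new tag value and a matching branch in the reader.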

Additional Context

The other components, such as the CLI tool and the web interface, must be updated once this issue is implemented, since it is a breaking change.

Further, the documentation needs a table describing each control char and its meaning, for example:

Control Char | Meaning
00           | Uncompressed watermark
01           | Compressed watermark using X compression technique
02           | Specialized compression for use case Y
03           | ...
...          | ...
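The issue only asks for a documentation table, but a library could keep that table and its decoder in sync by storing both in one registry. The following is a hedged Python sketch of that idea, assuming two-digit ASCII tags; WatermarkFormat, FORMATS, register_format, describe, and decode are hypothetical names, and zlib again stands in for the unspecified compression.

```python
import re
import zlib
from typing import Callable, Dict, NamedTuple


class WatermarkFormat(NamedTuple):
    meaning: str                      # human-readable entry for the docs table
    decode: Callable[[bytes], bytes]  # turns the payload after the tag into raw bytes


# Hypothetical registry mirroring the documentation table; zlib is a stand-in.
FORMATS: Dict[str, WatermarkFormat] = {
    "00": WatermarkFormat("Uncompressed watermark", lambda b: b),
    "01": WatermarkFormat("Compressed watermark (zlib as stand-in)", zlib.decompress),
}


def register_format(tag: str, fmt: WatermarkFormat) -> None:
    """Add a new format, enforcing the two-digit tag rule (00-99)."""
    if not re.fullmatch(r"[0-9]{2}", tag):
        raise ValueError("control tag must be exactly two ASCII digits")
    if tag in FORMATS:
        raise ValueError(f"control tag {tag} is already assigned")
    FORMATS[tag] = fmt


def _lookup(data: bytes) -> WatermarkFormat:
    """Find the registered format for the uncompressed 2-byte prefix."""
    tag = data[:2].decode("ascii")
    fmt = FORMATS.get(tag)
    if fmt is None:
        raise ValueError(f"unknown control tag: {tag!r}")
    return fmt


def describe(data: bytes) -> str:
    """Return the documented meaning of a watermark's control tag."""
    return _lookup(data).meaning


def decode(data: bytes) -> bytes:
    """Strip the tag and decode the payload with the registered decoder."""
    return _lookup(data).decode(data[2:])
```

Generating the documentation table from FORMATS, rather than maintaining it by hand, would make it harder for the docs and the decoder to drift apart.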
mhellmeier commented 2 months ago

After discussion, the following decision could be good for a first implementation: