FraunhoferISST / TREND

Traceability Enforcement of Datatransfers (TREND)
https://fraunhoferisst.github.io/TREND/

Increase Watermark Robustness #20

Open: mhellmeier opened this issue 3 months ago

mhellmeier commented 3 months ago

🚀 Feature Request

Current Problem

When a watermarked text is modified, the watermark embedded in the cover text can be destroyed. This can happen when sentences are moved within the text, content is deleted, new content is added, or existing content is copied.

Proposed Solution

The overall robustness of the watermarker library needs to be increased.

For example, if a small watermark is embedded repeatedly in a long cover text (e.g., 10 times), the watermarker library should still be able to extract the watermark even if 4 of the 10 repetitions have been destroyed.
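A minimal sketch of the simplest form of this idea: a plurality vote over the extracted repetitions. It assumes each repetition has already been extracted as a separate byte list, and the function name `majorityVote` is hypothetical, not part of the TREND API. With 10 repetitions of which 4 are destroyed, the 6 intact copies always outnumber any group of identical corrupted copies.

```kotlin
/**
 * Hypothetical helper: returns the repetition that was extracted most often,
 * or null if nothing was extracted. Destroyed repetitions are assumed to decode
 * to garbage that rarely agrees, so intact copies win the plurality vote.
 */
fun majorityVote(repetitions: List<List<Byte>>): List<Byte>? =
    repetitions
        .groupingBy { it }      // List<Byte> has structural equality, so equal copies share a group
        .eachCount()            // repetition -> number of occurrences
        .maxByOrNull { it.value }
        ?.key
```

Voting on whole repetitions is robust only if at least one repetition survives intact; per-position voting, as discussed further below, can also recover a watermark from partially damaged copies.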

Additional Context

If a control character is implemented (see #18), the watermarker library needs a strategy for handling the case in which this control character is destroyed.

mhellmeier commented 1 month ago

First starting points:

Minor thing:

Future ideas:

hnorkowski commented 3 days ago

First starting points:

* [ ]  Update the squashing method for all watermarks so that it only returns one watermark based on the length (robustness increase for all watermarks)

For basic watermarks (i.e., plain lists of bytes whose meaning is unknown), the most plausible watermark can be found with frequency analysis. This approach is generic and could be implemented as a static function in `Watermark`. It has to be a static function rather than a method because it operates on a list of watermarks. `Watermarker` and `JvmWatermarker` could use this feature by default.
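The length-based squashing and frequency analysis could look roughly like the following sketch. It assumes watermarks are plain `ByteArray`s and uses a hypothetical top-level `squash` function; in the library it would presumably live in a `companion object` of `Watermark`, which is Kotlin's counterpart to a static function. It first picks the most common length and then takes a per-position plurality vote among the candidates of that length.

```kotlin
/**
 * Hypothetical squashing helper: picks the most common watermark length, then
 * chooses the most frequent byte value at each position among candidates of
 * that length. Candidates with any other length are ignored.
 */
fun squash(watermarks: List<ByteArray>): ByteArray? {
    if (watermarks.isEmpty()) return null
    // Step 1: assume the most frequent length is the correct one.
    val length = watermarks.groupingBy { it.size }.eachCount().maxByOrNull { it.value }!!.key
    val sameLength = watermarks.filter { it.size == length }
    // Step 2: byte-wise frequency analysis (plurality vote per position).
    return ByteArray(length) { position ->
        sameLength.groupingBy { it[position] }.eachCount().maxByOrNull { it.value }!!.key
    }
}
```

Because the vote runs per position, two copies that are damaged at different positions can still outvote each other's errors, which a vote over whole watermarks cannot do.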

* [ ]  Update the `SizedWatermark` so that it uses the size of the watermark (robustness increase for Sized Trendmarks)

The basic functionality is already implemented: the `validate` method of `Trendmark` checks the size, checksum, and hash, depending on the `Trendmark` variant.
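To illustrate how a size field can filter out damaged extractions, here is a sketch under an assumed layout: a 4-byte big-endian size header covering the whole watermark. This is not the actual `SizedWatermark` format, and the helper names `sized`, `hasValidSize`, and `keepValidSizes` are hypothetical. Truncated or merged candidates fail the size check and can be discarded before any further analysis.

```kotlin
import java.nio.ByteBuffer

/** Hypothetical layout: 4-byte big-endian total size, followed by the payload. */
fun sized(payload: ByteArray): ByteArray =
    ByteBuffer.allocate(4 + payload.size).putInt(4 + payload.size).put(payload).array()

/** True if the candidate holds a full header and its declared size equals its actual size. */
fun hasValidSize(candidate: ByteArray): Boolean =
    candidate.size >= 4 && ByteBuffer.wrap(candidate).getInt(0) == candidate.size

/** Drops truncated or merged extraction results before further analysis. */
fun keepValidSizes(candidates: List<ByteArray>): List<ByteArray> =
    candidates.filter(::hasValidSize)
```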

* [ ]  Check error correcting codes for the CRC32 trendmarks

Checksum validation is already implemented in the `validateChecksum` method, which is called automatically by `validate`. It might be possible to correct errors using the CRC32 checksum, but I am not sure: CRC32 can in principle correct single-bit errors, whereas our text watermarking algorithm does not operate on individual bits; it currently works with 4 states. A new `repair` method could be added to the `Checksum` interface so that each implementing checksum can provide a recovery strategy specific to that checksum, where possible.
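One possible shape for the proposed `repair` method is sketched below. The `Checksum` interface here is a hypothetical stand-in, not TREND's actual interface, and `Crc32Checksum` uses `java.util.zip.CRC32`. Since the text watermarking works with 4 states, the sketch treats each byte as four 2-bit symbols and brute-forces every single-symbol substitution, assuming the stored checksum itself is intact and at most one symbol is corrupted.

```kotlin
import java.util.zip.CRC32

/** Hypothetical stand-in for the checksum abstraction, extended with `repair`. */
interface Checksum {
    fun compute(data: ByteArray): Long
    fun validate(data: ByteArray, expected: Long): Boolean = compute(data) == expected
    /** Returns a repaired copy of [data] that matches [expected], or null if none is found. */
    fun repair(data: ByteArray, expected: Long): ByteArray? = null
}

object Crc32Checksum : Checksum {
    override fun compute(data: ByteArray): Long = CRC32().apply { update(data) }.value

    /**
     * Tries every alternative value (0..3) for every 2-bit symbol and returns the
     * first candidate whose CRC32 matches. Assumes the stored checksum is intact.
     */
    override fun repair(data: ByteArray, expected: Long): ByteArray? {
        if (validate(data, expected)) return data.copyOf()
        for (index in data.indices) {
            for (shift in 0 until 8 step 2) {
                val current = (data[index].toInt() shr shift) and 0b11
                for (value in 0..3) {
                    if (value == current) continue
                    val candidate = data.copyOf()
                    // Clear the 2-bit symbol, then write the alternative value into it.
                    val cleared = data[index].toInt() and (0b11 shl shift).inv()
                    candidate[index] = (cleared or (value shl shift)).toByte()
                    if (validate(candidate, expected)) return candidate
                }
            }
        }
        return null
    }
}
```

The search costs 12 CRC computations per byte (4 symbols times 3 alternatives), which is cheap for short watermarks. With more than one corrupted symbol it may find no repair, or in rare cases a wrong one.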

* [ ]  Update Trendmark documentation with recommendations for robustness (for example, suggest using the `SizedCRC32Watermark` if robustness is important and mention the trade-offs).

Further analysis methods could be implemented as static methods in `Trendmark`. The extraction methods of `(Jvm)Watermarker` could be extended with an additional parameter `trashing: Bool` that determines whether all Trendmarks that produce an error or warning are discarded.
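A minimal sketch of the proposed `trashing` parameter, written with Kotlin's `Boolean` type. The `Status` enum and the `Extracted` result type are hypothetical simplifications of whatever the validation of a Trendmark actually reports.

```kotlin
/** Hypothetical validation outcome of a single extracted Trendmark. */
enum class Status { OK, WARNING, ERROR }

/** Hypothetical extraction result: the raw bytes plus the status reported by validation. */
data class Extracted(val bytes: List<Byte>, val status: Status)

/**
 * Sketch of the proposed switch: when [trashing] is true, every result whose
 * validation produced a warning or an error is discarded; otherwise all are kept.
 */
fun applyTrashing(results: List<Extracted>, trashing: Boolean): List<Extracted> =
    if (trashing) results.filter { it.status == Status.OK } else results
```

Keeping `trashing = false` as the default would preserve the current behavior, while callers that prefer fewer but more reliable results could opt in.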