facebookresearch / audioseal

Localized watermarking for AI-generated speech audios, with SOTA on robustness and very fast detector
MIT License

The following 2 cases can't be detected #50

Open codelive opened 3 months ago

codelive commented 3 months ago
  1. AAC 64kbps encoding.
  2. Audio recorded from the system output with OBS (audio enhancement is enabled by default on Windows 11).

antoine-tran commented 2 months ago

Hi @codelive, could you elaborate on this?

codelive commented 2 months ago

Hi @antoine-tran,

Here's my test code:

from audioseal import AudioSeal

import torch
import torchaudio


def watermark_embed():
    # Load the 16-bit watermark generator.
    model = AudioSeal.load_generator("audioseal_wm_16bits")
    audio, sample_rate = torchaudio.load("input.wav")
    audios = audio.unsqueeze(0)  # add a batch dimension: (batch, channels, samples)
    bits = [[1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]]
    secret_message = torch.tensor(bits, dtype=torch.int32)
    print(f"bits: {secret_message}")
    watermarked = model(audios, sample_rate=sample_rate, message=secret_message, alpha=1)
    watermarked_audio = watermarked.detach()
    torchaudio.save("output_seal.wav", src=watermarked_audio[0], sample_rate=sample_rate)


def watermark_detect():
    audio, sample_rate = torchaudio.load("output_seal.wav")
    audios = audio.unsqueeze(0)  # add a batch dimension: (batch, channels, samples)
    detector = AudioSeal.load_detector("audioseal_detector_16bits")
    # result is the detection score; message holds the decoded bits.
    result, message = detector.detect_watermark(audios, sample_rate=sample_rate, message_threshold=0.5)
    print(f"bits: {message}, score: {result}")


# watermark_embed()
watermark_detect()

  1. First, call watermark_embed() to save a watermarked audio file, "output_seal.wav":

    bits: tensor([[1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]], dtype=torch.int32)

  2. Then call watermark_detect() to detect the watermark:

    bits: tensor([[1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]], dtype=torch.int32), score: 1.0

  3. Use ffmpeg to transcode the watermarked file to AAC at 64 kbps, convert the AAC back to WAV, and run detection again:

    ffmpeg -y -i output_seal.wav -b:a 64k output_seal.aac
    ffmpeg -y -i output_seal.aac output_seal.wav

    bits: tensor([[0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]], dtype=torch.int32), score: 0.6068740487098694

    The detection score is about 0.6, and the decoded bits do not match the original watermark.

  4. Record the system's sound output using OBS (with audio enhancement enabled), then run detection on the recording:

    bits: tensor([[1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1]], dtype=torch.int32), score: 0.202103391289711

    Screenshots of the OBS recording and system audio settings: snap-audio-1 snap-obs-1

  5. I turned off auto enhancement and recorded again with OBS. The detection score is about 0.9, but the decoded bits still don't match my original watermark:

    bits: tensor([[1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1]], dtype=torch.int32), score: 0.9000480771064758
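
To put a number on "does not match" in the steps above, the decoded bits can be compared against the embedded message position by position. The following is a minimal sketch in plain Python; the helper names `bit_errors` and `bit_error_rate` are hypothetical and not part of the audioseal package, and the bit lists are copied from the outputs reported above.

```python
def bit_errors(sent, received):
    """Count positions where the decoded bit differs from the embedded bit."""
    if len(sent) != len(received):
        raise ValueError("messages must have the same length")
    return sum(s != r for s, r in zip(sent, received))


def bit_error_rate(sent, received):
    """Fraction of mismatched bits (0.0 = perfect decode)."""
    return bit_errors(sent, received) / len(sent)


# Embedded message from watermark_embed().
original = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]

# Decoded messages copied from steps 3, 4 and 5.
decoded = {
    "AAC 64 kbps": [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
    "OBS, enhancement on": [1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1],
    "OBS, enhancement off": [1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1],
}

for name, bits in decoded.items():
    errors = bit_errors(original, bits)
    print(f"{name}: {errors}/{len(original)} bits wrong, BER = {bit_error_rate(original, bits):.4f}")
```

On these three outputs the sketch reports 4, 7 and 6 wrong bits out of 16. For comparison, guessing each bit at random would get 8 of 16 wrong on average, so a high detection score (0.9 in step 5) does not by itself mean the message was recovered.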

The wav file I used: input.zip

Thank you very much for your reply.

pierrefdz commented 1 month ago

Thanks for raising this! I don't see any way of solving this without fine-tuning the model with these augmentations... Let us know if you find anything.
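
One possible reading of "these augmentations" is to pass clips through the same AAC round trip used in the report before they reach the detector. The sketch below only wraps the two ffmpeg commands from the issue in Python. It assumes ffmpeg is on the PATH, the function names are hypothetical, and it is not the AudioSeal training code.

```python
import subprocess
from pathlib import Path


def aac_roundtrip_commands(src, dst, bitrate="64k"):
    """Build the two ffmpeg invocations from the report: WAV -> AAC -> WAV."""
    src, dst = Path(src), Path(dst)
    tmp = dst.with_suffix(".aac")  # intermediate AAC file next to dst
    encode = ["ffmpeg", "-y", "-i", str(src), "-b:a", bitrate, str(tmp)]
    decode = ["ffmpeg", "-y", "-i", str(tmp), str(dst)]
    return [encode, decode]


def aac_roundtrip(src, dst, bitrate="64k"):
    """Run the round trip; raises CalledProcessError if ffmpeg fails."""
    for cmd in aac_roundtrip_commands(src, dst, bitrate):
        subprocess.run(cmd, check=True, capture_output=True)
    return Path(dst)
```

For example, `aac_roundtrip("output_seal.wav", "output_seal_aac.wav")` would write the degraded copy to a separate file, so the original watermarked WAV stays available for comparison instead of being overwritten as in step 3.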