This repo contains the inference code for AudioSeal, a method for localized watermarking of speech with state-of-the-art robustness and detector speed (training code coming soon).
To learn more, check out our paper.

[arXiv] [🤗Hugging Face] [Colab Notebook] [Webpage] [Blog] [Press]
AudioSeal is a proactive, localized watermarking method for speech. It jointly trains two components: a generator that embeds an imperceptible watermark into audio, and a detector that identifies watermark fragments in long or edited audio files.
Install the package from PyPI:

```bash
pip install audioseal
```

To install from source, clone this repo and install it in editable mode:

```bash
git clone https://github.com/facebookresearch/audioseal
cd audioseal
pip install -e .
```
All model checkpoints are available on the Hugging Face Hub. The checkpoints include the watermark generator (`audioseal_wm_16bits`) and the detector (`audioseal_detector_16bits`) used in the example below.
Note that the message is optional and has no influence on the detection result. It can be used, for instance, to identify a model version (up to $2^{16} = 65536$ possible values).
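Because the message is only 16 bits, mapping an identifier such as a model version to a message is a plain integer-to-binary conversion. The sketch below assumes a most-significant-bit-first ordering, which is our own convention rather than something AudioSeal prescribes; `int_to_bits` and `bits_to_int` are hypothetical helpers, not part of the audioseal package.

```python
from typing import List

NBITS = 16  # 16-bit messages allow 2**16 = 65536 distinct values


def int_to_bits(value: int, nbits: int = NBITS) -> List[int]:
    """Encode a non-negative integer as a list of bits, most significant bit first.

    Hypothetical helper: the MSB-first ordering is an assumed convention.
    """
    if not 0 <= value < 2 ** nbits:
        raise ValueError(f"value must be in [0, {2 ** nbits - 1}], got {value}")
    return [(value >> (nbits - 1 - i)) & 1 for i in range(nbits)]


def bits_to_int(bits: List[int]) -> int:
    """Decode a list of bits (most significant bit first) back into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


# Hypothetical model-version ID packed into a 16-bit message and recovered
version_id = 42
bits = int_to_bits(version_id)
print(bits)
assert bits_to_int(bits) == version_id
```

To embed such a message, it would need to be wrapped as a tensor of shape (batch, 16), e.g. `torch.tensor([bits])`, and passed as `message=`. On the detection side, the low-level message probabilities would first be thresholded at 0.5 before decoding.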
Here’s a quick example of how you can use AudioSeal’s API to embed and detect watermarks:
```python
import torch
from audioseal import AudioSeal

# The model name corresponds to the YAML card file name found in audioseal/cards
model = AudioSeal.load_generator("audioseal_wm_16bits")

# Another way is to load directly from the checkpoint
# model = Watermarker.from_pretrained(checkpoint_path, device=wav.device)

# wav is a torch tensor of shape (batch, channels, samples), and sr is its sample rate.
# It is important to resample the audio to the sample rate the model
# expects. In our case, we support 16 kHz audio.
wav, sr = ..., 16000

watermark = model.get_watermark(wav, sr)

# Optional: you can add a 16-bit message to embed in the watermark
# msg = torch.randint(0, 2, (wav.shape[0], model.msg_processor.nbits), device=wav.device)
# watermark = model.get_watermark(wav, sr, message=msg)

watermarked_audio = wav + watermark

detector = AudioSeal.load_detector("audioseal_detector_16bits")

# High-level API: one detection score and one decoded message per audio
result, message = detector.detect_watermark(watermarked_audio, sr)

print(result)  # result is a float indicating the probability that the audio is watermarked
print(message)  # message is a binary vector of 16 bits

# Low-level API: frame-level detection
result, message = detector(watermarked_audio, sr)

# result is a tensor of shape batch x 2 x frames, giving the probability (positive and negative)
# of watermarking for each frame. A watermarked audio should have result[:, 1, :] > 0.5
print(result[:, 1, :])

# message is a tensor of shape batch x 16, giving the probability of each bit being 1.
# message will be a random tensor if the detector finds no watermark in the audio
print(message)
```
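Because the low-level output gives a watermark probability per frame, it can be used to localize watermarked regions in long or edited audio. The sketch below groups consecutive frames above 0.5 into time intervals. It assumes one frame per audio sample, so `frame_rate` defaults to the 16 kHz sample rate; check this against the length of `result` for your input. `watermarked_segments` is a hypothetical helper, and with real detector output you would pass a row such as `result[0, 1, :].tolist()`.

```python
from typing import List, Optional, Tuple


def watermarked_segments(
    frame_probs: List[float],
    frame_rate: float = 16000.0,  # assumption: one detector frame per audio sample
    threshold: float = 0.5,
) -> List[Tuple[float, float]]:
    """Group consecutive frames whose watermark probability exceeds `threshold`
    into (start_seconds, end_seconds) intervals."""
    segments: List[Tuple[float, float]] = []
    start: Optional[int] = None  # first frame index of the current watermarked run
    for i, p in enumerate(frame_probs):
        if p > threshold and start is None:
            start = i  # a watermarked run begins
        elif p <= threshold and start is not None:
            segments.append((start / frame_rate, i / frame_rate))  # the run ends
            start = None
    if start is not None:  # close a run that extends to the last frame
        segments.append((start / frame_rate, len(frame_probs) / frame_rate))
    return segments


# Toy probabilities for 8 frames, using 4 frames per second for readability
probs = [0.1, 0.2, 0.9, 0.95, 0.3, 0.8, 0.7, 0.6]
print(watermarked_segments(probs, frame_rate=4.0))  # [(0.5, 1.0), (1.25, 2.0)]
```

A fixed 0.5 threshold mirrors the rule stated in the comments above; in practice isolated noisy frames could fragment the intervals, so some smoothing of the probabilities before grouping may be worth considering.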
Interested in training your own watermarking model? Check out our training documentation to get started.
We welcome pull requests with improvements or suggestions. If you wish to report an issue or propose an enhancement but are unsure how to implement it, feel free to create a GitHub issue.
If you encounter the error `ValueError: not enough values to unpack (expected 3, got 2)`, this is because we expect a batch of audio tensors as input. Add a dummy batch dimension to your input (e.g. `wav.unsqueeze(0)`; see the example notebook for getting started).
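A small helper can apply this fix defensively. This is a sketch, assuming the input is either a bare waveform `(samples,)`, a `(channels, samples)` tensor as returned by loaders such as `torchaudio.load`, or an already batched tensor; `ensure_batched` is a hypothetical name, not part of audioseal.

```python
import torch


def ensure_batched(wav: torch.Tensor) -> torch.Tensor:
    """Return audio shaped (batch, channels, samples), adding missing leading dimensions.

    Hypothetical helper, not part of the audioseal package.
    """
    if wav.dim() == 1:  # (samples,) -> (1, 1, samples)
        return wav.unsqueeze(0).unsqueeze(0)
    if wav.dim() == 2:  # (channels, samples) -> (1, channels, samples)
        return wav.unsqueeze(0)
    if wav.dim() == 3:  # already (batch, channels, samples)
        return wav
    raise ValueError(f"expected a 1-D, 2-D or 3-D tensor, got shape {tuple(wav.shape)}")


mono = torch.zeros(1, 16000)  # one channel, one second at 16 kHz, no batch dimension
print(ensure_batched(mono).shape)  # torch.Size([1, 1, 16000])
```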
On Windows machines, if you encounter the error `KeyError raised while resolving interpolation: "Environment variable 'USER' not found"`, this is due to an old checkpoint uploaded to the model hub that is not compatible with Windows. Try invalidating the cache by removing the files in `C:\Users\<USER>\.cache\audioseal` and run again.
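The cache can also be cleared from Python. On Windows, `Path.home()` resolves to `C:\Users\<USER>`, so the default directory below matches the path mentioned above. This is a sketch that assumes the stale checkpoints live only in that directory; `clear_audioseal_cache` is a hypothetical helper, not part of audioseal.

```python
import shutil
from pathlib import Path


def clear_audioseal_cache(cache_dir: Path = Path.home() / ".cache" / "audioseal") -> bool:
    """Delete the cached checkpoint directory so checkpoints are fetched again on next load.

    Hypothetical helper. Returns True if a directory was removed, False if none existed.
    """
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)  # removes the directory and everything inside it
        return True
    return False
```

Calling `clear_audioseal_cache()` before `AudioSeal.load_generator(...)` should then trigger a fresh download, assuming the cache location has not been customized.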
If you use torchaudio to handle your audio and encounter the error `Couldn't find appropriate backend to handle uri ...`, this is because newer versions of torchaudio do not handle the default backend well. Either downgrade torchaudio to 2.1.0 or earlier, or install `soundfile` as your audio backend.
If you find this repository useful, please consider giving it a star :star: and citing our work:
```bibtex
@article{sanroman2024proactive,
  title={Proactive Detection of Voice Cloning with Localized Watermarking},
  author={San Roman, Robin and Fernandez, Pierre and Elsahar, Hady and D{\'e}fossez, Alexandre and Furon, Teddy and Tran, Tuan},
  journal={ICML},
  year={2024}
}
```