facebookresearch / audioseal

Localized watermarking for AI-generated speech audios, with SOTA on robustness and very fast detector
MIT License
401 stars 46 forks source link

:loud_sound: AudioSeal: Proactive Localized Watermarking

Python Code style: black

Inference code for AudioSeal, a method for speech localized watermarking, with state-of-the-art robustness and detector speed (training code coming soon). More details can be found in the paper.

[arXiv] [🤗Hugging Face] [Colab notebook] [Webpage] [Blog] [Press]

fig

Updates:

Abtract

We introduce AudioSeal, a method for speech localized watermarking, with state-of-the-art robustness and detector speed. It jointly trains a generator that embeds a watermark in the audio, and a detector that detects the watermarked fragments in longer audios, even in the presence of editing. Audioseal achieves state-of-the-art detection performance of both natural and synthetic speech at the sample level (1/16k second resolution), it generates limited alteration of signal quality and is robust to many types of audio editing. Audioseal is designed with a fast, single-pass detector, that significantly surpasses existing models in speed — achieving detection up to two orders of magnitude faster, making it ideal for large-scale and real-time applications.

:mate: Installation

AudioSeal requires Python >=3.8, Pytorch >= 1.13.0, omegaconf, julius, and numpy. To install from PyPI:

pip install audioseal

To install from source: Clone this repo and install in editable mode:

git clone https://github.com/facebookresearch/audioseal
cd audioseal
pip install -e .

:gear: Models

You can find all the model checkpoints on the Hugging Face Hub. We provide the checkpoints for the following models:

Note that the message is optional and has no influence on the detection output. It may be used to identify a model version for instance (up to $2**16=65536$ possible choices).

:abacus: Usage

Audioseal provides a simple API to watermark and detect the watermarks from an audio sample. Example usage:


from audioseal import AudioSeal

# model name corresponds to the YAML card file name found in audioseal/cards
model = AudioSeal.load_generator("audioseal_wm_16bits")

# Other way is to load directly from the checkpoint
# model =  Watermarker.from_pretrained(checkpoint_path, device = wav.device)

# a torch tensor of shape (batch, channels, samples) and a sample rate
# It is important to process the audio to the same sample rate as the model
# expectes. In our case, we support 16khz audio 
wav, sr = ..., 16000

watermark = model.get_watermark(wav, sr)

# Optional: you can add a 16-bit message to embed in the watermark
# msg = torch.randint(0, 2, (wav.shape(0), model.msg_processor.nbits), device=wav.device)
# watermark = model.get_watermark(wav, message = msg)

watermarked_audio = wav + watermark

detector = AudioSeal.load_detector("audioseal_detector_16bits")

# To detect the messages in the high-level.
result, message = detector.detect_watermark(watermarked_audio, sr)

print(result) # result is a float number indicating the probability of the audio being watermarked,
print(message)  # message is a binary vector of 16 bits

# To detect the messages in the low-level.
result, message = detector(watermarked_audio, sr)

# result is a tensor of size batch x 2 x frames, indicating the probability (positive and negative) of watermarking for each frame
# A watermarked audio should have result[:, 1, :] > 0.5
print(result[:, 1 , :])  

# Message is a tensor of size batch x 16, indicating of the probability of each bit to be 1.
# message will be a random tensor if the detector detects no watermarking from the audio
print(message)  

Train your own watermarking model

See here for details on how to train your own Watermarking model.

Want to contribute?

We welcome Pull Requests with improvements or suggestions. If you want to flag an issue or propose an improvement, but dont' know how to realize it, create a GitHub Issue.

Troubleshooting

License

Maintainers:

Citation

If you find this repository useful, please consider giving a star :star: and please cite as:

@article{sanroman2024proactive,
  title={Proactive Detection of Voice Cloning with Localized Watermarking},
  author={San Roman, Robin and Fernandez, Pierre and Elsahar, Hady and D´efossez, Alexandre and Furon, Teddy and Tran, Tuan},
  journal={ICML},
  year={2024}
}