-
Could wire up speech recognition on the audio chunks to:
- auto-name the clips when importing
- show the text on the audio block in the timeline
Bonus: add keyword / topic extraction.
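
A rough sketch of the auto-naming step, assuming the recognizer already hands back a plain-text transcript per chunk. The helper name `clip_name_from_transcript`, the five-word limit, and the `untitled-clip` fallback are made up for illustration, not part of any existing code:

```python
import re


def clip_name_from_transcript(transcript: str,
                              max_words: int = 5,
                              fallback: str = "untitled-clip") -> str:
    """Build a short, filename-friendly clip name from recognized speech.

    Hypothetical helper: takes the first `max_words` words of the
    transcript, lowercases them, and joins them with hyphens.
    """
    # Keep letters, digits and apostrophes; everything else separates words.
    words = re.findall(r"[a-z0-9']+", transcript.lower())
    # Drop apostrophes so "don't" becomes "dont" in the final name.
    words = [w.replace("'", "") for w in words]
    words = [w for w in words if w]
    if not words:
        # Silent or unintelligible chunk: nothing to name the clip after.
        return fallback
    return "-".join(words[:max_words])


if __name__ == "__main__":
    print(clip_name_from_transcript("Hello, world! This is a test of the importer."))
```

The same transcript string could be shown on the audio block in the timeline; only the naming part is sketched here.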
-
#### Environment
- **Device**: iPhone 12
- **Earphones**: AirPods Pro (2nd Generation)
- **Software Version**: iOS [Specify Version]
#### Steps to Reproduce
1. Connect AirPods Pro to the iPhone…
-
## Computer Vision:
- [x] Add Depth Estimation pipeline
- [ ] Add Image Classification pipeline
- [ ] Add Image Segmentation pipeline
- [ ] Add Mask Generation pipeline
- [ ] Add Object Detecti…
-
# Task Name
Emoji-Grounded Speech Emotion Recognition
## Task Objective
The primary goal of the Emoji-Grounded Speech Emotion Recognition (EG-SER) task is to develop a system that can accurat…
-
Hi, after reading the paper, I am confused about Table 3.
What do visual acc, audio acc and combine acc mean?
How did you calculate the results of 67.5%, 91.8%, 95.2%?
![default](http…
-
Every Breath You Don't Take: Deepfake Speech Detection Using Breath
https://arxiv.org/abs/2404.15143
-
Hi, thanks for your great work! I tested TalkLip with my own video, but the generated face in the output video is blurred and shows a clear border against the background. The resolution of my test video is 1600x9…
-
Add demos on https://huggingface.co/huggingfacejs (feel free to contribute demos, or to ask to join the organization)
### Natural Language Processing
- [ ] Fill mask
- [ ] Summarization
- [ ] …
-
- [x] explore options for mapping out sys arch
- [x] explore tools: mkdocs, markmap in VS Code
- [x] prep Github Page for docs hosting
- [x] visually represent the ideal RSC as a mindmap
- [x] note &…
-
`torchaudio` is an extension library for PyTorch, designed to facilitate audio processing using the same PyTorch paradigms familiar to users of its tensor library. It provides powerful tools for audio…