-
This is probably just me being bad at coding, but I'm trying to run the example inference code in the README and I'm getting this error:
`ImportError: cannot import name 'inference' from 'pipeline' (/…
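A common cause of this error pattern is that the name `pipeline` resolves to a different module than the repository's own `pipeline.py` on `sys.path` (for example, a pip-installed package of the same name shadowing the local file). A quick stdlib-only way to check which file Python is actually importing (shown here with a stdlib module; substitute `"pipeline"` when debugging the repo):

```python
import importlib.util

def locate_module(name):
    """Return the file path Python would import for `name`, or None."""
    spec = importlib.util.find_spec(name)
    return getattr(spec, "origin", None) if spec else None

# If this path is not the repo's pipeline.py, an installed package
# of the same name is shadowing it, which would explain the ImportError.
print(locate_module("json"))
```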
-
Hello, I am trying to assess the feasibility of continued training of a CLAP model. Regarding your report of training on a single A100 on Clotho, https://stability.wandb.io/clap/clap/reports/CLAP-trained-on-Clot…
-
Can you please describe what the **clotho-dataset** directory looks like?
Is the capture below correct?
![image](https://user-images.githubusercontent.com/63258184/93985861-9430dd80-fdc0-11ea-84b2-cd…
-
Hi, I tried replicating the audio-to-text retrieval results using the PyPI library and the Hugging Face implementation; however, the numbers I obtained do not match those reported in the paper.
For…
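For reference, audio-to-text retrieval metrics such as R@k are typically computed from the full audio-text similarity matrix, and small differences in this step (ranking ties, handling of multiple captions per audio) can shift the reported numbers. A minimal sketch of the standard computation, with made-up similarity values:

```python
def recall_at_k(sim, ground_truth, k):
    """sim[i][j]: similarity of audio i to text j.
    ground_truth[i]: index of the correct text for audio i."""
    hits = 0
    for i, row in enumerate(sim):
        # rank text indices by descending similarity for audio i
        ranked = sorted(range(len(row)), key=lambda j: -row[j])
        if ground_truth[i] in ranked[:k]:
            hits += 1
    return hits / len(sim)

sim = [
    [0.9, 0.1, 0.3],   # audio 0: correct text 0 ranked first
    [0.2, 0.4, 0.8],   # audio 1: correct text 1 ranked second
    [0.5, 0.6, 0.7],   # audio 2: correct text 2 ranked first
]
gt = [0, 1, 2]
print(recall_at_k(sim, gt, 1))  # R@1 = 2/3
print(recall_at_k(sim, gt, 2))  # R@2 = 1.0
```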
-
I am experiencing an issue with a training script for an audio-visual model where the text_branch components are not loading any pre-trained weights as expected. The unloaded components include all la…
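When checkpoint weights silently fail to load, comparing the checkpoint's keys against the model's expected keys often reveals a prefix mismatch (e.g. a `module.` wrapper added by distributed training). A framework-agnostic sketch of that diagnostic, using plain dicts in place of PyTorch state dicts; all key names below are invented for illustration:

```python
def diff_state_dicts(model_keys, ckpt_keys):
    """Report keys the checkpoint is missing and keys it has that
    the model does not expect (typically a prefix mismatch)."""
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    return {
        "missing_in_ckpt": sorted(model_keys - ckpt_keys),
        "unexpected_in_ckpt": sorted(ckpt_keys - model_keys),
    }

# Invented example: the checkpoint was saved with a "module." prefix,
# so none of the text_branch weights match the model's key names.
model = ["text_branch.embed.weight", "audio_branch.conv.weight"]
ckpt = ["module.text_branch.embed.weight", "module.audio_branch.conv.weight"]
report = diff_state_dicts(model, ckpt)
print(report["missing_in_ckpt"])   # every model key goes unmatched

# Stripping the prefix before loading resolves this particular case:
fixed = [k[len("module."):] for k in ckpt]
print(sorted(fixed) == sorted(model))  # True
```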
-
Hi all!
Which datasets were used for the pre-trained models provided at the Google link?
Were [630k-audioset-best.pt](https://huggingface.co/lukewys/laion_clap/blob/main/630k-audioset-best.pt) and […
-
Hi Zhifeng,
Thank you so much for your help!
This issue is related to https://github.com/NVIDIA/audio-flamingo/issues/5, https://github.com/NVIDIA/audio-flamingo/issues/6, https://github.com/NVI…
-
Hello,
Thank you very much for this great work. I have a few questions about the paper/code.
1- Have you tried training with Wavcaps or a larger dataset? From the wavcaps paper, it seems that using …
-
# Task Name
Audio Segment Retrieval with Text Descriptions
## Task Objective
The objective is to retrieve specific parts of an audio clip based on textual descriptions. This represents a chal…
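A straightforward baseline for this kind of task is to embed fixed-length windows of the audio, embed the text query, and rank the windows by cosine similarity, returning the highest-scoring window as the retrieved segment. A sketch with made-up embedding vectors (a real system would obtain these from CLAP-style audio and text encoders):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def best_segment(window_embs, text_emb):
    """Return (index, score) of the audio window most similar to the text."""
    scores = [cosine(w, text_emb) for w in window_embs]
    i = max(range(len(scores)), key=scores.__getitem__)
    return i, scores[i]

# Made-up 3-d embeddings for three audio windows and one text query.
windows = [[1.0, 0.0, 0.0], [0.2, 0.9, 0.1], [0.0, 0.1, 1.0]]
query = [0.1, 1.0, 0.0]
idx, score = best_segment(windows, query)
print(idx)  # window 1 matches the query best here
```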
-
Thanks again for the excellent work.
It is not clear to me how `settings.yaml` should be set to perform the first step you describe in your work. How do you train your framework with Audiocaps…