-
Google published a dataset called AudioSet, which consists of about 2 million 10-second YouTube clips that are manually annotated with labels from a hierarchical ontology.
(thanks to @rbroc for finding out about …
-
Hi @harritaylor,
Line 87 of `vggish.py` reports that there is no attribute 'T'. Which PyTorch version do you use?
-
Hello,
I am getting this error while trying to run the code from the command line on Ubuntu. I use the command `python3 parse_file.py Recording_5.wav`
Here is the Traceback:
Traceback (most rec…
-
I have noticed that, given a wav file, AudioSet's VGGish outputs a sec×96×64 `examples_batch` and a final sec×128 embedding. I want a larger, fixed-size embedding output such as 400, but I do…
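One common way to turn the variable-length sec×128 output into a fixed-size vector (a sketch, not this repo's API) is to mean-pool over time and then project to the desired dimensionality. The random projection matrix below is purely illustrative; in practice it would be a learned layer:

```python
import numpy as np

def fixed_size_embedding(embeddings, out_dim=400, seed=0):
    """embeddings: (num_seconds, 128) variable-length VGGish output."""
    pooled = embeddings.mean(axis=0)  # (128,) regardless of clip length
    # Hypothetical projection to out_dim; a real model would learn this matrix.
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((out_dim, pooled.shape[0]))
    return W @ pooled  # (out_dim,)
```

Because pooling happens before the projection, clips of different durations all map to the same 400-dimensional space.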
-
I used the visual_feature.h5 and audio_feature.h5 files that you provided. My test result under AV_att is 61.5, while your paper reports 72.7.
I use nb_epoch = 500
The PyTorch version is 1.0.1.
The operating syste…
-
Having successfully configured using `python3 waf configure` as shown below:
skrowten-hermit@HOLY-DIVER-W10LT:~/work/audioprism/lib/essentia$ python3 waf configure --build-static --with-python …
-
Hey @v-iashin
Thanks for open sourcing such an awesome work!!!
Kudos to you on this and MDVC.
I was wondering, since my videos are not in English but I do require captions in the English …
-
I have run the [VGGish](https://github.com/tensorflow/models/tree/master/research/audioset) model to extract features from .wav files, but the 128-dimensional embedding features seem quite different from the pub…
-
Hello, I have some .wav files, and I want to train a classification model on my own dataset.
How can I use this code? Should I extract embeddings and train a sequence model? Is it possible to finetune VG…
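One common recipe (a sketch under my own assumptions, not this repo's documented API) is to extract the per-clip 128-D VGGish embeddings and train a lightweight classifier on top of them. A minimal nearest-centroid example in NumPy, with synthetic arrays standing in for real embeddings:

```python
import numpy as np

def train_centroids(X, y):
    """X: (N, 128) embeddings, y: (N,) integer labels -> per-class mean vectors."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(X, classes, centroids):
    """Assign each embedding to the class with the nearest centroid."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]
```

For stronger results one would typically replace the centroid step with a trained model (e.g. logistic regression or a small sequence model over per-second embeddings), but the pipeline shape is the same: embeddings in, labels out.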
-
#### Description
I noticed that `y_frames` in librosa.core.spectrum.stft() is computed by
```python
y_frames = util.frame(y, frame_length=n_fft, hop_length=hop_length)
```
when `win_length` is …
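For intuition, `util.frame` with those arguments slices `y` into overlapping windows of length `n_fft` spaced `hop_length` apart. A minimal NumPy sketch of that framing (ignoring the centering/padding that the real implementation also handles):

```python
import numpy as np

def frame(y, frame_length, hop_length):
    """Return shape (frame_length, n_frames), one window per column,
    mimicking the layout librosa's util.frame produces by default."""
    n_frames = 1 + (len(y) - frame_length) // hop_length
    return np.stack(
        [y[i * hop_length : i * hop_length + frame_length] for i in range(n_frames)],
        axis=1,
    )
```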