How to finetune on strong label dataset?

wengstA commented 1 year ago

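Hello, and thank you for the great work! I have a few questions about fine-tuning on a strongly labeled dataset. In issue 25 you mentioned "need to extract different output of HST-AT (I believe it is the last second layer feature-map output)". Does this "last second layer" refer to the output of the token-semantic module? You also mentioned " the interpolation and resolution of the output may be different from the input localization time resolution ----- in that you need to find a way to align them." In your code, the time axis of the output is interpolated to a length of 1024. Does that count as one way of doing this alignment? I would be very grateful for an answer!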

RetroCirce commented 1 year ago

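Hello, sorry for the late reply; I have been busy with all sorts of things lately. Yes, this output is the output of the token-semantic module (that is, the T x C heatmap). Because the model compresses the time resolution during processing, we usually interpolate the output back to a length of 1024. You can think of it this way: our model only performs event localization with a frame length of roughly 0.2-0.32 seconds (the value should be somewhere around that), which it learns from weakly labeled data. The original resolution is 1024 frames, however, so we assume that the events within each 0.2-0.32 second span stay constant (this is what the interpolation does).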
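
For readers following along, the alignment described above could be sketched roughly as follows. This is a minimal illustration rather than the repository's own code: it uses NumPy, and the helper names `upsample_framewise` and `events_to_frame_targets`, the 10-second clip length, the 32 coarse frames, and the 4 classes are all assumptions chosen for the example. Each coarse frame is simply repeated until the T x C heatmap has 1024 rows, and the strong labels (onset, offset, class) are rasterized onto the same 1024-frame grid so the two can be compared frame by frame.

```python
import numpy as np


def upsample_framewise(heatmap, target_frames=1024):
    """Repeat each coarse frame so a (T, C) heatmap becomes (target_frames, C).

    Hypothetical helper: every coarse frame is held constant over the
    fine frames it covers, which matches the "events are fixed within
    one coarse frame" assumption described in the reply.
    """
    coarse_frames = heatmap.shape[0]
    if target_frames % coarse_frames != 0:
        raise ValueError("target_frames must be a multiple of the coarse frame count")
    ratio = target_frames // coarse_frames
    return np.repeat(heatmap, ratio, axis=0)


def events_to_frame_targets(events, num_classes, clip_seconds=10.0, target_frames=1024):
    """Rasterize (onset_s, offset_s, class_idx) strong labels to a 0/1 matrix.

    Hypothetical helper: clip_seconds is an assumed clip length, not a
    value taken from the thread.
    """
    targets = np.zeros((target_frames, num_classes), dtype=np.float32)
    frames_per_second = target_frames / clip_seconds
    for onset, offset, class_idx in events:
        start = max(int(np.floor(onset * frames_per_second)), 0)
        stop = min(int(np.ceil(offset * frames_per_second)), target_frames)
        targets[start:stop, class_idx] = 1.0
    return targets


# Assumed shapes for illustration: 32 coarse frames, 4 classes.
coarse = np.random.rand(32, 4).astype(np.float32)
fine = upsample_framewise(coarse)  # shape (1024, 4)

# One strong-label event: class 0 active from 1.0 s to 2.5 s.
targets = events_to_frame_targets([(1.0, 2.5, 0)], num_classes=4)  # shape (1024, 4)
```

With `fine` and `targets` on the same time grid, a frame-level loss such as binary cross-entropy can be computed between them. Under the assumed 10-second clip and 32 coarse frames, each coarse frame covers about 0.31 seconds, so event boundaries finer than that cannot be resolved by repetition alone.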

wengstA commented 1 year ago

OK, understood! Thank you very much!!

RetroCirce / HTS-Audio-Transformer

How to finetune on strong label dataset? #30