SAGNIKMJR / move2hear-active-AV-separation

Code and datasets for 'Move2Hear: Active Audio-Visual Source Separation' (ICCV 2021)
MIT License

Understanding AAViSS specific dataset splits #5

Closed sreeharshaparuchur1 closed 1 year ago

sreeharshaparuchur1 commented 1 year ago

Hi @SAGNIKMJR,

I have several questions about the AAViSS dataset splits:

Thank you

SAGNIKMJR commented 1 year ago
  1. Yes, we evaluate the effect of inter-source distance; see the analysis in Supp. Sec. 7.3.
  2, 3. The dataset splits are available here: https://utexas.box.com/shared/static/vwrkm3kn06pobf8z6g3q3zom5ybei8oq.zip
  4. 'all_geodesic_distances' gives the geodesic distance between the agent and each of the sources, as well as the inter-source geodesic distance. -1 denotes the agent, 0 the first source, and 1 the second source. Hence, (-1, 0) is agent-to-source 1, (-1, 1) is agent-to-source 2, and (0, 1) / (1, 0) is the inter-source distance (see the sketch after this list).
  5. It's a redundant field; that's why it's set to null.
  6. 'num_action' is a redundant field. 'start_idx' denotes the index in an audio clip at which the episode starts sampling the monaural audio; it's irrelevant in this setting because the monaural audio is always sampled from the start of the clip, and hence it's set to 0. 'target_label' is the index of the target audio class for the episode.
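
For concreteness, here is a minimal sketch of how these fields could be read from one episode entry in a split JSON. The exact file layout isn't specified in this thread, so the top-level 'episodes' list and the distance keys being stored as strings like "(-1, 0)" are assumptions rather than the confirmed format.

```python
import json

# Hypothetical path: one of the split files from the zip linked above.
SPLIT_PATH = "test_nearTarget_3Sources.json"

with open(SPLIT_PATH) as f:
    split = json.load(f)

# Assumption: the split stores a list of episode dicts under "episodes".
episode = split["episodes"][0]

# Convention from the answer above: -1 = agent, 0 = first source, 1 = second source.
# Assumption: the pair keys are serialized as strings such as "(-1, 0)".
dists = episode["all_geodesic_distances"]
agent_to_source1 = dists["(-1, 0)"]
agent_to_source2 = dists["(-1, 1)"]
inter_source = dists.get("(0, 1)", dists.get("(1, 0)"))

print("agent -> source 1:", agent_to_source1)
print("agent -> source 2:", agent_to_source2)
print("source 1 <-> source 2:", inter_source)

# 'target_label' indexes the target audio class; 'start_idx' stays 0 here
# because the monaural audio is always sampled from the start of the clip.
print("target class:", episode["target_label"], "| start_idx:", episode["start_idx"])
```
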
sreeharshaparuchur1 commented 1 year ago

@SAGNIKMJR

In the dataset splits that you've shared, in the 'test_nearTarget_3Sources.json' file, the 'all_geodesic_distances' field is never 0 for the (-1, 0) configuration. This is unexpected, as the near-target setting involves spawning the agent at the target sound source.
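
A quick way to reproduce this check (under the same assumed episode/key layout as the sketch above) would be:

```python
import json

with open("test_nearTarget_3Sources.json") as f:
    split = json.load(f)

# Count episodes whose agent-to-source-0 geodesic distance, i.e. the (-1, 0)
# entry, is zero (assuming source 0 is the target in this split).
episodes = split["episodes"]
num_zero = sum(1 for ep in episodes if ep["all_geodesic_distances"]["(-1, 0)"] == 0)
print(f"{num_zero} / {len(episodes)} episodes have the agent spawned at source 0")
```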

Also, this split seems to exist only for testing this setting. Was the agent not explicitly trained for it? If so, why not? Wouldn't you expect the agent to perform better if it were trained for the task of separating one target source in the presence of two distractor sounds?

Thank you.

SAGNIKMJR commented 1 year ago

Thanks for pointing out the mistake. The correct datasets are available here: https://utexas.box.com/shared/static/pbcbi27hw669gpibw8ax76cb0ggvyr45.zip

We didn't train an agent for this setting because, even without retraining, we found our agent to generalize reasonably well. However, retraining can be expected to yield even better performance.