DavidDiazGuerra / icoDOA

Code repository for the paper Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs
GNU Affero General Public License v3.0

orV and mic_orV #4

Closed: JuanFMontesinos closed this issue 1 year ago

JuanFMontesinos commented 1 year ago

Hi :)

I have to say I'm struggling a lot to make the code work with 2D arrays. It seems to work easily with 3D ones.

My first guess was that the positions of the sources and the microphones matter a lot. I'm placing the array on one side of the box and sampling everything in a cone in front of it, which should be an easy task for 2D arrays, and it's still not working. So I'm really wondering whether it needs a totally different set of hyperparameters or whether some mic configuration is wrong.

To that end, I was wondering what the orV and mic_orV vectors are (and how they should be defined).

For example, for the Eigenmike (a spherical array), the mic_orV vectors are the outward normals at each mic. What does their magnitude represent? And what about the orV vector? I've seen it's used for the 2D arrays to define the minimum pos, but I didn't find anything else.

Still, in your code these vectors don't seem very important: they're just unit vectors for the 2D arrays you defined.

Have you experimented with 2D arrays at some point?

Juan

DavidDiazGuerra commented 1 year ago

Hi Juan,

I think the issue with 2D arrays might be related to their front-back ambiguity: if you compute their SRP maps on a spherical/icosahedral grid, the map is symmetric with respect to the array plane, and this symmetry propagates to the output of the icoCNN. I have never used 2D arrays with this network architecture, but I would bet that, with maps this symmetric, the output of the final soft-argmax layer can only lie in the plane of the array.
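
The symmetry argument can be checked with a small NumPy sketch (not code from the repository). The 4-mic square array, the toy SRP score built from the TDOA mismatch, and the plain soft-argmax are all assumptions for illustration: for mics lying in the z = 0 plane, a source and its mirror image through that plane produce identical TDOAs, so any map computed from them is symmetric, and a soft-argmax over such a map lands in the array plane.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s

def tdoas(mic_pos, direction):
    """Far-field TDOAs (s) of each mic relative to mic 0 for a plane wave from `direction`."""
    direction = direction / np.linalg.norm(direction)
    arrival = -mic_pos @ direction / SPEED_OF_SOUND  # mics closer to the source hear it earlier
    return arrival - arrival[0]

def toy_srp_logits(mic_pos, grid, true_dir, sharpness=1e9):
    """Toy SRP-like score: minus the squared TDOA mismatch between each grid point and the source."""
    ref = tdoas(mic_pos, true_dir)
    return np.array([-sharpness * np.sum((tdoas(mic_pos, g) - ref) ** 2) for g in grid])

def soft_argmax(logits, grid):
    """Softmax-weighted mean of the grid directions."""
    w = np.exp(logits - logits.max())
    return (w / w.sum()) @ grid

# Hypothetical planar array: 4 mics on a square in the z = 0 plane
mics = np.array([[0.05, 0, 0], [0, 0.05, 0], [-0.05, 0, 0], [0, -0.05, 0]])
source = np.array([0.3, 0.5, 0.8])
mirror = source * np.array([1, 1, -1])  # reflection through the array plane
print(np.allclose(tdoas(mics, source), tdoas(mics, mirror)))  # True: identical TDOAs

# Grid of random directions plus their mirror images, so it is symmetric in z
rng = np.random.default_rng(0)
half = rng.normal(size=(200, 3))
half /= np.linalg.norm(half, axis=1, keepdims=True)
grid = np.vstack([half, half * np.array([1, 1, -1])])

est = soft_argmax(toy_srp_logits(mics, grid, source), grid)
print(est)  # the z component is ~0: the estimate collapses onto the array plane
```

Even though the true source has z = 0.8, the two mirrored peaks pull the weighted mean back to z ≈ 0.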

Some time ago I worked with a circular array using the Cross3D model, and for that I sampled the maps only in $\theta\in[0, \pi/2]$ instead of $\theta\in[0, \pi]$; I think this was done automatically if arrayType was set to 'planar' in the ArraySetup class, but I don't remember the details. However, with the icoCNNs you cannot use only half of the icosahedral grid. Maybe you could try setting half of the map to 0 and see if it works?
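
Since the icoCNN needs the whole icosahedral grid, "setting half of the map to 0" amounts to a per-point mask over the grid. A minimal NumPy sketch, assuming the grid is stored as unit vectors with a trailing coordinate axis (charts, H, W, 3) and that the array's facing direction is known as a normal vector; the function names are hypothetical. The dot-product test is one way to pick the rear half that also works when the array is not aligned with a coordinate axis.

```python
import numpy as np

def front_mask(grid, array_normal):
    """True for grid directions on the front side of a planar array (dot product >= 0).

    grid: (..., 3) unit DOA vectors; array_normal: (3,) vector pointing towards the
    half-space where sources are expected (assumed known, e.g. from the array mounting).
    """
    n = np.asarray(array_normal, dtype=float)
    n = n / np.linalg.norm(n)
    return grid @ n >= 0

def zero_back_half(srp_map, grid, array_normal):
    """Return a copy of srp_map (..., *grid.shape[:-1]) with the rear half set to 0."""
    return srp_map * front_mask(grid, array_normal)  # the mask broadcasts over leading axes

# Toy grid shaped like an icosahedral one (charts, H, W, 3) and a map with a time axis
rng = np.random.default_rng(1)
grid = rng.normal(size=(5, 4, 8, 3))
grid /= np.linalg.norm(grid, axis=-1, keepdims=True)
srp_map = rng.random((3, 5, 4, 8)) + 0.1  # (T, charts, H, W), strictly positive
masked = zero_back_half(srp_map, grid, array_normal=[0, 0, 1])
```

Points exactly in the array plane (dot product 0) are kept, since they are valid directions for a planar array.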

The mic_orV vectors are used in the acoustic simulations when the microphones have a non-omnidirectional directivity pattern; they should be unit vectors pointing in the same direction as the microphones. The orV vector was supposed to indicate the orientation of the whole array, but to be honest I don't remember whether I ended up using it. This assertion makes me think there was some additional feature I wanted to add with orV but never implemented.
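
To see why mic_orV should have unit length, here is a sketch assuming a first-order directivity model (gain = alpha + (1 - alpha) cos θ, where θ is the angle between the mic orientation and the DOA). The simulator used by the repository may implement its patterns differently; the function below is hypothetical. If the dot product is taken directly as cos θ, the magnitude of mic_orV carries no meaning and any value other than 1 distorts the pattern.

```python
import numpy as np

def first_order_gain(mic_orV, doa, alpha=0.5):
    """Gain of a first-order directional mic: alpha + (1 - alpha) * cos(theta).

    alpha=1 is omnidirectional, 0.5 cardioid, 0 figure-of-eight. The dot product is used
    directly as cos(theta), so mic_orV and doa are expected to be unit vectors.
    """
    cos_theta = np.dot(mic_orV, doa)
    return alpha + (1 - alpha) * cos_theta

up = np.array([0.0, 0.0, 1.0])
print(first_order_gain(up, up))               # 1.0: on-axis
print(first_order_gain(up, -up))              # 0.0: rear null of the cardioid
print(first_order_gain(up, [1.0, 0.0, 0.0]))  # 0.5: side
print(first_order_gain(2 * up, up))           # 1.5: a non-unit mic_orV gives an invalid gain
```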

I hope this helped.

Best regards, David

JuanFMontesinos commented 1 year ago

Thanks for the hint. I've rewritten apply_extras as follows:

    def apply_extras(self, maps, acoustic_scene_batch, vad_batch=None):
        if self.apply_vad:
            if acoustic_scene_batch is not None:
                vad_batch = np.array(
                    [acoustic_scene_batch[i].vad for i in range(len(acoustic_scene_batch))])
            # Breaks if neither acoustic_scene_batch nor vad_batch was given
            assert vad_batch is not None
            vad_output_th = vad_batch.mean(axis=-1) > 2 / 3
            vad_output_th = vad_output_th[:, np.newaxis,
                                          :, np.newaxis, np.newaxis, np.newaxis]
            vad_output_th = torch.from_numpy(
                vad_output_th.astype(float)).to(maps.device)
            repeat_factor = np.array(maps.shape)
            repeat_factor[:-3] = 1
            maps *= vad_output_th.float().repeat(repeat_factor.tolist())

        for i, ac_scene in enumerate(acoustic_scene_batch):
            grid = self.srp.grid  # 3D coordinates of each point in the icosahedron
            array = ac_scene.array_setup
            if array.is_planar():
                mask = array.orV == 1
                grid_mask = np.logical_or.reduce(grid[..., mask] < 0, axis=-1)
                grid_mask = torch.from_numpy(grid_mask).to(maps.device)
                maps[i, ..., grid_mask].zero_()
        return maps
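
The VAD part of apply_extras gates whole frames: every frame whose mean VAD value does not exceed 2/3 gets an all-zero map. A NumPy sketch of the same gating, assuming maps has shape (B, C, T, charts, H, W) as in the rearrange pattern used in apply_cnn and that vad_batch has shape (B, T, L) with L VAD samples per frame; vad_gate is a hypothetical name. Broadcasting already expands the singleton axes, so the explicit repeat is not needed (PyTorch follows the same broadcasting rules).

```python
import numpy as np

def vad_gate(maps, vad_batch, threshold=2 / 3):
    """Zero the maps of every frame whose mean VAD value is not above `threshold`.

    maps: (B, C, T, charts, H, W); vad_batch: (B, T, L), L VAD samples per frame (assumed).
    """
    active = vad_batch.mean(axis=-1) > threshold                         # (B, T)
    gate = active[:, np.newaxis, :, np.newaxis, np.newaxis, np.newaxis]  # (B, 1, T, 1, 1, 1)
    return maps * gate  # singleton axes broadcast over C, charts, H and W

B, C, T = 2, 3, 4
maps = np.ones((B, C, T, 5, 2, 4))
vad = np.zeros((B, T, 6))
vad[0, 1] = 1.0      # frame 1 of item 0 fully active
vad[1, 3, :5] = 1.0  # frame 3 of item 1 active in 5/6 of its samples (> 2/3)
gated = vad_gate(maps, vad)
```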

The idea was to zero the maps behind the array. This didn't really work. In addition, to make sure the probabilities were OK, I also zeroed the values before the softmax:

        self.clean_vertices = icoCNN.CleanVertices(1)
        ico_grid = torch.from_numpy(icoCNN.icosahedral_grid_coordinates(1))
        self.ico_grid = rearrange(
            ico_grid, 'charts H W coors -> coors charts H W', charts=5, coors=3)
        self.sam = at_modules.SoftArgMax(
            self.ico_grid.shape[1:], indexes=self.ico_grid, include_exp=True)

    def apply_cnn(self, x):
        B = x.shape[0]
        H = x.shape[-2]
        W = x.shape[-1]
        r = self.r
        x = rearrange(x, 'B C T charts H W -> B T C 1 charts H W', charts=5)
        for idx, (ico_conv, temp_conv, l_norm) in enumerate(zip(self.ico_cnn, self.temp_cnn, self.layer_norm)):
            x = torch.relu(ico_conv(x))
            x = rearrange(
                x, 'B T C R charts H W -> (B R charts H W) C T', B=B, R=6, charts=5)
            x = temp_conv(x)
            x = rearrange(
                x, '(B R charts H W) C T -> B T C R charts H W', R=6, charts=5, B=B, H=H, W=W)
            if idx < (len(self.ico_cnn) - 1):
                x = l_norm(x)
            x = self.process_vertices[r - 1](x)
            if idx < (len(self.ico_cnn) - 1):
                x = torch.relu(x)
            if idx % 2 == 1 and r > 1:
                x = self.poolings[idx // 2](x)
                r -= 1
                H //= 2
                W //= 2
        x = reduce(x, 'B T C R charts H W -> B T charts H W',
                   'max', R=6, charts=5)
        return x

    def forward(self, x, array=None):
        x = self.apply_cnn(x)
        x = self.clean_vertices(x)
        if array is not None and array.is_planar():
            idxs = np.where(array.orV == 1)
            mask = torch.ones_like(self.ico_grid[0], dtype=torch.bool)
            for idx in idxs:
                tmask = self.ico_grid[idx] < 0
                mask = torch.logical_and(mask, tmask[0])

            x[..., mask].zero_()
        y = self.sam(x)
        return y
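
A note on zeroing the values before the softmax: if the soft-argmax exponentiates its input (assumed here to be what include_exp=True does; not verified against at_modules), a logit of 0 is not excluded, because exp(0) = 1 still gives that point weight. Setting excluded logits to -inf removes them entirely. A NumPy sketch on a hypothetical six-point grid; in PyTorch the equivalent masking would be x.masked_fill(~valid, float('-inf')).

```python
import numpy as np

def soft_argmax(logits, coords, valid=None):
    """Softmax-weighted mean of grid coordinates, optionally restricted to `valid` points.

    logits: (N,); coords: (N, 3); valid: (N,) bool. At least one point must be valid.
    """
    logits = np.asarray(logits, dtype=float)
    if valid is not None:
        logits = np.where(valid, logits, -np.inf)  # exp(-inf) = 0, so these points get no weight
    w = np.exp(logits - logits.max())
    return (w / w.sum()) @ coords

# Six-point toy grid (+-x, +-y, +-z) with a front-back symmetric map peaking at +z and -z
coords = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]], float)
logits = np.array([-5.0, -5.0, -5.0, -5.0, 1.0, 1.0])
front = coords[:, 2] >= 0  # assume the array faces +z

plain = soft_argmax(logits, coords)                         # z ~ 0: ambiguity unresolved
zeroed = soft_argmax(np.where(front, logits, 0.0), coords)  # -z still weighted by exp(0) = 1
masked = soft_argmax(logits, coords, valid=front)           # -z removed entirely
print(plain[2], zeroed[2], masked[2])
```

With these numbers the zeroed variant still lands well short of +z, while the -inf variant is close to it.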

None of these solutions seemed to work, even combined. Just mentioning it in case other people struggle with the same problem :) I'll give up on 2D arrays for now.

Thanks, Juan

JuanFMontesinos commented 1 year ago

Actually it's working :) It was a bug on my side.

It seems that indexing a tensor with a boolean mask returns a copy, so I was zeroing a copy. It can be fixed by simply multiplying by the mask:

        for i, ac_scene in enumerate(acoustic_scene_batch):
            grid = self.srp.grid  # 3D coordinates of each point in the icosahedron
            array = ac_scene.array_setup
            if array.is_planar():
                # Axis flagged by orV, i.e. the direction the array faces
                mask = array.orV == 1
                # True for grid points behind the array
                grid_mask = np.logical_or.reduce(grid[..., mask] < 0, axis=-1)
                # Keep the front half (1.0) and zero the rear half (0.0)
                grid_mask = torch.from_numpy(~grid_mask).to(maps.device).float()
                # maps[i] is a view, so the in-place product updates maps itself
                maps[i] *= grid_mask
        return maps
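
The bug comes from a rule NumPy and PyTorch share: advanced (boolean) indexing returns a copy, while basic indexing (integers, slices, Ellipsis) returns a view. A small NumPy sketch with hypothetical helper names; in PyTorch, the in-place alternatives would be x[..., mask] = 0 or x.masked_fill_(mask, 0).

```python
import numpy as np

def zero_with_mask_copy(a, mask):
    a[mask].fill(0)  # a[mask] builds a NEW array; filling it leaves `a` unchanged
    return a

def zero_with_setitem(a, mask):
    a[mask] = 0  # masked assignment writes into `a` itself
    return a

def zero_with_multiply(a, mask):
    return a * ~mask  # out-of-place: keeps entries where mask is False

a = np.ones((3, 4))
behind = np.zeros((3, 4), dtype=bool)
behind[0] = True  # pretend the first row of grid points lies behind the array

print(zero_with_mask_copy(a.copy(), behind).sum())  # 12.0: nothing was zeroed
print(zero_with_setitem(a.copy(), behind).sum())    # 8.0
print(zero_with_multiply(a, behind).sum())          # 8.0

maps = np.ones((2, 3, 4))  # (batch, H, W)
maps[0] *= ~behind         # maps[0] is basic indexing, i.e. a view, so this writes into maps
print(maps.sum())          # 20.0: 8 ones left in maps[0] plus 12 in maps[1]
```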