egrinstein / neural_srp

The Neural-SRP method for DOA estimation

Problems encountered in reproducing the model #2

Closed Underman30 closed 5 months ago

Underman30 commented 5 months ago

Hi, I've tried to reproduce neural-srp and encountered some problems:

  1. When I preprocess the TAU-NIGENS dataset, the following error occurs, and changing the value of self._nb_unique_classes from 2 to 3 fixes it. Does this mean that the samples in the dataset have at most 2 simultaneously active sound sources? (A minimal reproduction of the error is at the end of this comment.)

    tnb_classes[frame_ind, active_event] = 1
    IndexError: index 2 is out of bounds for axis 1 with size 2
  2. But when I tried to visualize the TAU dataset using neural-srp-multi.bin, it reported:

    target_doas = target_doas.view( target_doas.shape[0], target_doas.shape[1], 3, max_nb_doas ).transpose(-1, -2)
    RuntimeError: shape '[1, 50, 3, 2]' is invalid for input of size 500

    Is this because I changed the value of self._nb_unique_classes?

  3. When I tried to load doanet.bin to visualize the TAU dataset, this error occurred: (screenshot attachment not rendered). How can I solve it?

Thank you in advance, looking forward to your reply!
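
For reference, here is a minimal reproduction of the error in item 1 (the array and variable names are my guesses based on the traceback, not the actual repository code):

    import numpy as np

    nb_unique_classes = 2        # value used during preprocessing
    nb_frames = 10
    active_events = [0, 1, 2]    # a frame with 3 simultaneously active sources

    # One column per source slot; a third source has no column to go into
    nb_classes = np.zeros((nb_frames, nb_unique_classes))
    frame_ind = 0
    try:
        for active_event in active_events:
            nb_classes[frame_ind, active_event] = 1
    except IndexError as e:
        print(e)  # index 2 is out of bounds for axis 1 with size 2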

Underman30 commented 5 months ago

Hi, I have solved the third problem; it was indeed caused by my value change in step 1. However, there are samples in TAU that contain 3 active events. Did you preprocess all of the samples? The second problem still confuses me, and I have some other questions:

  1. When I use your Cross3D checkpoints to visualize the LOCATA dataset, the global loss is more than 60. Is this normal?
  2. When I tried to visualize the TAU dataset, the following error occurred:
    File "D:\PyCharm\PycharmProjects\neural_srp\metrics.py", line 117, in partial_compute_metric
    dot_prods = torch.matmul(output.detach(), target_doas.transpose(-1, -2))
    RuntimeError: The size of tensor a (62) must match the size of tensor b (50) at non-singleton dimension 0

    How can I solve it? Thank you!

egrinstein commented 5 months ago

Hi, thank you for your interest. I have only tested the model with a maximum of 2 sources; the 3-source samples were discarded.
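
Roughly speaking, the filtering looks something like this (a sketch with made-up names, not the exact preprocessing code):

    MAX_SOURCES = 2  # maximum number of simultaneous sources the model supports

    def keep_sample(active_events_per_frame):
        """Keep a sample only if no frame has more than MAX_SOURCES active events."""
        return all(len(events) <= MAX_SOURCES for events in active_events_per_frame)

    # samples: hypothetical list where each item exposes its per-frame active events
    # kept = [s for s in samples if keep_sample(s["active_events_per_frame"])]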

I think the second error will be fixed by changing the parameters in params.json as described in the Configuration section of the Readme.md file.
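
For context, the view error is just an element-count mismatch: a shape of [1, 50, 3, 2] needs 1 × 50 × 3 × 2 = 300 values, while a target produced with different preprocessing parameters contains 500, so the reshape cannot succeed. A minimal illustration (the target shape here is only an example):

    import torch

    target_doas = torch.zeros(1, 50, 10)   # 500 elements, as in the error message
    max_nb_doas = 2

    try:
        target_doas.view(target_doas.shape[0], target_doas.shape[1], 3, max_nb_doas)
    except RuntimeError as e:
        print(e)  # shape '[1, 50, 3, 2]' is invalid for input of size 500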

A loss of 60 does seem large; I think this might also be related to the parameter mismatch mentioned above.

I hope this helps.

Underman30 commented 5 months ago

Thank you for your help. How should the 3-source samples be discarded? And did you use the whole mic_dev set to train and test?

The second error has been fixed by changing the parameters; it was an oversight on my part. But when I run visualize_tau.py again, it reports that I have used both CUDA and the CPU. How can I check and change this? The traceback is:

 File "D:\PyCharm\PycharmProjects\neural_srp\metrics.py", line 117, in partial_compute_metric
    dot_prods = torch.matmul(output.detach(), target_doas.transpose(-1, -2))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_bmm)

Thank you in advance.

Underman30 commented 5 months ago

    def _get_filenames_list_and_feat_label_sizes(self):
        for filename in os.listdir(self._feat_dir):
            if filename == ".DS_Store":
                # Skip mac specific file
                continue
            if int(filename[4]) in self._splits:  # check which split the file belongs to
                self._filenames_list.append(filename)

I find that if I keep the line if int(filename[4]) in self._splits: # check which split the file belongs to, then self._filenames_list ends up empty. Should I remove it?
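
To debug this, I printed what the split check sees for each file. This snippet assumes the feature filenames start with something like fold1_, so that filename[4] is the fold digit; that is my guess and may not match the actual naming:

    import os

    feat_dir = "path/to/feat_dir"   # stand-in for self._feat_dir
    splits = [3, 4, 5, 6]           # stand-in for self._splits

    for filename in os.listdir(feat_dir):
        if filename == ".DS_Store":
            continue
        # For names like "fold1_room1_mix001.npy", filename[4] is the fold digit
        print(filename, "-> character at index 4:", filename[4],
              "| kept:", filename[4].isdigit() and int(filename[4]) in splits)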

By the way, I have changed the device in the lines below from "cpu" to "cuda", but it still reports that 2 devices are being used:

    hnet_model.load_state_dict(
        torch.load("hnet_model.h5", map_location=torch.device("cpu"))
    )
    checkpoint_path = params["model_checkpoint_path"]
    state_dict = torch.load(checkpoint_path, map_location=torch.device("cpu"))

egrinstein commented 5 months ago

Hi, if I recall correctly you are right: some splits of mic_dev were used for training and another one was used for testing.

egrinstein commented 5 months ago

In these lines:

    hnet_model.load_state_dict(
        torch.load("hnet_model.h5", map_location=torch.device("cpu"))
    )
    checkpoint_path = params["model_checkpoint_path"]
    state_dict = torch.load(checkpoint_path, map_location=torch.device("cpu"))

the models are loaded onto the CPU, but they are later transferred to the GPU if you are using one.
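
If the device error persists, the usual fix is to make sure both operands of the matmul live on the same device, for example (a generic sketch, not the repository's exact code):

    import torch

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Stand-ins for the model output and the target DOAs in the metric computation
    output = torch.randn(1, 50, 3, device=device)   # on the GPU when available
    target_doas = torch.randn(1, 2, 3)              # on the CPU

    # Moving the targets to the output's device avoids the
    # "Expected all tensors to be on the same device" error
    dot_prods = torch.matmul(output.detach(),
                             target_doas.to(output.device).transpose(-1, -2))
    print(dot_prods.shape)  # torch.Size([1, 50, 2])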

Underman30 commented 5 months ago

Hi, thanks for your help. I've run the multi-source experiment successfully, but the results don't seem very good: the localization errors of neural-srp-multi and doanet are both around 60, and the test set I used is mic_dev/test. What could be causing this, and how can I improve it? The parameters are set as you describe in Readme.md.