How is the Shanghai tech A and B dataset labelled ?

fatbringer commented 2 years ago

Hello !

May I know how did people label the density maps on shanghai tech A and B data set ? Is it possible for us to add on extra images and data to the data set ?

Pongpisit-Thanasutives commented 2 years ago

Hi @fatbringer,

Regarding density map generation based on the MCNN paper, the researchers applied the convolution with a Gaussian kernel to an image representing point-wise head labels. Box labeling is also doable, as provided in cclabeler.
I think it is possible. However, if the extra images have highly diverse crowd density or the number of heads to the original dataset, training the neural network as a crowd counter may be challenging.

Thank you for the questions. Hope this helps!

fatbringer commented 2 years ago

I see. Thanks for sharing the link to the paper ! I can see that the mat file contains the ground truth of every head, and the hdf5 file contains the attention and density values of each pixel.

I found a script on kaggle which seems to also be for ShanghaiTech A here at this link ShanghaiTech_a_train_density_gen. and here ShanghaiTech visualize annotation.
Do you know how were the hdf5 files generated ?

I think it is possible. However, if the extra images have highly diverse crowd density or the number of heads to the original dataset, training the neural network as a crowd counter may be challenging.

Is there a minimum number of heads for the dataset ? i.e. ieg to avoid images that have only 50 people ?

Pongpisit-Thanasutives commented 2 years ago

As an example of how to generate the hdf5 files, I suggest you take a look at the SFANet repository.
The minimum number of heads is dataset-dependent, and there is no such obligation to add a particular number of people. That said, including too many extra images with just a specific crowd count may result in an imbalanced dataset.

fatbringer commented 2 years ago

Thank you for sharing the link @Pongpisit-Thanasutives !

Ah i understand now. So while the number of persons matter less, but it is more important to use a varying scene such as in different places eg field, hall, stadium etc. Is that a good way to understand it ?

Pongpisit-Thanasutives commented 2 years ago

@fatbringer That is a good take!

fatbringer commented 2 years ago

Hi @Pongpisit-Thanasutives so i tried using the code to generate the density map a-train-density-gen.

I also created mat files for my own images. Which closely resembles the mat files there were used for the shanghai tech dataset.

However, i ran into this error.

IndexError                                Traceback (most recent call last)
Cell In [2], line 101
     98 start_time = time.time()
     99 a_train, a_test, b_train, b_test = generate_shanghaitech_path(__DATASET_ROOT)
--> 101 Parallel(n_jobs=4)(delayed(generate_density_map)(p) for p in a_train)
    103 print("--- %s seconds ---" % (time.time() - start_time))

File ~/Desktop/pyenvs/shanghaitest/lib/python3.9/site-packages/joblib/parallel.py:1098, in Parallel.__call__(self, iterable)
   1095     self._iterating = False
   1097 with self._backend.retrieval_context():
-> 1098     self.retrieve()
   1099 # Make sure that we get a last message telling us we are done
   1100 elapsed_time = time.time() - self._start_time

File ~/Desktop/pyenvs/shanghaitest/lib/python3.9/site-packages/joblib/parallel.py:975, in Parallel.retrieve(self)
    973 try:
    974     if getattr(self._backend, 'supports_timeout', False):
--> 975         self._output.extend(job.get(timeout=self.timeout))
    976     else:
    977         self._output.extend(job.get())

File ~/Desktop/pyenvs/shanghaitest/lib/python3.9/site-packages/joblib/_parallel_backends.py:567, in LokyBackend.wrap_future_result(future, timeout)
    564 """Wrapper for Future.result to implement the same behaviour as
    565 AsyncResults.get from multiprocessing."""
    566 try:
--> 567     return future.result(timeout=timeout)
    568 except CfTimeoutError as e:
    569     raise TimeoutError from e

File /usr/lib/python3.9/concurrent/futures/_base.py:445, in Future.result(self, timeout)
    443     raise CancelledError()
    444 elif self._state == FINISHED:
--> 445     return self.__get_result()
    446 else:
    447     raise TimeoutError()

File /usr/lib/python3.9/concurrent/futures/_base.py:390, in Future.__get_result(self)
    388 if self._exception:
    389     try:
--> 390         raise self._exception
    391     finally:
    392         # Break a reference cycle with the exception in self._exception
    393         self = None

IndexError: invalid index to scalar variable.

Is this because my mat file differs this way, at the ending of the mat file ? My mat file

[ 874.22 ,  221.445]]), array([[250]], dtype=uint16)]],
              dtype=object)                                                   ]],
      dtype=object)}

shanghaitech's mat file

[ 477.69532428,  129.33412027]]), array([[233]], dtype=uint8))]],
              dtype=[('location', 'O'), ('number', 'O')])                               ]],
      dtype=object)}

Pongpisit-Thanasutives commented 2 years ago

Sorry for the late response. I cannot access the link you refer to. However, the error seems to be about indexing. Do you unintentionally index to a scalar somewhere while looping? I suggest you debug without the multiprocessing code Parallel(n_jobs=4).

Hope this helps.

fatbringer commented 2 years ago

I believe so. Managed to solve the problem :)

Another question Do you know why are the ShanghaiTech A and B datasets always trained separately ? Instead of training as 1 combined dataset ?

Also, how is the sparse or dense decided in ShanghaiTech ? Is it by the area of the frame that are occupied by people ?

Pongpisit-Thanasutives / Variations-of-SFANet-for-Crowd-Counting

How is the Shanghai tech A and B dataset labelled ? #30