**Open** · mateuszk098 opened this issue 13 hours ago
Thank you very much for your message. The JSON file you sent looks correct. Could you share the JSON files you are using for training (at least 3 or 4) and the pickle file with the transformations? That way, I can inspect the graphs that are being generated for training to try to find the issue.
Thank you!
@pilarbachiller Thanks for your quick response! I managed to solve the issue: it was related to the translation vectors in the `TransformManager` instance. As far as I can tell, you used meters as the unit for these vectors, but in the AIST dataset the translation vectors are given in centimeters. This mismatch produced very large values at the network's input and output, which the sigmoid activation function mapped to 0s or 1s. After scaling the translations to meters, everything works perfectly.
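For reference, the rescaling I ended up doing looks roughly like the sketch below; it assumes the pickled `TransformManager` comes from pytransform3d, and the camera names, rotations and translations here are made-up stand-ins for the AIST calibration values:

```python
import numpy as np
from pytransform3d.transform_manager import TransformManager
from pytransform3d.transformations import transform_from

CM_TO_M = 0.01  # AIST extrinsics are expressed in centimeters

# Made-up stand-ins for the AIST calibration: (camera name, rotation, translation in cm)
aist_cameras = [
    ("c01", np.eye(3), np.array([120.0, -340.0, 250.0])),
    ("c02", np.eye(3), np.array([-80.0, 310.0, 255.0])),
]

tm = TransformManager()
for name, R, t_cm in aist_cameras:
    # Rescale the translation to meters before building the 4x4 transform;
    # keeping centimeters produced huge values that saturated the sigmoid.
    tm.add_transform(name, "world", transform_from(R=R, p=t_cm * CM_TO_M))
```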
That said, I still have a few minor questions about your approach, as there are some aspects I don’t fully understand:
Keypoint format: in the `.json` files, each keypoint is given as `[0, 867.0, 477.0, 1, 1]`. I understand the first value is the keypoint index, and the second and third are the keypoint coordinates. However, what is the purpose of the two 1s at the end? I haven't found any examples where these values differ from 1. Is this related to the network architecture or something else?
`freqs` in `MergedMultipleHumanDataset`: in the `process_training()` method there is the expression `freqs = [0 for _ in range(16)]`. When I include more than 16 `.json` files for training, this causes an index error in `freqs[len(views_to_add)] += 1`. Could you clarify the role of the `freqs` variable? Is it possible to adjust its length? For instance, if I want to include more than 16 `.json` files for training (e.g., many lightweight files of 1–5 MB each), can this list be extended without any issues?
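To illustrate what I mean by extending it, something like the following is what I have in mind; this is only a sketch and assumes `freqs` is used purely to count how often each number of views occurs:

```python
from collections import Counter

# Hypothetical replacement for the fixed-size list in process_training():
# a Counter grows on demand, so indexing it by len(views_to_add) cannot go
# out of range, no matter how many .json files are included.
freqs = Counter()
for views_to_add in ([0, 1], [0, 1, 2], list(range(20))):  # dummy view lists
    freqs[len(views_to_add)] += 1
print(freqs)  # Counter({2: 1, 3: 1, 20: 1})
```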
Finally, do I understand correctly that if I provide `.json` files representing sequences from five different individuals, the skeleton matching network will learn to distinguish between five people? What if I want to distinguish between, e.g., 20 people? This ties into my question above about the fixed `freqs` list.

The last two questions are the most important for me. Thank you for your time and support!
Hello @ljmanso, @vangiel, and @pilarbachiller,
First of all, thank you for your work! I'm trying to test this approach on the AIST dataset, but I've encountered some issues while using `train_skeleton_matching.py`.

Here's the situation: I prepared a custom dataset by parsing a subset of AIST, following a structure similar to the one you used for ARP LAB. Since AIST provides several calibrated cameras, I selected six of them, mirroring your ARP LAB setup. I then created a pickle file containing a `TransformManager` instance to handle the necessary transformations and added a new configuration to `parameters.py`. After that, I ran `train_skeleton_matching.py`, and the training process started successfully (so far, so good).

However, the network doesn't seem to learn anything. The loss function remains constant across epochs, and training halts due to early stopping. This seems odd, since the network's output should evolve over time even if the input were meaningless. Stepping through the learning loop, I noticed that `outputs = torch.squeeze(model(feats.float(), subgraph))` is always a vector of ones, which explains why the loss doesn't change.
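For completeness, this is roughly how I'm checking it inside the training loop; `model`, `feats` and `subgraph` are the variables already defined in `train_skeleton_matching.py`, and the prints are my own debugging additions:

```python
# Added inside the existing training loop of train_skeleton_matching.py
# (model, feats and subgraph come from the script itself).
outputs = torch.squeeze(model(feats.float(), subgraph))
print("feats range:  ", feats.min().item(), feats.max().item())
print("outputs range:", outputs.min().item(), outputs.max().item())
# outputs prints as exactly 1.0 for every element, epoch after epoch.
```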
I cross-verified this setup with ARP LAB and everything worked as expected there: the loss decreased with each epoch. Could you help me identify what might be going wrong? To provide more context, I've attached an example of the `.json` files along with the configuration I'm using. Each `.json` file contains a list of dictionaries describing one person seen from the six selected cameras. Thank you for your help!
gLH_sBM_cAll_d16_mLH0_ch01.json