Closed: shanyang-me closed this issue 3 years ago
The values in the notebook are good defaults for obtaining decent results on the PubLayNet dataset. The training dataset size can be increased to obtain a smaller loss (if your hardware permits).
To obtain the results shown on the website, we trained on 25,000 examples of the PubLayNet dataset for 372 epochs with a batch size of 1.
Below is the detailed prediction plot for the trained model. The batch size is kept at 1.
I see! Thanks for the prompt reply! Somehow my loss stays at ~6.4; I think I did something wrong while preprocessing the data. Would you mind sharing your code for preprocessing the PubLayNet dataset?
And the results look like this:
Looks ok-ish, but I still wonder why my loss is higher in scale.
Actually, the softmax layer turns the 136-dim vector into a vector that sums to one, but the ground-truth vector sums to far more than one, so I think it makes sense for the loss to be far larger than 1.
I wonder how you made the loss decrease below 1?
After applying softmax over each "dim" (category, x, y, w, h), my KLDivergence loss is now less than 1.
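For anyone else hitting this: the trick is to apply softmax independently over each attribute's slice of the output, so every group sums to 1 just like its one-hot ground truth. A minimal NumPy sketch; the per-group sizes below are an assumption for illustration (only the 136-dim total comes from the thread):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical split of the 136-dim output into per-attribute groups.
# The exact sizes are assumptions; only their sum (136) is from the thread.
GROUPS = {"category": 8, "x": 32, "y": 32, "w": 32, "h": 32}  # 8 + 4*32 = 136

def per_group_softmax(logits):
    """Apply softmax separately over each attribute's slice so that each
    group sums to 1, matching a one-hot ground truth per attribute."""
    out, start = [], 0
    for size in GROUPS.values():
        out.append(softmax(logits[..., start:start + size]))
        start += size
    return np.concatenate(out, axis=-1)

vec = np.random.default_rng(0).normal(size=136)
probs = per_group_softmax(vec)
print(probs[:8].sum())   # ~1.0 for the category slice
print(probs.sum())       # ~5.0 (five groups, each summing to 1)
```

With a single softmax over all 136 dims the prediction sums to 1 while the ground truth sums to 5, which is exactly the scale mismatch discussed above.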
Actually, the softmax layer turns the 136-dim vector into a vector that sums to one, but the ground-truth vector sums to far more than one, so I think it makes sense for the loss to be far larger than 1.
I agree with your observation.
While we were working on the project, we had a discussion about the loss function. The formula for the KL loss (as given in the TensorFlow docs) is loss = y_true * log(y_true / y_pred). Yet in the example that follows it,

y_true = [[0, 1], [0, 0]]
y_pred = [[0.6, 0.4], [0.4, 0.6]]

the stated answer is 0.458. If we calculate using the formula naively, the answer comes out different, so we thought the loss function works in some different way. Maybe that is the reason for the loss being less than 1.
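For what it's worth, the 0.458 in the TensorFlow example is consistent with the formula once two implementation details are accounted for: inputs are clipped to [epsilon, 1] before the log (so the zero entries don't blow up), and the per-sample sums are averaged over the batch. A NumPy sketch reproducing it (the epsilon value is assumed to be the Keras default, 1e-7):

```python
import numpy as np

EPS = 1e-7  # assumed Keras backend epsilon

def kl_divergence(y_true, y_pred):
    """Mimics tf.keras.losses.KLDivergence: clip both inputs to [EPS, 1],
    sum y_true * log(y_true / y_pred) over the last axis, then average
    over the batch."""
    y_true = np.clip(y_true, EPS, 1.0)
    y_pred = np.clip(y_pred, EPS, 1.0)
    per_sample = np.sum(y_true * np.log(y_true / y_pred), axis=-1)
    return per_sample.mean()

y_true = np.array([[0.0, 1.0], [0.0, 0.0]])
y_pred = np.array([[0.6, 0.4], [0.4, 0.6]])
print(round(kl_divergence(y_true, y_pred), 3))  # 0.458
```

The only term that matters is 1 * log(1 / 0.4) ≈ 0.916 in the first row; averaged over the two samples this gives ≈ 0.458.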
Maybe the implementation has changed. Anyway, I was able to reproduce the results after applying softmax for each class (c, x, y, h, w). By the way, for the transformer I think adding the mask helps:

```python
Z = tf.matmul(tf.transpose(key, perm=[0, 1, 3, 2]), que) * (1 / np.sqrt(self.model_dim))
W = tf.multiply(Z, self.mask_0)   # (not needed)
W = tf.add(W, self.mask_inf)
W = tf.keras.activations.softmax(W, axis=1)
W = tf.multiply(W, self.mask_0)   # (not needed)
W = tf.matmul(val, W)
```
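For comparison, here is a minimal NumPy sketch of standard masked scaled dot-product attention (the usual Q·Kᵀ formulation; note the snippet above uses a transposed layout and softmax over axis 1, so the shapes are arranged differently):

```python
import numpy as np

def masked_attention(que, key, val, mask):
    """Scaled dot-product attention with an additive mask.
    que, key, val: (seq_len, d_model); mask: (seq_len, seq_len) with 0
    where attention is allowed and a large negative value where blocked."""
    d = que.shape[-1]
    scores = que @ key.T / np.sqrt(d)        # (seq, seq) similarity scores
    scores = scores + mask                   # blocked positions -> ~ -inf
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = e / e.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ val                     # weighted sum of values

# causal mask: each position may only attend to itself and earlier positions
L, d = 4, 8
mask = np.triu(np.full((L, L), -1e9), k=1)
rng = np.random.default_rng(0)
que, key, val = (rng.normal(size=(L, d)) for _ in range(3))
out = masked_attention(que, key, val, mask)
print(out.shape)  # (4, 8)
```

With the causal mask, position 0 can only attend to itself, so the first output row equals the first value row exactly.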
Hi, I'm trying to reproduce the LayoutTransformer results. I just ran the given notebook on the PubLayNet dataset from the link given here.
However, my plot looks like this. What could be going wrong here? @yashjain7856
Could you share the rico & publaynet data file used for LayoutTransformer? @yashjain7856 @tushar-jain01
@mln00b Most probably the issue lies in your PubLayNet dataset, as your ground truth is not satisfactory. Refer to this comment for our processed PubLayNet dataset: Link
Hi! I was following your code to reproduce the results for LayoutTransformer. To get the results you show on the website, did you use the same setup as in the notebook, i.e. batch_size = 1 and training with only 10k examples of the PubLayNet dataset?