Layout-Generation / layout-generation

Layout Generation and Baseline implementations
MIT License

Reproduce Results for Layout Transformer #3

Closed · shanyang-me closed this issue 3 years ago

shanyang-me commented 3 years ago

Hi! I was following your code to reproduce the results for LayoutTransformer. To get the results you show on the website, did you use the same setup as in the notebook, i.e. batch_size = 1 and training on only 10k examples of the PubLayNet dataset?

yashjain7856 commented 3 years ago

The values in the notebook are reasonable defaults for obtaining decent results on the PubLayNet dataset. The training set can be enlarged to obtain a smaller loss (if your device permits).

To obtain the results shown on the website, we trained on 25,000 examples of the PubLayNet dataset for 372 epochs with a batch size of 1.

23yashm commented 3 years ago

Below is the detailed prediction plot for the trained model. The batch size is kept at 1.

[prediction plot: publay_1]

shanyang-me commented 3 years ago

I see! Thanks for the prompt reply! Somehow my loss stays at ~6.4, so I think I did something wrong while preprocessing the data. Would you mind sharing your code for preprocessing the PubLayNet dataset?

[screenshot: Screen Shot 2021-10-11 at 8.27.50 AM]

shanyang-me commented 3 years ago

And the results look like this: [screenshot: Screen Shot 2021-10-11 at 9.00.34 AM]

They look OK-ish, but I still wonder why my loss is so much larger in scale.

shanyang-me commented 3 years ago

Actually, the softmax layer turns the 136-dim vector into a vector that sums to one, but the ground-truth vector sums to far more than one, so I think it makes sense for the loss to be much larger than 1.

I wonder how you made the loss decrease below 1?

shanyang-me commented 3 years ago

After applying softmax over each "dim" (category, x, y, w, h) separately, my KL-divergence loss is now less than 1.
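For reference, the per-dim softmax can be sketched as below. The split sizes are hypothetical (the real split of the 136-dim output into category and quantized x, y, w, h vocabularies depends on the notebook's tokenization); the point is that each attribute group is normalized independently, so the prediction matches a ground truth that is one-hot within each group.

```python
import numpy as np

def per_group_softmax(logits, sizes):
    """Apply softmax independently over each attribute group.

    `sizes` is a hypothetical split of the output vector into
    (category, x, y, w, h) groups; adjust it to the real vocabulary.
    """
    assert sum(sizes) == len(logits)
    out, start = [], 0
    for s in sizes:
        g = logits[start:start + s]
        e = np.exp(g - g.max())          # numerically stable softmax per group
        out.append(e / e.sum())
        start += s
    return np.concatenate(out)

# Hypothetical split: 8 category slots + 32 bins each for x, y, w, h = 136.
probs = per_group_softmax(np.random.randn(136), sizes=[8, 32, 32, 32, 32])
```

Each group sums to 1, so the full 136-dim vector sums to 5, matching a ground truth with one active entry per group.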

23yashm commented 3 years ago

> Actually, the softmax layer turns the 136-dim vector into a vector that sums to one, but the ground-truth vector sums to far more than one, so I think it makes sense for the loss to be much larger than 1.

I agree with your observation.

While we were working on the project, we discussed the loss function. The formula for the KL loss given in the TensorFlow docs is loss = y_true * log(y_true / y_pred). Yet in the example that follows it, with y_true = [[0, 1], [0, 0]] and y_pred = [[0.6, 0.4], [0.4, 0.6]], the documented answer is 0.458, which differs from what the raw formula would give (the y_true = 0 entries produce 0 · log 0). So we concluded that the loss function works somewhat differently internally, and maybe that is the reason for the loss being less than 1.
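The discrepancy is explained by clipping: the Keras implementation clips both y_true and y_pred into [epsilon, 1] before applying the formula, so the zero entries contribute essentially nothing instead of 0 · log 0. A small NumPy sketch of that behavior reproduces the documented 0.458:

```python
import numpy as np

def keras_style_kl_divergence(y_true, y_pred, eps=1e-7):
    """KL divergence the way Keras computes it: clip, sum over the
    last axis, then average over the batch."""
    y_true = np.clip(np.asarray(y_true, dtype=float), eps, 1.0)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0)
    per_sample = np.sum(y_true * np.log(y_true / y_pred), axis=-1)
    return per_sample.mean()

loss = keras_style_kl_divergence([[0, 1], [0, 0]], [[0.6, 0.4], [0.4, 0.6]])
# ≈ 0.458: sample 1 contributes log(1 / 0.4) ≈ 0.916, sample 2 ≈ 0
```

With the clipping in place, only the y_true = 1 entry of the first sample matters, giving log(2.5) / 2 ≈ 0.458.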

shanyang-me commented 3 years ago

Maybe the implementation has changed. Anyway, I was able to reproduce the results after applying softmax for each group (c, x, y, h, w). And by the way, for the transformer I think adding the mask helps:

```python
Z = tf.matmul(tf.transpose(key, perm=[0, 1, 3, 2]), que) * (1 / np.sqrt(self.model_dim))
W = tf.multiply(Z, self.mask_0)  # (not needed)
W = tf.add(W, self.mask_inf)
W = tf.keras.activations.softmax(W, axis=1)
W = tf.multiply(W, self.mask_0)  # (not needed)
W = tf.matmul(val, W)
```
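The snippet above relies on attributes (self.mask_0, self.mask_inf) defined elsewhere in that model. For anyone following along, here is a self-contained NumPy sketch of the same idea in the standard orientation (queries on the left): a large negative additive mask is applied to the scores before the softmax, which is what zeroes out attention to disallowed positions.

```python
import numpy as np

def masked_attention(que, key, val, mask):
    """Scaled dot-product attention with an additive mask.

    que, key, val: (seq, d) arrays; mask: (seq, seq) boolean array,
    True where attention is allowed.
    """
    d = que.shape[-1]
    scores = que @ key.T / np.sqrt(d)       # scaled dot-product scores
    scores = np.where(mask, scores, -1e9)   # "-inf" mask before softmax
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)   # softmax over key positions
    return w @ val
```

With a causal (lower-triangular) mask, position 0 can attend only to itself, so its output equals the first value row, which is an easy sanity check.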

mln00b commented 2 years ago

Hi, I'm trying to reproduce the LayoutTransformer results. I just ran the given notebook on the PubLayNet dataset from the link given here

However, my plot looks like this. What could be going wrong here? @yashjain7856

[plot: publay%2B99]

mln00b commented 2 years ago

Could you share the Rico & PubLayNet data files used for LayoutTransformer? @yashjain7856 @tushar-jain01

yashjain7856 commented 2 years ago

@mln00b Most probably the issue lies in your PubLayNet dataset, as your ground truth is not satisfactory. Refer to this comment for our processed PubLayNet dataset: Link