Layout-Generation / layout-generation

Layout Generation and Baseline implementations
MIT License

What is the format for the numpy file when loading data in LayoutVAE.py #4

Closed by sbunian 2 years ago

sbunian commented 2 years ago

Hi,

I am trying to run the main.py file inside the LayoutVAE/Source folder. There is a load_data method inside the layoutvae.py module that loads a .npy file. However, there is no information on the format of this file.

Can you please share some information on what format this .npy file needs to be in?

Thanks, Sara

tushar-jain01 commented 2 years ago

Hello Sara,

The npy file that contains the data has shape (N,9,10), where N is the number of layouts. For each layout, we have a matrix of size (9,10), and each row of this matrix corresponds to a particular box in the layout. Out of the ten entries in each row, the first four are the normalized (x,y,w,h): (x,y) is the upper-left corner, and (w,h) are the width and height. The following six entries are a one-hot vector indicating the class of that box. Out of these six classes, class 0 means that there is no box. If a layout contains seven boxes instead of nine, then the last two rows of the layout will have class 0, and their (x,y,w,h) will be (0,0,0,0).
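A minimal sketch of the format described above, for illustration only (this is not the code the authors used to generate the file). The function names `encode_layout` and `decode_layout` are hypothetical, and the sketch assumes that an empty row carries the one-hot vector for class 0, i.e. `[1,0,0,0,0,0]`:

```python
import numpy as np

MAX_BOXES = 9     # rows per layout, as described above
NUM_CLASSES = 6   # class 0 is reserved for "no box"

def encode_layout(boxes):
    """Encode a list of (x, y, w, h, class_id) tuples as a (9, 10) matrix.

    Coordinates are assumed to be normalized already; class_id must be 1..5.
    """
    if len(boxes) > MAX_BOXES:
        raise ValueError(f"a layout holds at most {MAX_BOXES} boxes")
    layout = np.zeros((MAX_BOXES, 4 + NUM_CLASSES), dtype=np.float32)
    layout[:, 4] = 1.0  # every row starts out as class 0 ("no box")
    for i, (x, y, w, h, class_id) in enumerate(boxes):
        if not 1 <= class_id < NUM_CLASSES:
            raise ValueError(f"class_id must be in 1..{NUM_CLASSES - 1}")
        layout[i, :4] = (x, y, w, h)
        layout[i, 4] = 0.0             # clear the "no box" flag
        layout[i, 4 + class_id] = 1.0  # set the one-hot class entry
    return layout

def decode_layout(layout):
    """Return the real boxes of one layout, skipping the class-0 padding rows."""
    class_ids = layout[:, 4:].argmax(axis=1)
    return [
        (float(x), float(y), float(w), float(h), int(c))
        for (x, y, w, h), c in zip(layout[:, :4], class_ids)
        if c != 0
    ]

# Two layouts stacked into an array of shape (N, 9, 10) with N = 2.
data = np.stack([
    encode_layout([(0.1, 0.2, 0.3, 0.4, 1), (0.5, 0.5, 0.2, 0.2, 3)]),
    encode_layout([(0.0, 0.0, 1.0, 0.5, 2)]),
])
print(data.shape)  # (2, 9, 10)
```

An array built this way could then be written with `np.save` and read back with `np.load`; whether load_data expects exactly this dtype is not stated in the thread.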

Here is the link to the npy file: Data

Thanks, Tushar Jain

sbunian commented 2 years ago

Hi Tushar,

Thanks for the explanation. We downloaded the npy file you provided and were able to run through the code.

Could you please also share the Python code used to generate this numpy file?

Thanks, Sara

sukritiverma1996 commented 2 years ago

Hey Tushar, the paper says that PubLayNet has up to 330k samples, each with a maximum of 128 elements. But the notebook has just 10k samples with a length of 9. How can we adjust the code for 128 elements? Pad with zeros?

Thanks! Sukriti

tushar-jain01 commented 2 years ago

Hi Sukriti, padding with zeros is the simpler option, but it has larger computational and memory requirements. Instead of padding everything with zeros, I would suggest using loops: this may take more time, but you will not run out of memory. If memory is not an issue for you, you can pad with zeros.
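To make the trade-off concrete, here is a rough sketch under stated assumptions, not the repository's code. It assumes the same 10-entry row as in the (N,9,10) format and float32 storage. It also takes one possible reading of "use loops": keep the layouts as variable-length arrays and pad only one batch at a time. The helper names `padded_nbytes`, `pad_layout`, and `iter_batches` are hypothetical.

```python
import numpy as np

ROW_WIDTH = 10  # assumed: 4 box entries + 6 one-hot class entries

def padded_nbytes(num_layouts, max_boxes, row_width=ROW_WIDTH, itemsize=4):
    """Size in bytes of a fully zero-padded float32 array."""
    return num_layouts * max_boxes * row_width * itemsize

def pad_layout(rows, max_boxes):
    """Pad a (k, ROW_WIDTH) array of real boxes to (max_boxes, ROW_WIDTH)."""
    if len(rows) > max_boxes:
        raise ValueError(f"layout has more than {max_boxes} boxes")
    padded = np.zeros((max_boxes, rows.shape[1]), dtype=rows.dtype)
    padded[:, 4] = 1.0           # padding rows are class 0 ("no box")
    padded[: len(rows)] = rows   # real boxes overwrite the first k rows
    return padded

def iter_batches(layouts, batch_size, max_boxes):
    """Yield padded batches one at a time, so only one batch is padded in memory."""
    for start in range(0, len(layouts), batch_size):
        chunk = layouts[start : start + batch_size]
        yield np.stack([pad_layout(rows, max_boxes) for rows in chunk])

# Padding everything up front: 330k layouts x 128 rows x 10 entries x 4 bytes.
print(padded_nbytes(330_000, 128))  # 1689600000 bytes, about 1.7 GB

# Looping instead: three toy layouts with 1, 3, and 2 boxes, all of class 1.
layouts = []
for k in (1, 3, 2):
    rows = np.zeros((k, ROW_WIDTH), dtype=np.float32)
    rows[:, 5] = 1.0
    layouts.append(rows)

shapes = [batch.shape for batch in iter_batches(layouts, batch_size=2, max_boxes=128)]
print(shapes)  # [(2, 128, 10), (1, 128, 10)]
```

The loop pays the padding cost again on every pass over the data, which matches the "more time, less memory" trade-off described above.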