MurtuzaBohra / SimpDOM

Simplified DOM Trees for Transferable Attribute Extraction from the Web

Training a model #2

Closed MichaelAzmy closed 2 years ago

MichaelAzmy commented 2 years ago

First, I want to thank you for putting in the effort to implement the paper. The code is clean, but I am having a hard time understanding the steps needed to preprocess data for training on a new vertical, and possibly on new datasets.

I can see multiple scripts under DatasetCreation, but I can't work out the order of the steps or their dependencies. Is there a single entry point for preprocessing the data? Can you point me to the steps?

Thanks.

MurtuzaBohra commented 2 years ago

The "train.ipynb" notebook contains the steps to process new verticals. You need to unzip the SWDE dataset into the "data" folder, assign that path to the "datapath" variable, and set the vertical name of your choice. An example of the directory structure is shown in the image below.

[Image: directory_structure]
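
For readers without access to the image, here is a minimal sketch of the setup described above: unzip the archive into "data", then check that the chosen vertical exists under "datapath". It is not code from the repository. The helper names (`prepare_swde`, `list_verticals`, `check_vertical`) and the archive name are hypothetical, and it assumes each vertical is a folder sitting directly under "datapath"; the real layout is the one shown in the image and may differ.

```python
import zipfile
from pathlib import Path


def prepare_swde(archive_path, data_dir="data"):
    """Unzip the SWDE archive into data_dir and return that directory.

    Hypothetical helper: the notebook only asks that the dataset
    end up unzipped inside the "data" folder.
    """
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(data_dir)
    return data_dir


def list_verticals(datapath):
    """Return the names of the folders found directly under datapath.

    Assumption: one folder per vertical at the top level of datapath.
    """
    return sorted(p.name for p in Path(datapath).iterdir() if p.is_dir())


def check_vertical(datapath, vertical):
    """Return the vertical's folder, or raise if it is missing."""
    available = list_verticals(datapath)
    if vertical not in available:
        raise ValueError(
            f"vertical {vertical!r} not found under {datapath}; "
            f"available: {available}"
        )
    return Path(datapath) / vertical


# Example usage (archive name and vertical value are placeholders):
# datapath = prepare_swde("swde.zip")
# vertical_dir = check_vertical(datapath, "auto")
```

Running the check before opening the notebook makes a wrong "datapath" or a misspelled vertical name fail immediately, rather than partway through preprocessing.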