IBM / document2slides

This repository contains the code to reconstruct the training dataset from NLP/ML Papers in PDF format together with their corresponding slides.
Apache License 2.0

Train the model on a new dataset #7

Open MorenoLaQuatra opened 2 years ago

MorenoLaQuatra commented 2 years ago

Hi,

First of all, thank you for your work and for the repo! I have a question about using the repository with a new data collection. What steps are required to train and evaluate on a new dataset? What format is required for the training data?