Open ghukill opened 7 years ago
After downloading and initializing with the `Article` model, save the raw text, since producing it is a time bottleneck.
If so, save the original PDFs in an `orig` folder, and save the raw text in `raw_tokens` when it is created. This means more folder structure.
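A minimal sketch of that layout at download time, assuming one PDF per document keyed by a document id. `save_download`, `doc_id`, and the `.pdf`/`.txt` file naming are hypothetical, not the project's actual code; the text extraction itself is assumed to happen elsewhere and is passed in as a string.

```python
import tempfile
from pathlib import Path


def save_download(root: Path, doc_id: str, pdf_bytes: bytes, raw_text: str) -> tuple[Path, Path]:
    """Write the original PDF to root/orig/ and its raw text to root/raw_tokens/.

    Hypothetical helper: folder names follow the issue, file naming is assumed.
    """
    orig_dir = root / "orig"
    raw_dir = root / "raw_tokens"
    orig_dir.mkdir(parents=True, exist_ok=True)
    raw_dir.mkdir(parents=True, exist_ok=True)

    pdf_path = orig_dir / f"{doc_id}.pdf"
    txt_path = raw_dir / f"{doc_id}.txt"
    pdf_path.write_bytes(pdf_bytes)  # keep the untouched original
    txt_path.write_text(raw_text, encoding="utf-8")  # cache the extracted text
    return pdf_path, txt_path


# Example: save a dummy download into a temporary directory.
demo_root = Path(tempfile.mkdtemp())
pdf_path, txt_path = save_download(demo_root, "doc-001", b"%PDF-1.4 dummy", "raw text")
```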
As such, extract the raw text during download, and when building the model, check for existing raw tokens first.
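A rough sketch of the build-time check, assuming the same `orig`/`raw_tokens` layout with one `<doc_id>.txt` per document. `load_raw_text` is a hypothetical name, and `extract` stands in for whatever slow PDF-to-text step the project actually uses; the cached file is read when present, otherwise the text is extracted once and cached.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_raw_text(root: Path, doc_id: str, extract: Callable[[Path], str]) -> str:
    """Return cached raw text if it exists; otherwise extract it and cache it.

    Hypothetical helper: `extract` is any callable taking the PDF path.
    """
    cached = root / "raw_tokens" / f"{doc_id}.txt"
    if cached.exists():
        return cached.read_text(encoding="utf-8")  # fast path: skip extraction

    pdf_path = root / "orig" / f"{doc_id}.pdf"
    text = extract(pdf_path)  # slow path: the bottleneck the issue describes
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_text(text, encoding="utf-8")
    return text


# Example: a fake extractor that records how often it is called.
calls = []


def fake_extract(pdf_path: Path) -> str:
    calls.append(pdf_path)
    return "extracted text"


demo_root = Path(tempfile.mkdtemp())
first = load_raw_text(demo_root, "doc-001", fake_extract)   # extracts and caches
second = load_raw_text(demo_root, "doc-001", fake_extract)  # served from raw_tokens
```

Passing the extractor in keeps the cache check independent of any particular PDF library.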