ghukill / atm

Article Topic Modeling

Save raw text of articles from Article model #1

Open: ghukill opened this issue 7 years ago

ghukill commented 7 years ago

After downloading an article and initializing it with the Article model, save its raw text, because text extraction is a time bottleneck.

ghukill commented 7 years ago

If so, save the original PDFs in an orig folder, and save the raw text when it is created, as raw_tokens. This means adding more folder structure.
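A minimal sketch of one possible layout, assuming a single data directory with top-level orig/ and raw_tokens/ folders and one file per article named by its ID. The function name article_paths, the article IDs, and the .pdf/.txt file naming are hypothetical, not taken from the atm code.

```python
import tempfile
from pathlib import Path


def article_paths(data_dir, article_id):
    """Return (pdf_path, raw_text_path) for one article.

    Assumed (hypothetical) layout:
        <data_dir>/orig/<article_id>.pdf        original downloaded PDF
        <data_dir>/raw_tokens/<article_id>.txt  text extracted from that PDF
    Both folders are created if they do not exist yet.
    """
    data_dir = Path(data_dir)
    orig_dir = data_dir / "orig"
    raw_dir = data_dir / "raw_tokens"
    for folder in (orig_dir, raw_dir):
        folder.mkdir(parents=True, exist_ok=True)
    return orig_dir / f"{article_id}.pdf", raw_dir / f"{article_id}.txt"


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        pdf_path, txt_path = article_paths(tmp, "article-001")
        print(pdf_path.relative_to(tmp))
        print(txt_path.relative_to(tmp))
```

Keeping the PDF and its extracted text under the same article ID means the text file can always be matched back to its source PDF without a separate index.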

ghukill commented 7 years ago

So: extract the raw text during download, and when building the model, check for existing raw tokens before extracting again.
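A rough sketch of the "check for raw tokens first" step, assuming the raw text is cached as a UTF-8 text file. The PDF-to-text step is passed in as a callable (extract_text) because the actual extraction library used by the project is not stated here; get_raw_text and extract_text are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Callable


def get_raw_text(pdf_path, raw_text_path, extract_text: Callable[[Path], str]) -> str:
    """Return an article's raw text, extracting it only if no cached copy exists.

    extract_text is a placeholder (hypothetical) for the project's PDF-to-text step.
    """
    raw_text_path = Path(raw_text_path)
    if raw_text_path.exists():
        # Cache hit: skip the slow extraction step entirely.
        return raw_text_path.read_text(encoding="utf-8")
    # Cache miss: extract once, then persist the text for later model builds.
    text = extract_text(Path(pdf_path))
    raw_text_path.parent.mkdir(parents=True, exist_ok=True)
    raw_text_path.write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    calls = []

    def fake_extract(path):
        # Stand-in extractor that records how often it is called.
        calls.append(path)
        return "example article text"

    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "orig" / "a1.pdf"
        txt = Path(tmp) / "raw_tokens" / "a1.txt"
        get_raw_text(pdf, txt, fake_extract)  # first build: extracts and saves
        get_raw_text(pdf, txt, fake_extract)  # later build: reads the cached file
        print("extractions:", len(calls))
```

Calling the same function right after a download finishes would do the extraction at download time, so a later model build only ever hits the cached file.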