ericleasemorgan / reader

Distant Reader, a tool for using & understanding a corpus
GNU General Public License v2.0

Make carrel scraper more efficient #179

Closed by dbrower 3 years ago

dbrower commented 3 years ago

The carrel scrapers, especially for the public carrels, frequently hit SQLite "database is locked" errors. This might point to a limitation of using SQLite. But before calling it done, try to optimize the scraping to write everything to SQLite once at the very end instead of doing a write for each carrel.
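The optimization described above can be sketched as follows: collect every carrel's results in memory while scraping, then hold the database lock only once for a single bulk write. The table name `carrels`, its columns, and the `scrape_one` helper are hypothetical stand-ins, not the actual Distant Reader schema or code.

```python
import sqlite3

def scrape_one(carrel):
    # Hypothetical stand-in for the real per-carrel scraping step.
    return (carrel, len(carrel))

def scrape_all(connection, carrels):
    # Collect every carrel's result in memory first; no database
    # access happens during the scraping loop itself.
    results = [scrape_one(carrel) for carrel in carrels]
    # Then write everything in one transaction, so the database
    # lock is taken once instead of once per carrel.
    with connection:
        connection.executemany(
            "INSERT OR REPLACE INTO carrels (name, size) VALUES (?, ?)",
            results,
        )
```

Because all writes happen inside a single transaction, concurrent scrapers contend for the lock only briefly at the end of a run rather than throughout it.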

ericleasemorgan commented 3 years ago

The carrel scrapers, especially for the public carrels, frequently hit SQLite "database is locked" errors.

??? Don, let's discuss. --Eric

dbrower commented 3 years ago

Implemented batching of updates. Because of an SQLite limit on multi-row "INSERT OR REPLACE" statements, each batch can contain no more than 500 rows.
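The batching described above might look like the sketch below: split the pending rows into chunks of at most 500 and issue one multi-row `INSERT OR REPLACE` per chunk (older SQLite builds cap multi-row inserts at 500 rows via the default compound-statement limit). The `carrels` table and its two columns are hypothetical examples, not the project's actual schema.

```python
import sqlite3

BATCH_SIZE = 500  # older SQLite builds limit multi-row inserts to 500 rows

def batched_upsert(connection, rows):
    """Write (name, size) rows with INSERT OR REPLACE, 500 rows per statement.

    The 'carrels' table is a hypothetical stand-in for the real schema.
    """
    cursor = connection.cursor()
    for start in range(0, len(rows), BATCH_SIZE):
        batch = rows[start:start + BATCH_SIZE]
        # Build one statement with a (?, ?) group per row in the batch.
        placeholders = ", ".join(["(?, ?)"] * len(batch))
        sql = "INSERT OR REPLACE INTO carrels (name, size) VALUES " + placeholders
        # Flatten the tuples to match the flat parameter list.
        params = [value for row in batch for value in row]
        cursor.execute(sql, params)
    connection.commit()
```

A run over 1,200 rows would issue three statements (500 + 500 + 200 rows) instead of 1,200 single-row inserts, cutting both statement overhead and the time spent holding the write lock.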