Open persona0591 opened 3 years ago
Whoa, that's a big file. The wal file is SQLite's write-ahead log, a journaling file which SQLite uses for speed optimization. We will add an option to turn this off to this library shortly and we will also make it configurable from the Apify SDK. Should be done in two weeks I guess.
Nice issue btw, thanks. Reads well, has all the important info.
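For anyone curious what the wal file is at the SQLite level, here is a minimal sketch using Python's built-in sqlite3 module. It is not the storage library's code and does not show how the planned option will be implemented; the `requests` table, the row count, and the `demo` and `wal_size` helpers are made up for illustration. It shows the side file appearing in WAL mode, a checkpoint truncating it, and the file disappearing once the journal mode is switched back.

```python
import os
import sqlite3
import tempfile


def wal_size(db_path):
    """Return the size of the -wal side file in bytes, or None if it does not exist."""
    wal_path = db_path + "-wal"
    return os.path.getsize(wal_path) if os.path.exists(wal_path) else None


def demo(db_path):
    """Write some rows in WAL mode and report what happens to the -wal file."""
    # isolation_level=None means autocommit, so the PRAGMAs below never run
    # inside an implicit transaction opened by the sqlite3 module.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        # Switch to write-ahead logging; SQLite answers with the mode now in effect.
        mode = conn.execute("PRAGMA journal_mode=WAL").fetchall()[0][0]

        # Hypothetical table standing in for a request queue.
        conn.execute("CREATE TABLE requests (id INTEGER PRIMARY KEY, url TEXT)")
        conn.execute("BEGIN")
        conn.executemany(
            "INSERT INTO requests (url) VALUES (?)",
            ((f"https://example.com/{i}",) for i in range(1000)),
        )
        conn.execute("COMMIT")
        after_writes = wal_size(db_path)

        # Copy the log back into the main database file and truncate it to zero bytes.
        conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        after_checkpoint = wal_size(db_path)

        # Leave WAL mode; with a single connection SQLite removes the -wal file.
        new_mode = conn.execute("PRAGMA journal_mode=DELETE").fetchall()[0][0]
        after_switch = wal_size(db_path)

        return {
            "mode": mode,
            "after_writes": after_writes,
            "after_checkpoint": after_checkpoint,
            "new_mode": new_mode,
            "after_switch": after_switch,
        }
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(demo(os.path.join(tmp, "db.sqlite")))
```

The trade-off is the usual one: WAL mode tends to make writes faster, while the default rollback journal keeps no long-lived side file on disk.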
Hi Ondra, thank you! I'm looking forward to this option.
Hi Ondra, out of curiosity: I've noticed that the ApifyStorageLocal class now has a new option, enableWalMode, which seems to resolve my issue! 🥳
However, do you know how I can use this option from the Apify SDK? For example, can I set an environment variable for this (similar to APIFY_LOCAL_STORAGE_DIR)?
Hey @persona0591, we'll have a PR soon that will allow you to configure this option in the SDK.
Thanks for the update, Ondra!
Hi guys,
Thanks for your awesome framework! 🥇
I have a question: I'm using the PuppeteerCrawler with local storage. However, I noticed that the SQLite files associated with the storage grow rapidly in size, notably the db.sqlite-wal file. Take the snippet underneath for example (not production code, just for illustration):
The code outputs the SQLite request queue files and their sizes. When I run this code, it crawls a number of pages, including pages with non-existent or erroneous domains, in order to force the crawler to retry them. The output (on my machine):
The db.sqlite-wal file is huge (for just a couple of crawls). Unfortunately, I'm running my crawler in a low-storage environment and I'm running out of disk space.
Is this something that can be solved? For example, would it be possible to use an in-memory database? Or would it be possible to not create this db.sqlite-wal file (or to have an option to not create it)?
Many thanks!