CorentinB / warc

Read and write WARC files in Go

Panic on too many open files error, should retry instead #43

Open willmhowes opened 3 months ago

willmhowes commented 3 months ago

Received the following error more than once:

panic: open jobs/warcs/TCPK-20240826191443122-00001-crawl918.us.archive.org.warc.gz.open: too many open files

goroutine 119 [running]:
github.com/CorentinB/warc.isFileSizeExceeded({0xc8816c7f40?, 0xc00041e3f0?}, 0x408f400000000000)
        /var/www/go/pkg/mod/github.com/!corentin!b/warc@v0.8.43/utils.go:240 +0xf5
github.com/CorentinB/warc.recordWriter(0xc00049e0f0, 0xc00014c070, 0xc0001b48c0, 0xc00041e3c0)
        /var/www/go/pkg/mod/github.com/!corentin!b/warc@v0.8.43/warc.go:137 +0x45a
created by github.com/CorentinB/warc.(*RotatorSettings).NewWARCRotator in goroutine 1
        /var/www/go/pkg/mod/github.com/!corentin!b/warc@v0.8.43/warc.go:70 +0x8a

TODO: as stated in the issue title, the library should retry instead of panicking (per @CorentinB).
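For reference, a minimal sketch of what a retry-on-exhaustion open could look like in Go. The `openWithRetry` helper, its backoff policy, and the attempt count are all hypothetical and not part of this library's API:

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
	"time"
)

// openWithRetry is a hypothetical helper: it retries os.OpenFile when the
// error is EMFILE/ENFILE ("too many open files") instead of giving up,
// sleeping a little longer before each new attempt.
func openWithRetry(path string, flag int, perm os.FileMode, attempts int) (*os.File, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		f, err := os.OpenFile(path, flag, perm)
		if err == nil {
			return f, nil
		}
		lastErr = err
		// Retry only on descriptor exhaustion; fail fast on anything else.
		if !errors.Is(err, syscall.EMFILE) && !errors.Is(err, syscall.ENFILE) {
			return nil, err
		}
		time.Sleep(time.Duration(i+1) * 100 * time.Millisecond)
	}
	return nil, fmt.Errorf("open %s after %d attempts: %w", path, attempts, lastErr)
}

func main() {
	f, err := openWithRetry("/dev/null", os.O_RDONLY, 0, 5)
	if err != nil {
		fmt.Println("open failed:", err)
		return
	}
	defer f.Close()
	fmt.Println("opened", f.Name())
}
```

The point is to back off and retry only when the error is descriptor exhaustion, and to keep failing immediately on every other error.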

willmhowes commented 3 months ago

Here is the output of ulimit -a on the machine receiving the error:

➜  ulimit -a
-t: cpu time (seconds)              unlimited
-f: file size (blocks)              unlimited
-d: data seg size (kbytes)          unlimited
-s: stack size (kbytes)             8192
-c: core file size (blocks)         0
-m: resident set size (kbytes)      unlimited
-u: processes                       430368
-n: file descriptors                65536
-l: locked-in-memory size (kbytes)  13778120
-v: address space (kbytes)          unlimited
-x: file locks                      unlimited
-i: pending signals                 430368
-q: bytes in POSIX msg queues       819200
-e: max nice                        0
-r: max rt priority                 0
-N 15:                              unlimited
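The soft limit of 65536 descriptors looks generous, so the exhaustion is more likely coming from how many files are open at once than from a low limit. If it helps to rule that out, here is a small Linux/macOS-only sketch (not part of warc or Zeno) that prints the process's RLIMIT_NOFILE and raises the soft limit to the hard limit where allowed:

```go
package main

import (
	"fmt"
	"syscall"
)

// Hypothetical startup check: print the process's file-descriptor limits
// and, if the soft limit is below the hard limit, raise it.
func main() {
	var rlim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rlim); err != nil {
		fmt.Println("getrlimit:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", rlim.Cur, rlim.Max)

	if rlim.Cur < rlim.Max {
		rlim.Cur = rlim.Max
		if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rlim); err != nil {
			fmt.Println("setrlimit:", err)
			return
		}
		fmt.Printf("raised soft limit to %d\n", rlim.Cur)
	}
}
```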

willmhowes commented 3 months ago

Note: I was running 3 Zeno crawls on the same HDD-based machine (meaning the I/O is slow), with each Zeno instance configured with 125 workers. The solution may just be to run with fewer workers (maybe 8?), as sketched below.
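If reducing concurrency is the workaround, one common way to cap in-flight work in Go is a buffered channel used as a semaphore. This is illustrative only, not Zeno's actual worker code; the worker counts just mirror the numbers above:

```go
package main

import (
	"fmt"
	"sync"
)

// Cap the number of jobs running at once so fewer files are open at the
// same time on a slow disk.
func main() {
	const maxWorkers = 8 // the lower worker count suggested above
	sem := make(chan struct{}, maxWorkers)
	var wg sync.WaitGroup

	for i := 0; i < 125; i++ { // 125 = the per-instance worker count from the report
		wg.Add(1)
		sem <- struct{}{} // blocks once maxWorkers jobs are in flight
		go func(id int) {
			defer wg.Done()
			defer func() { <-sem }()
			// ... fetch one URL and write its WARC record here ...
			_ = id
		}(i)
	}
	wg.Wait()
	fmt.Println("all jobs finished")
}
```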