ScriptTiger / Hosts-BL

A simple tool for working with hosts-file blacklists. It can remove comments, remove duplicates, compress entries to 9 domains per line, add IPv6 entries, and convert blacklists to multiple other formats compatible with other software.
https://scripttiger.github.io/more/
MIT License

Out of memory error when trying to compress a really huge hostfile #5

Closed Tendodeku closed 2 months ago

Tendodeku commented 2 months ago

Just like the title says. I'm trying to compress a hosts file of ~400 MB, and it fails with this error:

```
runtime: out of memory: cannot allocate 281018368-byte block (1610448896 in use)
fatal error: out of memory
```

```
goroutine 1 [running]:
runtime.throw({0x2343d4, 0xd})
	C:/Program Files/Go/src/runtime/panic.go:1047 +0x4d fp=0x984dc08 sp=0x984dbf4 pc=0x1c2d6d
runtime.(*mcache).allocLarge(0x10e0088, 0x109c6000, 0x1)
	C:/Program Files/Go/src/runtime/mcache.go:236 +0x1be fp=0x984dc30 sp=0x984dc08 pc=0x1a08fe
runtime.mallocgc(0x109c6000, 0x0, 0x0)
	C:/Program Files/Go/src/runtime/malloc.go:1053 +0x3eb fp=0x984dc68 sp=0x984dc30 pc=0x19a69b
runtime.rawbyteslice(0x109c4800)
	C:/Program Files/Go/src/runtime/string.go:274 +0xdb fp=0x984dc84 sp=0x984dc68 pc=0x1d97db
runtime.stringtoslicebyte(0x0, {0x5cbc4000, 0x109c4800})
	C:/Program Files/Go/src/runtime/string.go:172 +0x4c fp=0x984dca4 sp=0x984dc84 pc=0x1d950c
main.main()
	C:/Users/CJWally/Desktop/projects/GitHub/Hosts-BL/hosts-bl.go:201 +0xd8b fp=0x984dfc4 sp=0x984dca4 pc=0x21e3eb
runtime.main()
	C:/Program Files/Go/src/runtime/proc.go:250 +0x22e fp=0x984dff0 sp=0x984dfc4 pc=0x1c552e
runtime.goexit()
	C:/Program Files/Go/src/runtime/asm_386.s:1326 +0x1 fp=0x984dff4 sp=0x984dff0 pc=0x1eb321

goroutine 2 [force gc (idle)]:
runtime.gopark(0x23c288, 0x2c6010, 0x11, 0x14, 0x1)
	C:/Program Files/Go/src/runtime/proc.go:381 +0xff fp=0x9847fdc sp=0x9847fc8 pc=0x1c595f
runtime.goparkunlock(...)
	C:/Program Files/Go/src/runtime/proc.go:387
runtime.forcegchelper()
	C:/Program Files/Go/src/runtime/proc.go:305 +0xcf fp=0x9847ff0 sp=0x9847fdc pc=0x1c578f
runtime.goexit()
	C:/Program Files/Go/src/runtime/asm_386.s:1326 +0x1 fp=0x9847ff4 sp=0x9847ff0 pc=0x1eb321
created by runtime.init.5
	C:/Program Files/Go/src/runtime/proc.go:293 +0x23
```

...

ScriptTiger commented 2 months ago

Try running with the -dupe argument and see if that works.

Tendodeku commented 2 months ago

Tried that argument and got the same error. I guess the file is too big.

ScriptTiger commented 2 months ago

Hosts-BL is designed and optimized to manipulate smaller hosts files extremely quickly, specifically the Steven Black hosts files. I'll leave this issue open though and add some additional options over time to allow for less memory-intensive operations.

For now, there are a couple of other things you can try. You could try running the 64-bit Linux build of Hosts-BL in a Google Colab and see if that works. You could also check out the Compressed.cmd script in my Hosts-Conversions repo (https://github.com/ScriptTiger/Hosts-Conversions) and see if that works. That script runs considerably slower, but the trade-off is that it's far less memory-intensive: it reads from and writes to your hard drive for everything rather than slurping everything into memory, as Hosts-BL does.

ScriptTiger commented 2 months ago

@Tendodeku Try the latest release and see if that fixes the issue.

Tendodeku commented 2 months ago

Sounds good. Thanks for the update; I'll try it when I get home.

Tendodeku commented 2 months ago

Tried the new version and the destination file it creates is empty, 0 bytes.

ScriptTiger commented 2 months ago

Have you confirmed you have the appropriate blackhole address configured? The default blackhole address it looks for is 0.0.0.0, but some hosts files use 127.0.0.1, or possibly even something else. Use the -from_blackhole argument to set the blackhole address appropriate to your hosts file.

I've also just published a new release. It doesn't sound related to your issue, but you may want to grab it anyway just to make sure you're working with the latest build.

Tendodeku commented 2 months ago

You were right. The source file was using 127.0.0.1. The new version worked wonderfully and compressed it successfully. Thank you so much!

ScriptTiger commented 2 months ago

And thank you, too, for your feedback!

ScriptTiger commented 2 months ago

@Tendodeku I just added a new feature to automatically detect what black hole address is being used. However, using the -from_blackhole argument would still be slightly faster since it can skip that automatic detection process.

I'm also curious whether you've had a chance to try out the new -hash configurations with your larger files. Collisions may or may not be a realistic concern, depending on the size of your hosts file, but I'm curious whether you've tried different hash sizes and noticed any difference in the results, other than slower runs.
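To illustrate the trade-off being discussed (this is generic background, not Hosts-BL's actual -hash implementation): storing fixed-size hashes of domains instead of the full strings saves memory, but smaller digests collide sooner. By the birthday bound, an n-bit hash reaches ~50% collision odds near 2^(n/2) entries, so roughly 77 thousand domains for 32 bits versus about 4 billion for 64 bits. A sketch using Go's standard FNV hashes:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hash32 and hash64 map a domain to a fixed-size digest using FNV-1a.
func hash32(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

func hash64(s string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(s))
	return h.Sum64()
}

func main() {
	// Deduplicate via 64-bit hashes instead of storing full domain strings:
	// 8 bytes per entry rather than the whole (possibly long) hostname.
	domains := []string{"a.example", "b.example", "a.example"}
	seen := make(map[uint64]bool)
	unique := 0
	for _, d := range domains {
		if h := hash64(d); !seen[h] {
			seen[h] = true
			unique++
		}
	}
	fmt.Println("unique domains:", unique) // prints "unique domains: 2"

	// A 32-bit digest is half the size again, but collisions become
	// plausible at hosts-file scale (hundreds of thousands of entries).
	fmt.Printf("32-bit digest of a.example: %08x\n", hash32("a.example"))
}
```

A collision here means two distinct domains dedup to one entry, so one of them silently drops out of the output, which is why hash size matters more as the list grows.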