unisqu opened this issue 7 years ago
This is normal disk activity. The privacore fork of this engine uses a different method of merging data which reduces IOPS, but not by much. A search engine is basically a glorified map-reduce task, and these need high IOPS.
Your best bet is to get a few enterprise SAS SSDs that don't do excessive garbage collection. You can either rent such a server or buy it. The minimum would be a few SATA SSDs; the best would be a RAID of NVMe PCIe SSDs (like these servers). However, for scalability it would be better to use GCE, AWS, or Linode/SSDNodes virtual servers.
How can I reduce IOPS on this? What's a good way to make sure IOPS stays low? I've cut down the number of crawlers to 50. How do I calculate IOPS usage? What is actually happening...
You can't reduce the IOPS without rewriting the code that uses the disk. The admin panel should give you an extensive overview of the timing of each operation (write, fetch, merge). This allows you to optimize your settings, but it isn't a cure-all or even guaranteed to have any effect whatsoever. Check out the privacore branch for an example of how to rewrite for lower IOPS. You could also inquire about the Pro version of Gigablast.
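As for calculating IOPS usage: on Linux you can sample `/proc/diskstats` yourself and count completed reads plus writes over a time window. A minimal sketch (the device name in the example at the bottom is an assumption; substitute your own disk, e.g. whatever `lsblk` lists):

```python
import time

def parse_completed_ios(diskstats_text, dev):
    """Sum completed reads (field 4) and completed writes (field 8)
    for one device, given text in /proc/diskstats format."""
    for line in diskstats_text.splitlines():
        f = line.split()
        if len(f) > 7 and f[2] == dev:
            return int(f[3]) + int(f[7])
    raise ValueError(f"device {dev!r} not found in diskstats")

def device_iops(dev, interval=5.0):
    """Sample /proc/diskstats twice and return average IOPS
    (completed reads + writes per second) over the interval."""
    def snapshot():
        with open("/proc/diskstats") as fh:
            return parse_completed_ios(fh.read(), dev)
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    return (after - before) / interval

# Example (assumes a device named "sda" exists on your machine):
# print(f"average IOPS: {device_iops('sda'):.0f}")
```

The same numbers are what `iostat -x 5` (from the sysstat package) reports in its `r/s` and `w/s` columns, if you'd rather not script it.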
I've read the privacore code; its merging isn't very elegant. It's too complicated.
Where can I find out about the Pro version of Gigablast?
The way my current disks are being hammered, how long will they last running open-source Gigablast?
I'm almost hitting 1 million records a day now. 1 million a day is very slow, but it makes me wonder how many crawlers Google runs daily.
750k search results in 16 hours, but disk activity is too high, plus ~50 GB total disk usage from 5 instances on the same computer, with the instances distributed across 2 HDDs...
Is this normal? The disk activity is too high. I do have 24 GB of RAM available, though, on Ubuntu desktop.
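For context, those throughput figures work out to only a dozen or so documents per second, and the implied disk load can be sanity-checked with simple arithmetic. A back-of-envelope sketch (the 20-disk-ops-per-document figure is a pure assumption for illustration, not a Gigablast number):

```python
# Throughput implied by the figures above.
SECONDS_PER_DAY = 24 * 3600

docs_per_sec_day = 1_000_000 / SECONDS_PER_DAY   # ~11.6 docs/s (1M/day)
docs_per_sec_16h = 750_000 / (16 * 3600)         # ~13.0 docs/s (750k/16h)

# If each indexed document costs on the order of 20 disk operations
# (assumed for illustration), sustained IOPS would be roughly:
OPS_PER_DOC = 20
est_iops = docs_per_sec_day * OPS_PER_DOC        # ~230 IOPS

print(f"{docs_per_sec_day:.1f} docs/s, {docs_per_sec_16h:.1f} docs/s, "
      f"~{est_iops:.0f} IOPS")
```

A 7200 RPM HDD delivers on the order of 100–200 random IOPS, so even a couple hundred sustained random operations per second will saturate a pair of HDDs, which is consistent with the disks being hammered.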
The disks keep pushing the CPU load up to around 50. I have a 4-core CPU running here.
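A load average around 50 on a 4-core box almost always means tasks blocked in uninterruptible disk wait (state "D"), not actual CPU work, since load average counts waiting-on-I/O tasks too. A few quick checks, assuming a standard Linux userland with procps:

```shell
# Load average far above the core count usually means I/O wait, not CPU work.
nproc                    # number of CPU cores (compare against load average)
cat /proc/loadavg        # 1-, 5-, and 15-minute load averages
# Tasks in uninterruptible sleep ("D" state) are typically blocked on disk I/O:
ps -eo state,comm | awk '$1 ~ /^D/ {print $2}' | sort | uniq -c
```

If the D-state count roughly matches the load average, the disks (not the CPU) are the bottleneck, which lines up with the high disk activity described above.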