HelloZeroNet / ZeroNet

ZeroNet - Decentralized websites using Bitcoin crypto and BitTorrent network
https://zeronet.io

Optimize the function for building database #1612

Closed blurHY closed 2 years ago

blurHY commented 6 years ago

rev3594: It takes about 5 minutes to build the database for Horizon. During the build I can't do anything on ZeroNet; it just says loading.

HelloZeroNet commented 6 years ago

For me: INFO Site:1CjMsv..uxBn Imported 10 data file in 11.4669420719s on a 20 USD/yr VPN, which I think is pretty OK since it reads and inserts around 70MB of data.

I will check the possibility of moving db operations to a separate thread as part of the move to Python 3.

blurHY commented 6 years ago

70MB? It should be 216MB. I'm running ZeroNet on an HDD, not an SSD, so it will be slower. Maybe some caching in RAM is needed. Also, the data is growing, as I said.

HelloZeroNet commented 6 years ago

$ gzip -l *.json.gz
         compressed        uncompressed  ratio uncompressed_name
            3303011            27374599  87.9% data_keywords.json
            2570148            21068213  87.8% data_main.json
            1006795             4339919  76.8% data_phrases.json
            1460239            15571250  90.6% data_relationship.json
            8340193            68353981  87.8% (totals)

An SSD is highly recommended for ZeroNet.

blurHY commented 6 years ago

Not everyone uses an SSD, and small-file reads/writes are not what an HDD is good at. So it needs a solution that bypasses small-file reads/writes. Also, cache the db reads/writes to the file system.

HelloZeroNet commented 6 years ago

The file reads are cached and a write is handled by the operating system. The db cache is handled by the sqlite module.
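
As an illustration of the cache that SQLite itself manages, here is a minimal sketch using Python's standard sqlite3 module. The function name open_tuned and the chosen values are assumptions for this example, not ZeroNet's actual settings. PRAGMA cache_size and PRAGMA synchronous are standard SQLite pragmas; whether they help on a given HDD would need measuring.

```python
import sqlite3


def open_tuned(path, cache_kib=65536):
    """Open a database with a larger page cache (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # A negative cache_size is interpreted as KiB of memory, not a page count.
    # PRAGMA statements do not take bound parameters, so the int is formatted in.
    conn.execute("PRAGMA cache_size = -%d" % int(cache_kib))
    # NORMAL asks SQLite to sync to disk less often than the default FULL,
    # trading some durability on power loss for fewer slow disk flushes.
    conn.execute("PRAGMA synchronous = NORMAL")
    return conn
```

Both settings apply per connection, so they would have to be issued each time the database is opened.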

slrslr commented 6 years ago

@HelloZeroNet I also have this issue, but about 4 times slower in my case to load the .db (24 minutes), plus CPU overload. As @blurHY says, not everyone will use an SSD. And think about smartphone users. I found this thread because I wanted to submit an issue about the same thing. On the mentioned Horizon site, it took my older Pentium computer 15 minutes of full CPU load (HDD activity was not saturated the whole time) to finish rechecking the Horizon site.

This is what I did on my Linux Ubuntu 16.04 computer with the latest ZeroNet:

$ cd ~/Apps/ZeroBundle/ZeroNet/data/
$ find ./1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn -delete
$ mkdir ./1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn
$ git clone https://github.com/blurHY/Horizon.git
$ mv Horizon/* ./1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn/
$ ../zeronet.py sitePublish 1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn

Then I go to ZeroHello and click "Check files" next to the Horizon site. The result was about 15 minutes of CPU overload on the computer. debug.log did not go crazy, but I saw in the Horizon site (0) menu that the site has a 300MB .db. This size is nothing special; ZeroNet should be able to cope with dynamic sites that have, let's say, 10GB databases.

$ cd ~/Apps/ZeroBundle/ZeroNet/data/1CjMsvhJ2JsV4B5qo3FDHnF3mvRCcHuxBn/data
$ gzip -l *.json.gz

         compressed        uncompressed  ratio uncompressed_name
            2743133            23285428  88.2% data_keywords1.json
            2208428            23940669  90.8% data_keywords2.json
            2602574            23455548  88.9% data_keywords3.json
            2656768            23706139  88.8% data_keywords4.json
             902385             8252620  89.1% data_keywords5.json
            2789194            24995191  88.8% data_main1.json
            2155699            15099977  85.7% data_main2.json
            3347288            19561521  82.9% data_phrases1.json
            2197791            23694010  90.7% data_relationship1.json
            2425075            24690472  90.2% data_relationship2.json
            1197466            14682409  91.8% data_relationship3.json
             303130             1297035  76.6% data_zites1.json
           25528931           226661019  88.7% (totals)

I think this happened to me several times on this site, because in debug-last.log I see: Site:1CjMsv..uxBn Imported 12 data file in 1443.11739898s

It may be related to the unsolved issue where the ZeroMe db took days to rebuild: ZeroTalk topic, also described in this unsolved issue: https://github.com/HelloZeroNet/ZeroMe/issues/121

HelloZeroNet commented 6 years ago

The problem is that it's limited by IO/SQLite, so we can't do much about it. You can experiment by removing some indexes, as they are one of the factors in insert performance.

blurHY commented 6 years ago

"removing some indexes"

Won't queries be slower then?

HelloZeroNet commented 6 years ago

It's not necessarily going to be slower. It's worth experimenting with.

blurHY commented 6 years ago

1 minute 30 seconds to build after removing all of the indexes. Queries also seem quicker. Maybe the reason is that the CPU is idle.
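
For anyone who wants to repeat the index experiment outside ZeroNet, here is a minimal sketch with Python's standard sqlite3 module. The table keyword, its columns, and the two index names are made up for this example and are not Horizon's real schema. It compares inserting with the indexes already in place against a variant that creates them only after the bulk insert. That variant differs from removing the indexes entirely, which is what was tried above.

```python
import sqlite3
import time

# Hypothetical indexes, standing in for whatever a site's schema declares.
INDEX_SQL = [
    "CREATE INDEX keyword_word ON keyword (word)",
    "CREATE INDEX keyword_site ON keyword (site)",
]


def bulk_import(conn, rows, indexes_first):
    """Rebuild the demo table from rows and return the elapsed seconds."""
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS keyword")
    cur.execute("CREATE TABLE keyword (id INTEGER PRIMARY KEY, word TEXT, site TEXT)")
    start = time.perf_counter()
    if indexes_first:
        # Every insert below must also update both indexes.
        for sql in INDEX_SQL:
            cur.execute(sql)
    cur.executemany("INSERT INTO keyword (word, site) VALUES (?, ?)", rows)
    if not indexes_first:
        # Build each index once, after all rows are in the table.
        for sql in INDEX_SQL:
            cur.execute(sql)
    conn.commit()
    return time.perf_counter() - start


def index_names(conn):
    """List the indexes currently defined on the demo table."""
    sql = ("SELECT name FROM sqlite_master "
           "WHERE type = 'index' AND tbl_name = 'keyword' ORDER BY name")
    return [name for (name,) in conn.execute(sql)]
```

Calling bulk_import(conn, rows, True) and then bulk_import(conn, rows, False) on the same connection gives two timings to compare. Which one wins, and by how much, depends on the data and the disk.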

skwerlman commented 6 years ago

what about making db writes async? that way it at least won't lock up the whole client
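
One way to read this suggestion is a dedicated writer thread that owns the SQLite connection and takes work from a queue. Below is a minimal sketch using only the Python standard library. The DbWriter class and its methods are hypothetical and are not part of ZeroNet. It assumes the target table already exists.

```python
import queue
import sqlite3
import threading


class DbWriter(threading.Thread):
    """Hypothetical helper: applies queued inserts on its own thread."""

    def __init__(self, path):
        super().__init__(daemon=True)
        self.path = path
        self.jobs = queue.Queue()

    def run(self):
        # The connection is created and used only inside this thread.
        conn = sqlite3.connect(self.path)
        while True:
            job = self.jobs.get()
            if job is None:  # sentinel pushed by close()
                break
            sql, rows = job
            with conn:  # commits the batch, or rolls back on error
                conn.executemany(sql, rows)
        conn.close()

    def write(self, sql, rows):
        """Queue a batch of rows; returns immediately without touching the db."""
        self.jobs.put((sql, list(rows)))

    def close(self):
        """Ask the thread to finish the queued work and wait for it."""
        self.jobs.put(None)
        self.join()
```

The caller would start the thread once, call write("INSERT INTO item (name) VALUES (?)", rows) from the main loop, and call close() on shutdown. Readers would still need their own connection, and a long write can still make them wait on SQLite's file lock.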

tangdou1 commented 6 years ago

Each time I add this zite (Horizon) to my poor VPS (which only has 500MB of memory), the zeronet.py program is killed by the system for running out of memory.

blurHY commented 6 years ago

Is there any way to know that the database has been built? Also, show a progress bar while the db is building. It won't show progress when the db is big and not filled with user data, so users don't know what ZeroNet is doing.
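
A rough sketch of how per-file progress could be reported during an import, using only the Python standard library. The function import_with_progress and its two callbacks are hypothetical. How ZeroNet would show the numbers in its UI is not covered here.

```python
import gzip
import json
import os


def import_with_progress(paths, handle_data, report):
    """Load each JSON file and call report(done_bytes, total_bytes) after it.

    handle_data(path, data) stands in for the real insert step.
    """
    total = sum(os.path.getsize(p) for p in paths)
    done = 0
    for path in paths:
        # Files ending in .gz are decompressed on the fly.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8") as f:
            handle_data(path, json.load(f))
        done += os.path.getsize(path)
        report(done, total)
    return done == total
```

Progress is counted in on-disk bytes, so it only advances once per file. With a dozen large files, as in the listings above, that still gives a dozen updates instead of none.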