QUT-Digital-Observatory / coordination-network-toolkit

A small command line tool and set of functions for studying coordination networks in Twitter and other social media data.
MIT License

disk full error #57

Open dandaii opened 1 year ago

dandaii commented 1 year ago

Hi there,

I have a large social media dataset (~44GB after preprocessing into a SQLite database). When I run the package from my terminal, I consistently get the error "sqlite3.OperationalError: database or disk is full". I assume this is because large temporary files are generated in the background and use up all of my available RAM. Any thoughts on how to solve this? I'm using a VM with 128GB of RAM and plenty of free disk space.

Here's the complete error message I had:

"
compute_networks weibocov2_20230603_file.db compute co_retweet --time_window 60
Calculating a co_retweet network on weibocov2_20230603_file.db with the following settings:
time_window: 60 seconds
min_edge_weight: 2 co-occurring messages
n_cpus: 32 processors
output_file: None
Ensure the indexes exist to drive the join.
Calculating the co-retweet network
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/compute_networks.py", line 748, in _run_query
    db.execute(
sqlite3.OperationalError: database or disk is full
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/compute_networks", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/main.py", line 281, in main
    compute_co_retweet_parallel(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/compute_networks.py", line 703, in compute_co_retweet_parallel
    return parallise_query_by_user_id(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/coordination_network_toolkit/compute_networks.py", line 147, in parallise_query_by_user_id
    d.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in get_result
    raise self._exception
sqlite3.OperationalError: database or disk is full
"

Here's the RAM usage on the VM at the time of the above error:

"
free -h
              total        used        free      shared  buff/cache   available
Mem:          125Gi       1.7Gi        11Gi       0.0Ki       112Gi       122Gi
Swap:          92Mi       5.0Mi        87Mi
"

Thanks. Dan (HDR from DMRC)
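
A note on the diagnostics above: `free -h` reports memory, while the error text refers to the database or the disk, so it may also be worth checking free space on every filesystem SQLite could be writing to. The following is a minimal sketch, not part of the toolkit. The helper names are hypothetical, and the list of temporary directories reflects an assumed SQLite search order on Unix (`SQLITE_TMPDIR`, `TMPDIR`, then `/var/tmp`, `/usr/tmp`, `/tmp`) that should be verified against the SQLite documentation.

```python
import os
import shutil
from pathlib import Path

# Assumed fallback locations for SQLite temporary files on Unix (an assumption,
# not taken from the thread; check the SQLite documentation for your build).
ASSUMED_TEMP_CANDIDATES = ("/var/tmp", "/usr/tmp", "/tmp")


def candidate_dirs(db_path):
    """List existing directories that may receive database or temporary files."""
    dirs = [Path(db_path).resolve().parent]
    for var in ("SQLITE_TMPDIR", "TMPDIR"):
        value = os.environ.get(var)
        if value:
            dirs.append(Path(value))
    dirs.extend(Path(p) for p in ASSUMED_TEMP_CANDIDATES)

    # Keep only directories that exist, dropping duplicates but preserving order.
    seen = set()
    result = []
    for directory in dirs:
        if directory.is_dir() and directory not in seen:
            seen.add(directory)
            result.append(directory)
    return result


def free_space_gib(db_path):
    """Map each candidate directory to the free space (GiB) on its filesystem."""
    return {
        str(directory): shutil.disk_usage(directory).free / 1024**3
        for directory in candidate_dirs(db_path)
    }


if __name__ == "__main__":
    for directory, free in free_space_gib("weibocov2_20230603_file.db").items():
        print(f"{directory}: {free:.1f} GiB free")
```

If one of the listed directories sits on a small filesystem, that would be a candidate explanation for the error, although nothing in this thread confirms it.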

deimosnz commented 1 year ago

Hi Dan,

This sounds a bit like a VM issue to me. Did you go to QUT hacky hour the other day?

Rob

dandaii commented 1 year ago

Hi Rob,

Thanks for your reply. Yes, I did. The interesting thing is that yesterday, before I joined the hacky hour, the code ran without any errors (even though it had failed many times before that). So I succeeded in generating a co-tweet network and exporting the graphml file, but when I tried the co-retweet network, it didn't work anymore, which was quite strange. I also tried clearing the caches before running the code, but still got the same error.

I'm using Nectar Cloud for the VM, which only has a 30GB system disk that can't be extended. However, I attached an 800GB volume to it and ran the code from the mounted volume (that's how the co-tweet network was generated). I'm still not sure what caused the error.

Thanks, Dan
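
One hypothesis consistent with the setup described above (a 30GB system disk plus a large mounted volume) is that SQLite's temporary files land on the system disk rather than on the volume. This is not confirmed anywhere in the thread. A minimal sketch for testing it is to point the `SQLITE_TMPDIR` environment variable, which SQLite reads on Unix, at a directory on the mounted volume before launching the command. The helper names and the mount path `/mnt/volume/sqlite_tmp` are hypothetical; the command line is the one from the original report.

```python
import os
import subprocess
from pathlib import Path


def env_with_sqlite_tmpdir(tmp_dir, base_env=None):
    """Copy the environment and point SQLITE_TMPDIR at tmp_dir, creating it if needed."""
    tmp_path = Path(tmp_dir)
    tmp_path.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ if base_env is None else base_env)
    env["SQLITE_TMPDIR"] = str(tmp_path)
    return env


def run_co_retweet(db_path, tmp_dir):
    """Run the co_retweet computation with SQLite temp files redirected to tmp_dir."""
    command = [
        "compute_networks", db_path,
        "compute", "co_retweet",
        "--time_window", "60",
    ]
    # Worker processes started by the command inherit this environment.
    return subprocess.run(command, env=env_with_sqlite_tmpdir(tmp_dir), check=True)


# Example call (hypothetical mount point; substitute the real volume path):
# run_co_retweet("weibocov2_20230603_file.db", "/mnt/volume/sqlite_tmp")
```

If the error disappears with the temporary directory on the large volume, that points to temporary file space rather than RAM as the limiting resource.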

deimosnz commented 1 year ago

Hi Dan,

Nectar Cloud is a bit weird at times and doesn't always act as expected (from my experience at least). You can try sending a ticket to Nectar support asking if anything at their end would make it weird. But I'll give it a look and see what's up as well. However, SQLite error messages are pretty literal and if it says the disk is full then it did indeed see that the disk was full for some reason or other.

I'm usually on campus at KG on a Tuesday, at the Z6 level 5 hot desks, if you're inclined to drop by.

Rob
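
To illustrate the point about SQLite error messages being literal, here is a small sketch that reproduces the same message on purpose. It caps a scratch database at a few pages with `PRAGMA max_page_count` and inserts rows until SQLite refuses to grow the file. The function name and the page limit are illustrative assumptions; the toolkit does not do this.

```python
import os
import sqlite3
import tempfile


def fill_until_full(max_pages=20):
    """Insert rows into a page-capped database and return the error message raised."""
    with tempfile.TemporaryDirectory() as tmp:
        db = sqlite3.connect(os.path.join(tmp, "capped.db"))
        try:
            # PRAGMA statements do not accept bound parameters, so format the integer in.
            db.execute(f"PRAGMA max_page_count = {int(max_pages)}")
            db.execute("CREATE TABLE t (payload BLOB)")
            try:
                for _ in range(10_000):
                    db.execute("INSERT INTO t VALUES (zeroblob(4096))")
            except sqlite3.OperationalError as exc:
                return str(exc)
            return None  # The cap was never reached.
        finally:
            db.close()


if __name__ == "__main__":
    print(fill_until_full())
```

The machine running this has plenty of RAM and disk, yet the error still appears because the storage available to that particular database ran out. The same reading applies to the traceback in this issue: the message concerns space for the database or its temporary files, not memory.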

dandaii commented 1 year ago

Thanks Rob, I will drop by Z6 next Tuesday if I haven't fixed this by then.

Cheers, Dan
