bornakke closed this issue 9 years ago
The installation guide mentions the following: While capturing Twitter data from the streaming API does not need a lot of resources, the analysis of larger datasets (> 1 million tweets) can get slow. We strongly recommend using an SSD for database storage. Adding RAM and optimising the mysql configuration can also boost speed significantly. We recommend using the script available on http://mysqltuner.com to tweak your mysql config, and to set both the sort_buffer_size and myisam_sort_buffer_size as big as possible.
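For reference, a minimal my.cnf fragment with those two settings might look like the following. The sizes are illustrative assumptions only; scale them to your available RAM:

```ini
[mysqld]
# Illustrative sizes only -- set these as high as your RAM allows
sort_buffer_size        = 1G
myisam_sort_buffer_size = 1G
```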
Let me know if that works for you; otherwise we can delve into it more deeply. We currently process most analyses on 10+ million tweets in about 30 seconds to 1 minute.
Nice. Thank you, I will look into mysqltuner ASAP.
Could you share your hardware setup for such amazing times?
4 cores @ 3.10 GHz, 32 GB RAM, 4 TB SSD with Btrfs, latest version of Ubuntu, MySQL 5.6, PHP 5.5.
Hmm... okay, that is a little more than us, but not in the extremes. Would you maybe share your own my.cnf? I have followed mysqltuner, but without seeing any radical improvements.
When using the associational profiler we had for some time experienced errors like this: `Warning: mysql_fetch_assoc() expects parameter 1 to be resource, boolean given in dmi-tcat/analysis/mod.hashtag_variability.php on line 190`.
However, after increasing a number of limits in my.cnf (see below) it now runs smoothly and without too many hiccups :) I don't know whether it is a bad solution, though, since mysqltuner keeps warning me that system stability is in danger:
```ini
key_buffer_size         = 2G
tmp_table_size          = 2G
max_heap_table_size     = 2G
myisam_sort_buffer_size = 4G
sort_buffer_size        = 4G
query_cache_limit       = 64M
query_cache_size        = 1024M
```
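The stability warning is essentially mysqltuner adding up what MySQL could allocate in the worst case: global buffers are allocated once, but `sort_buffer_size` is a per-connection buffer, so a few concurrent analyses can multiply it quickly. A rough sketch of that arithmetic, using the values above (which buffers count as per-connection, and the connection counts, are simplifying assumptions):

```python
# Rough worst-case memory budget for the my.cnf values above.
# Global buffers are allocated once; sort buffers can be allocated
# per connection, so concurrent analyses multiply them.

GB = 1024 ** 3

global_buffers = {
    "key_buffer_size": 2 * GB,
    "query_cache_size": 1 * GB,
}
per_connection = {
    "sort_buffer_size": 4 * GB,          # allocated per sorting connection
    "myisam_sort_buffer_size": 4 * GB,   # used for index sorts/repairs
}

def worst_case(connections: int) -> int:
    """Pessimistic upper bound: every connection sorts at once."""
    return sum(global_buffers.values()) + connections * sum(per_connection.values())

for n in (1, 4):
    print(f"{n} concurrent sort(s): ~{worst_case(n) / GB:.0f} GB")
```

With a single connection the bound is already ~11 GB; four concurrent sorts would nominally exceed 32 GB of RAM, which is why mysqltuner complains even though a single-user TCAT box rarely hits that ceiling.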
We have various machines, with the following value ranges:
```ini
key_buffer_size         = 8G   # up to 13G
key_buffer              = 1G   # up to 8G
tmp_table_size          = 1G   # up to 3G
max_heap_table_size     = 1G   # up to 3G
myisam_sort_buffer_size = 1G
sort_buffer_size        = 128M
query_cache_limit       = 1G
query_cache_size        = 6G
```
Note that mysqltuner optimises for many simultaneous connections, while in most cases TCAT will not run on a production server with many connections. Setting some of the values far higher than mysqltuner would like does improve TCAT performance (as more can be done in memory), but it limits the number of people who can run big analyses at the same time. The large query_cache_limit and query_cache_size, however, ensure that people who run the same analysis (or even load the front page) within a short time span will wait even less.
Also, make sure that your mysql tmp dir is on an SSD.
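In my.cnf that would be something like the following (the path is a placeholder for wherever your SSD is mounted; the directory must exist and be writable by the mysql user):

```ini
[mysqld]
# Point MySQL's on-disk temporary tables at the SSD (placeholder path)
tmpdir = /mnt/ssd/mysql-tmp
```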
Thank you. It seems we have arrived at much the same configuration. I just hope it doesn't kill the other services running on the server.
Hi TCAT,
We at Copenhagen University are attempting for the first time to analyse a bigger dataset in TCAT (0.5 million tweets). This has made it clear that our current setup is far from strong enough to handle the load. However, looking into memory and CPU usage reveals that they are actually not where our limitation lies. We are running on an SSD, so I'm thinking our problems might be due to a lack of MySQL optimisation.
Am I correct that there used to be a suggested my.ini file somewhere on GitHub? If yes, is it still available somewhere? If no, do you have any suggestions we might try?
Best, Tobias