Notes:
(Jan 12, 2024 at 4:53pm)
When restoring from S3, the backup location has to be entered as
nodenorm-2023nov5-id-categories/id-categories.rdb
rather than as a full S3 URL.
Aha, my bucket wasn't publicly accessible. I added the following to the Bucket permissions in JSON:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::nodenorm-2023nov5-id-categories/id-categories.rdb"
        }
    ]
}
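(For reference, the same policy could also be applied programmatically. A minimal sketch assuming boto3 is installed with credentials that allow s3:PutBucketPolicy; note that S3 Block Public Access has to be disabled on the bucket before a public-read policy takes effect.)

import json
import boto3

# Public-read policy for the single RDB object, matching the JSON above.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::nodenorm-2023nov5-id-categories/id-categories.rdb",
    }],
}

s3 = boto3.client("s3")
s3.put_bucket_policy(
    Bucket="nodenorm-2023nov5-id-categories",
    Policy=json.dumps(policy),
)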
Confirmed that the file was now downloadable, and started a new Serverless ElastiCache at 5:54pm.
(Jan 12, 2024 at 5:57pm)
ubuntu@ip-172-31-26-53:~$ redis-cli --tls -h test2-0001-001.test2.cq5uuk.use1.cache.amazonaws.com -p 6379 DBSIZE
(integer) 132881946
ubuntu@ip-172-31-26-53:~$ redis-cli --tls -h test2-0002-001.test2.cq5uuk.use1.cache.amazonaws.com -p 6379 DBSIZE
(integer) 131760211
So... it looks like it's working.
This is probably happening because redis-cli's --pipe mode doesn't follow cluster MOVED redirections, i.e. this issue: https://github.com/redis/redis/issues/6098
Options:
We can figure out the list of master nodes by running something like:
ubuntu@ip-172-31-26-53:~$ redis-cli --tls -h clustercfg.test2.cq5uuk.use1.cache.amazonaws.com -p 6379 CLUSTER NODES | grep master
7028fec7d3b97480e91d51f47a1cba1664999087 test2-0001-001.test2.cq5uuk.use1.cache.amazonaws.com:6379@1122 myself,master - 0 1705011272000 2 connected 0-8191
5cf01b74b4728a3865b04c9dce46b879c5400439 test2-0002-001.test2.cq5uuk.use1.cache.amazonaws.com:6379@1122 master - 0 1705011272000 1 connected 8192-16383
ubuntu@ip-172-31-26-53:~$ redis-cli --tls -h clustercfg.test2.cq5uuk.use1.cache.amazonaws.com -p 6379 CLUSTER NODES | grep master | cut -d' ' -f2 | cut -d':' -f1
test2-0001-001.test2.cq5uuk.use1.cache.amazonaws.com
test2-0002-001.test2.cq5uuk.use1.cache.amazonaws.com
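(As an aside, the slot ranges in that output, 0-8191 and 8192-16383, determine which master owns each key. A quick way to check where a given key would land, assuming the redis-py package is installed; the key below is just an example:)

# Redis Cluster hash slot for a key: CRC16(key) mod 16384.
from redis.crc import key_slot

# Prints a slot in 0..16383; compare against the ranges in CLUSTER NODES.
print(key_slot(b"MONDO:0005737"))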
I'm trying this out on EC2 by running:
$ rdb -c protocol id_to_type_db.rdb | redis-cli -c -h test2-0001-001.test2.cq5uuk.use1.cache.amazonaws.com -p 6379 --pipe --tls
$ rdb -c protocol id_to_type_db.rdb | redis-cli -c -h test2-0002-001.test2.cq5uuk.use1.cache.amazonaws.com -p 6379 --pipe --tls
So far so good. It is EXTRAORDINARILY SLOW (approaching 24 hours!) but that might just be because we have a tiny instance.
Huge success! ElastiCache was able to load all 434,397,820 keys into a Serverless ElastiCache Redis 7.1 database. The database was created at January 12, 2024, 17:52:47 (UTC-05:00) and backup completed at January 12, 2024, 19:06:52 (UTC-05:00), so 1h14m, approximately as long as expected.
I don't know if it makes financial sense to switch us over to a Serverless database instead of using the custom clusters, but I'm guessing... yes? Regardless, this restore should work for custom clusters too.
This issue is now solved by the slow and inefficient method of uploading the data to all the nodes (i.e. all 400M+ entries are uploaded to node 1, then node 2, and so on). Our total load time is approximately 3 hours because we have to loop through all three databases.
We can probably improve this further by downloading the cluster's hash-slot map and uploading each key only to the node that owns it (i.e. come up with a Python-based solution to https://github.com/redis/redis/issues/6098; see the sketch below), but a better fit for our needs would be rolling Serverless updates: https://github.com/TranslatorSRI/NodeNormalization/issues/252 -- I'll open tickets for the Python approach if ITRB doesn't want to do rolling Serverless updates, but not before.
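(A minimal sketch of what that Python-based approach could look like, assuming redis-py >= 4.1, whose RedisCluster client downloads the slot map on connect and routes each command to the owning node, plus the redis-rdb-tools parser behind the rdb command used above. The callback API, the key types (plain strings, no expiries), and the host/filename are assumptions taken from the commands above.)

from rdbtools import RdbParser, RdbCallback
from redis.cluster import RedisCluster

# RedisCluster fetches the hash-slot map on connect and sends each
# command only to the node that owns the key, so nothing is uploaded twice.
rc = RedisCluster(
    host="clustercfg.test2.cq5uuk.use1.cache.amazonaws.com",
    port=6379,
    ssl=True,
)

class RouteStrings(RdbCallback):
    """Replay plain string keys from the RDB into the cluster."""

    def __init__(self):
        super().__init__(string_escape=None)

    def set(self, key, value, expiry, info):
        # Expiries are ignored here; our id->category data has none.
        rc.set(key, value)

RdbParser(RouteStrings()).parse("id_to_type_db.rdb")

(As written this issues one request per key; batching the SETs through rc.pipeline() should cut round trips substantially.)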
This will need to be fixed in the Translator-Devops repo at https://github.com/helxplatform/translator-devops/issues/813