Open elrayle opened 3 years ago
How are they created?
See also: Issue #408 Searches for deleted exhibits are still in the searches table
2021-02-02 - Manually deleted all searches older than 30 days using...
delete from searches where created_at < now() - interval 30 DAY;
BEFORE Delete | AFTER Delete |
---|---|
2.08M | 0.18M |
How are they created?
Count number of guest users:
select count(guest) from users where guest=1;
Alternate:
select count(id) from users where email like 'guest%';
NOTE: The count should be the same for both approaches.
2021-02-02 - Manually deleted all guests using...
delete from users where guest=1;
BEFORE Delete | AFTER Delete |
---|---|
0.12M | 128kB |
How are they created?
NEEDS FURTHER INVESTIGATION
Need to set up nightly crons in production to:
@gdelisle Questions:
The AWS prescription for this involves setting up a Lambda job to do the task. Here is a sort of best practices guide that they have written on the subject: https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/schedule-jobs-for-amazon-rds-and-aurora-postgresql-using-lambda-and-secrets-manager.html
Completed as part of https://culibrary.atlassian.net/browse/LP-222
Wait! Can we talk about this a little bit more before it's gone? There is a Lambda job currently running for each tier (three jobs in total) to clear out the database every Wednesday morning. I'll paste the code here, but I'd like to confirm that this new function has replaced the Lambda jobs, and if that's true I can delete the Lambda jobs.
import sys
import logging
import pymysql
import os
#rds settings
rds_host = os.environ.get('db_host')
name = os.environ.get('db_username')
password = os.environ.get('db_password')
db_name = os.environ.get('db_name')
logger = logging.getLogger()
logger.setLevel(logging.INFO)
def lambda_handler(event, context):
try:
conn = pymysql.connect(host=rds_host, user=name, passwd=password, db=db_name, connect_timeout=5)
except pymysql.MySQLError as e:
logger.error("ERROR: Unexpected error: Could not connect to MySQL instance.")
logger.error(e)
sys.exit()
logger.info("SUCCESS: Connection to RDS MySQL instance succeeded")
"""
This function fetches content from MySQL RDS instance
"""
item_count = 0
with conn.cursor() as cur:
cur.execute('delete from searches where created_at < now() - interval 30 DAY;')
cur.execute('delete from users where guest=1 and created_at < now() - interval 2 DAY;')
conn.commit()
return "Maintenance done"
logger.info("SUCCESS: Maintenance done")
@gdelisle Oh! I didn't know this was already done as lambdas. That makes sense why I wasn't seeing any guest users in the db then. Yes, feel free to get rid of the lambdas. I'm running the cleanup as rake tasks provided by blacklight, which are running rails ActiveRecord deletes behind the scenes. The rake tasks are going to better be able to handle cleaning up any associations and etc, vs raw sql queries.
I didn't add a cron job for cleaning up the guest users because I saw that there weren't any and just assumed something had changed about how blacklight runs since this ticket was created. So I'll create a separate Jira ticket for handling just that (https://culibrary.atlassian.net/browse/LP-1087). Since we have another spotlight sprint in 2 weeks, it's still okay to delete the lambda job now and that piece of it will get replaced soon.
Honestly I almost forgot they were there, until I was reading the comments from 2021 about possibly making Lambdas, and then it jogged my memory because I was the one that wrote them so :B
On Sep 3, 2024, at 12:40 PM, Christina Cortland @.***> wrote:
@gdelislehttps://github.com/gdelisle Oh! I didn't know this was already done as lambdas. That makes sense why I wasn't seeing any guest users in the db then. Yes, feel free to get rid of the lambdas. I'm running the cleanup as rake tasks provided by blacklight, which are running rails ActiveRecord deletes behind the scenes. The rake tasks are going to better be able to handle cleaning up any associations and etc, vs raw sql queries.
I didn't add a cron job for cleaning up the guest users because I saw that there weren't any and just assumed something had changed about how blacklight runs since this ticket was created. So I'll create a separate Jira ticket for handling just that. Since we have another spotlight sprint in 2 weeks, it's still okay to delete the lambda job now and that piece of it will get replaced soon.
— Reply to this email directly, view it on GitHubhttps://github.com/cul-it/exhibits-library-cornell-edu/issues/409#issuecomment-2326978299, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AIROZU5RD7NBWJL4LCH2TYDZUXRBBAVCNFSM6AAAAABNSD3KOWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMRWHE3TQMRZHE. You are receiving this because you were mentioned.Message ID: @.***>
I have edited the Lambda jobs to "set concurrency to zero" which is supposedly the way to prevent them from executing. Since they're scheduled to execute on Wednesday we should see the results of what would happen when these Lambdas are no longer there.
Description
Production database is > 2.5GB. Most of that is in the
searches
table which is > 2GB.To get the size of the the entire database:
See the size of each table:
_Replacing
_DATABASE_NAME_
with the actual database name._Number of rows for each table:
Notable sizes...