Closed by felder 2 years ago
@yuvipanda Can you think of any reason why we'd need over a year's worth of logs on the hub pod? I'm thinking a simple nightly job on the hub pod, perhaps logrotate, could keep the hub logs at an acceptable size. Perhaps monthly rotations, gzipped, going back 12 months?
Ryan on Slack also wonders whether we need a file at all, or could instead rely on Google logs. That does make a lot of sense.
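For illustration, the rotation idea floated above (monthly rotations, gzipped, twelve kept) could look roughly like the sketch below. This is not what was deployed and it is not logrotate itself. The function name `rotate_log`, the numbered `.N.gz` naming, and the truncate-in-place behaviour are all assumptions made for the example.

```python
import gzip
import shutil
from pathlib import Path


def rotate_log(log_path: Path, keep: int = 12) -> None:
    """Rotate log_path into numbered gzip archives, keeping at most `keep`.

    Hypothetical helper: archive 1 is the newest, archive `keep` the oldest.
    """
    # Nothing to rotate if the live log is missing or empty.
    if not log_path.exists() or log_path.stat().st_size == 0:
        return

    def archive(n: int) -> Path:
        return log_path.with_name(f"{log_path.name}.{n}.gz")

    # Drop the oldest archive, then shift the remaining ones up by one.
    if archive(keep).exists():
        archive(keep).unlink()
    for n in range(keep - 1, 0, -1):
        if archive(n).exists():
            archive(n).rename(archive(n + 1))

    # Compress the live log into archive 1, then truncate it in place so a
    # process holding the file open keeps writing to the same file.
    with log_path.open("rb") as raw, gzip.open(archive(1), "wb") as packed:
        shutil.copyfileobj(raw, packed)
    log_path.write_bytes(b"")
```

Run once a month (for example from a scheduled job), this would cap the retained history at twelve compressed files. Anything written between the copy and the truncate would be lost, which is the same trade-off logrotate's `copytruncate` option makes.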
I kept the logs primarily so we can answer questions about who is using the hub and for how much over the years. I agree we don't need to keep it there.
How about we:
What do you think of this plan, @felder?
I just created this sink: https://console.cloud.google.com/logs/router?project=ucb-datahub-2018. Need to test if it works though. I also want us to keep historical logs so we can do longer term analysis.
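For readers unfamiliar with log sinks, here is a rough sketch of creating one with the `google-cloud-logging` Python client. The thread does not say how the sink above was configured, so the filter, the sink name, and the choice of a Cloud Storage bucket as the destination are all assumptions for illustration. The library is imported lazily so the two helper functions can be used without it installed.

```python
def hub_log_filter(namespace: str, container: str = "hub") -> str:
    """Build a Cloud Logging filter for one hub's container logs.

    Assumption: the hub runs on GKE in a container named "hub".
    """
    return (
        'resource.type="k8s_container" '
        f'AND resource.labels.namespace_name="{namespace}" '
        f'AND resource.labels.container_name="{container}"'
    )


def bucket_destination(bucket: str) -> str:
    """Sink destination string for a Cloud Storage bucket."""
    return f"storage.googleapis.com/{bucket}"


def create_hub_sink(project: str, sink_name: str, namespace: str, bucket: str):
    """Create the sink if it does not exist yet (hypothetical wrapper)."""
    from google.cloud import logging  # requires google-cloud-logging

    client = logging.Client(project=project)
    sink = client.sink(
        sink_name,
        filter_=hub_log_filter(namespace),
        destination=bucket_destination(bucket),
    )
    if not sink.exists():
        sink.create()
    return sink
```

After a sink like this is created, its writer identity still needs permission to write to the destination bucket before any logs arrive, which is one reason to test that it works.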
@yuvipanda sounds reasonable to me. Really I'm pretty open to any solutions that allow us to continue to be able to troubleshoot current issues and retain the historical data we require while solving the issue of filled filesystems and manual cleaning.
@felder do you think you can help me by copying the logs that exist in all the current home directories over to this google storage bucket? https://console.cloud.google.com/storage/browser/ucb-datahub-hub-logs;tab=objects?forceOnBucketsSortingFiltering=false&project=ucb-datahub-2018&prefix=&forceOnObjectsSortingFiltering=false. I think your internet speeds are currently probably much better than mine! I'd appreciate that.
@yuvipanda sure!
I'll be updating this ticket periodically as I move these over.
biology-prod
cs194-prod
data100-prod
data102-prod
data8-prod
datahub-prod
dlab-prod
eecs-prod
highschool-prod
ischool-prod
julia-prod
prob140-prod
publichealth-prod
r-prod
stat159-prod
workshop-prod
Also...
data8x-prod
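A per-hub copy like the one tracked in the list above could be scripted along these lines. This is only a sketch: the thread does not record how the files were actually moved, and the local directory layout and the one-prefix-per-hub layout in the bucket are assumptions. Only the bucket name comes from the URL in the thread.

```python
import shlex

# Bucket name taken from the storage URL in the thread.
BUCKET = "ucb-datahub-hub-logs"


def copy_command(hub: str, local_dir: str, bucket: str = BUCKET) -> list[str]:
    """Build a gsutil command copying one hub's logs under its own prefix.

    Hypothetical layout: gs://<bucket>/<hub>/...
    """
    return ["gsutil", "-m", "cp", "-r", local_dir, f"gs://{bucket}/{hub}/"]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for hub in ["biology-prod", "data8-prod"]:
        print(shlex.join(copy_command(hub, f"/tmp/hub-logs/{hub}")))
```

Printing the commands first makes it easy to check the source and destination paths for each hub before anything is uploaded.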
@yuvipanda will #2926 also stop the local logging on data8x?
I've created a sink for data8x as well, @felder - https://console.cloud.google.com/logs/router?project=data8x-scratch. #2926 should stop local logging on data8x as well
I think we can close this now once the current hub logs are archived into google cloud storage.
@yuvipanda ok that does it, the logs are in the bucket.
@felder awesome, thank you so much!
@felder @yuvipanda Based on my initial exploration, I found that most hubs' log data starts from around 2019. Do we have logs before that? If it is available, it will aid my exploration as part of #2949.
@balajialg i had some logs in google drive, shared them with you. @ryanlovett do you have any lying around?
@yuvipanda I had sent mine from drive to @felder and then deleted them from drive.
All of the logs that I archived, including the ones @ryanlovett is referring to, are in the storage bucket.
The ones I got from Ryan were from when the r hub logging partition filled earlier this semester.
I just checked ds-instr's Google drive space and there were no logs there.
Thanks a lot, all, for looking into this! Appreciate it. I will do more analysis next week with the data that @yuvipanda shared. Thanks!
Currently the hub logs grow until they run out of space. This makes the hub pod quite unhappy.