Open 7yl4r opened 1 year ago
This seems to have worked itself out in my absence. L2 processing seems up to date.
From: Tylar @.> Sent: Monday, February 27, 2023 10:12 AM To: USF-IMARS/l2-processing @.> Cc: dotis @.>; Mention @.> Subject: [USF-IMARS/l2-processing] OOM errors on seashell (Issue #7)
from @dotis: All of my L2 processing has gone down on seashell.
I'm getting a ton of out of memory errors.
logfile is in the email
I spoke too soon. The file processing is still stopping due to an out of memory error.
Some files process ok and some do not. It seems the nightly cron jobs work to some extent, but many also fail.
This error usually comes from the JVM. The server probably isn't actually out of RAM. If this assumption is correct then the solution is to add the following arguments to the call that starts up the JVM:
-Xms to set the initial heap size
-Xmx to set the maximum heap size
I think you can set these both to values like 32GB without issue on seashell.
The tricky part is to find out where the JVM is being started; it is probably a bash script somewhere.
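One way to track that down is to search the install directory for anything that mentions the heap flags. This is only a minimal sketch: `find_heap_flags` is a hypothetical helper name, and it assumes the launcher scripts and their option files are plain text.

```shell
# Hypothetical helper: list every text file under a directory that
# sets a JVM heap flag. Matches are printed as file:line:text.
find_heap_flags() {
    # -r recurse, -n show line numbers, -I skip binary files,
    # -E extended regex; "--" keeps grep from reading the
    # pattern (which starts with a dash) as an option.
    grep -rnIE -- '-Xm[sx][0-9]+[kKmMgG]?' "$1" 2>/dev/null
}

# Example call; the directory is a placeholder for wherever the
# launcher scripts live:
# find_heap_flags /path/to/install/bin
```

Anything it prints is a candidate for where the heap size is configured; no output means the flags are set somewhere else or not at all.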
Hmm. It must be somewhere in here:
/opt/snap_6_0/bin/gpt
Currently set to 66G.
From gpt.vmoptions file:
-Xmx66G
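To compare a setting like this against the `k` figures that memory tools print, it can help to convert it to KiB. A rough sketch: `xmx_to_kib` is a hypothetical helper, and it relies on the JVM convention that the k/m/g suffixes are 1024-based and a bare number means bytes.

```shell
# Hypothetical helper: convert a -Xmx value to KiB.
xmx_to_kib() {
    value="${1#-Xmx}"            # strip the flag name, e.g. "66G"
    num="${value%[kKmMgG]}"      # numeric part, e.g. "66"
    suffix="${value#"$num"}"     # size suffix, e.g. "G" (may be empty)
    # Reject anything whose numeric part is not all digits.
    case "$num" in
        ''|*[!0-9]*) return 1 ;;
    esac
    case "$suffix" in
        [gG]) echo $(( num * 1024 * 1024 )) ;;
        [mM]) echo $(( num * 1024 )) ;;
        [kK]) echo "$num" ;;
        "")   echo $(( num / 1024 )) ;;   # bare number is bytes
    esac
}

xmx_to_kib -Xmx66G   # prints 69206016
```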
yikes. that should be plenty. Since the grafana monitoring is no longer running I can't see the resource usage history of seashell, but it may be useful to look at the output of top next time you are running something big:
[root@seashell ~]# top | head
top - 20:39:05 up 146 days, 23:27, 1 user, load average: 2.56, 2.52, 2.71
Tasks: 886 total, 1 running, 883 sleeping, 0 stopped, 2 zombie
Cpu(s): 4.6%us, 0.9%sy, 0.0%ni, 94.2%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 99153956k total, 97191036k used, 1962920k free, 70808k buffers
Swap: 101253116k total, 7745976k used, 93507140k free, 93807184k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
37298 dotis 20 0 75.0g 1.5g 27m S 401.9 1.6 0:54.76 java
37453 root 20 0 28132 1964 980 R 1.9 0.0 0:00.03 top
1 root 20 0 33672 1256 1044 S 0.0 0.0 1:54.50 init
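A note on reading that header: in this snapshot "used" equals total minus free (99153956k - 1962920k = 97191036k), so it includes buffers and page cache. A rough estimate of what processes actually hold is used minus buffers minus cached. The sketch below just does that arithmetic; `mem_used_by_procs_kib` is a hypothetical helper name, and the result is only approximate because the cached figure can include memory that is not freely reclaimable.

```shell
# Hypothetical helper: estimate memory held by processes, in KiB,
# as "used" minus buffers minus page cache (all in KiB, as top prints them).
mem_used_by_procs_kib() {
    used="$1"; buffers="$2"; cached="$3"
    echo $(( used - buffers - cached ))
}

# Values copied from the Mem: and Swap: lines above
mem_used_by_procs_kib 97191036 70808 93807184   # prints 3313044
```

That is roughly 3.2 GiB out of about 94.6 GiB total, with the java process at 1.5g resident, so at the moment of this snapshot the machine itself does not look short on RAM.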
from @dotis: All of my L2 processing has gone down on seashell.
I'm getting a ton of out of memory errors.