USF-IMARS / l2-processing

Python notebooks for L2 processing, creating means, and mapping

OOM errors on seashell #7

Open 7yl4r opened 1 year ago

7yl4r commented 1 year ago

From @dotis: All of my L2 processing has gone down on seashell.

I'm getting a ton of out-of-memory errors.

# There is insufficient memory for the Java Runtime Environment to continue.
# Cannot create GC thread. Out of system resources.
# An error report file with more information is saved as:
# /home1/dotis/DB_files/DB_v2/hs_err_pid7317.log

logfile is in [the email](https://mail.google.com/mail/u/0/#inbox/FMfcgzGrcjLZNBBrJsvDzVLwrWVFTTMS)

dotis commented 1 year ago

This seems to have worked itself out in my absence. L2 processing seems up to date.



dotis commented 1 year ago

I spoke too soon. File processing is still stopping due to an out-of-memory error.



dotis commented 1 year ago

Some files process OK and some do not. The nightly cron jobs seem to work to some extent, but many also fail.

7yl4r commented 1 year ago

This error usually comes from the JVM. The server probably isn't actually out of RAM. If this assumption is correct then the solution is to add the following arguments to the call that starts up the JVM:

I think you can set these both to values like 32GB without issue on seashell.

The tricky part is to find out where the JVM is being started; it is probably a bash script somewhere.
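
The argument names are not shown in this thread. Assuming they are the standard HotSpot heap options `-Xms` (initial heap size) and `-Xmx` (maximum heap size), a minimal Python sketch for building them, and for hunting down where a heap limit is already configured, might look like this. The helper names `heap_flags` and `find_heap_settings` are made up for illustration.

```python
from pathlib import Path
from typing import List, Tuple


def heap_flags(initial_gb: int, maximum_gb: int) -> List[str]:
    """Build JVM heap options, e.g. heap_flags(32, 32) for a fixed 32 GB heap."""
    if initial_gb > maximum_gb:
        raise ValueError("initial heap must not exceed maximum heap")
    return [f"-Xms{initial_gb}g", f"-Xmx{maximum_gb}g"]


def find_heap_settings(root: str) -> List[Tuple[str, int, str]]:
    """List (file, line number, line) for each line under root mentioning -Xms or -Xmx."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # errors="ignore" lets the scan pass over binary files without crashing.
            text = path.read_text(errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        for number, line in enumerate(text.splitlines(), start=1):
            if "-Xms" in line or "-Xmx" in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

Pointing `find_heap_settings` at the directory that holds the launcher should list any script or options file that sets a heap size. Commented-out lines are reported too, so each hit still needs a look.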

dotis commented 1 year ago

Hmm. It must be somewhere in here:

/opt/snap_6_0/bin/gpt



dotis commented 1 year ago

Currently set to 66G.

From gpt.vmoptions file:

# Enter one VM parameter per line
# For example, to adjust the maximum memory usage to 512 MB, uncomment the following line:
# -Xmx512m
# To include another file, uncomment the following line:
# -include-options [path to other .vmoption file]
-Xmx66G
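
Only the uncommented `-Xmx` line takes effect, assuming the template lines in the file are commented out with `#`. A minimal sketch for reading the active limit out of such a file; `max_heap_bytes` is a hypothetical helper, and it only handles the `k`, `m`, and `g` suffixes:

```python
import re
from typing import Optional

# Multipliers for the size suffixes handled here (binary units assumed).
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def max_heap_bytes(vmoptions_text: str) -> Optional[int]:
    """Return the active -Xmx value in bytes, or None if no uncommented -Xmx line exists."""
    result = None
    for raw_line in vmoptions_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank or commented-out line
        match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", line)
        if match:
            # Keep the last match; this sketch assumes a later setting overrides an earlier one.
            result = int(match.group(1)) * _UNITS[match.group(2).lower()]
    return result


sample = """\
# -Xmx512m
-Xmx66G
"""
print(max_heap_bytes(sample) / 1024 ** 3)  # 66.0
```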



7yl4r commented 1 year ago

Yikes, that should be plenty. Since the Grafana monitoring is no longer running, I can't see the resource usage history of seashell, but it may be useful to look at the output of `top` next time you are running something big:

[root@seashell ~]# top | head

top - 20:39:05 up 146 days, 23:27,  1 user,  load average: 2.56, 2.52, 2.71
Tasks: 886 total,   1 running, 883 sleeping,   0 stopped,   2 zombie
Cpu(s):  4.6%us,  0.9%sy,  0.0%ni, 94.2%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  99153956k total, 97191036k used,  1962920k free,    70808k buffers
Swap: 101253116k total,  7745976k used, 93507140k free, 93807184k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
37298 dotis     20   0 75.0g 1.5g  27m S 401.9  1.6   0:54.76 java
37453 root      20   0 28132 1964  980 R  1.9  0.0   0:00.03 top
    1 root      20   0 33672 1256 1044 S  0.0  0.0   1:54.50 init
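
The Mem and Swap lines in that output can be read a little more closely. In this older `top` layout, "used" memory appears to include buffers and the page cache (the "cached" figure printed at the end of the Swap line), so subtracting both gives a rough figure for what processes actually hold. A small sketch using the numbers pasted above; `parse_kb_fields` is a hypothetical helper, and the reading of the fields is an assumption about this version of `top`:

```python
import re
from typing import Dict


def parse_kb_fields(line: str) -> Dict[str, int]:
    """Turn '99153956k total, 97191036k used, ...' into {'total': 99153956, 'used': 97191036, ...}."""
    return {name: int(value) for value, name in re.findall(r"(\d+)k (\w+)", line)}


mem = parse_kb_fields("Mem:  99153956k total, 97191036k used,  1962920k free,    70808k buffers")
swap = parse_kb_fields("Swap: 101253116k total,  7745976k used, 93507140k free, 93807184k cached")

# Memory held by processes, excluding buffers and page cache (all values in KiB).
app_used_kb = mem["used"] - mem["buffers"] - swap["cached"]
print(app_used_kb)  # 3313044
```

On these numbers that comes to about 3.2 GiB out of roughly 94.6 GiB, with most of the "used" memory being cache. That would be consistent with the guess that the server is not actually out of RAM, though part of the cache (shared memory, for example) may not be reclaimable.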