caracal-pipeline / caracal

Containerized Automated Radio Astronomy Calibration (CARACal) pipeline
GNU General Public License v2.0

ddfacet opening too many files #1541

Closed: healytwin1 closed this issue 7 months ago

healytwin1 commented 9 months ago

I am trying to test the ddcal worker so that we can implement the masking workaround discussed at the last developers' telecon, but I am getting:

OSError: [Errno 23] Too many open files in system: '/dev/shm/ddf.94/DATA:0:0'

I am using the ddcal_mask branch with stimela 1.7.6.
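
For the record, [Errno 23] is ENFILE, meaning the kernel's system-wide open file table is full; it is distinct from [Errno 24] EMFILE, which is the per-process limit that ulimit -n controls. A minimal, Linux-specific Python sketch to check both:

import errno
import resource

# Errno 23 (ENFILE) means the system-wide file table is full;
# errno 24 (EMFILE) would mean the per-process ulimit -n was hit.
print(errno.ENFILE, errno.errorcode[errno.ENFILE])   # 23 ENFILE
print(errno.EMFILE, errno.errorcode[errno.EMFILE])   # 24 EMFILE

# The per-process limit (what ulimit -n reports), soft and hard.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("per-process open files: soft=%d hard=%d" % (soft, hard))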

This is running on the meergas cluster, which has 128 CPUs (I have restricted the run to 8) and 1 TB of RAM. ulimit -a gives:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 4127296
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1048576
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

These limits are more generous than what is available on ilifu, where I am able to run ddfacet through the ddcal worker with no issues, using caracal from the same branch but with stimela 1.7.7.

Any ideas?

healytwin1 commented 9 months ago

@o-smirnov @KshitijT @SpheMakh

o-smirnov commented 9 months ago

Just a hunch, could you up the max locked memory (-l)?

healytwin1 commented 9 months ago

The max locked memory is the same as on ilifu, where I have no problems running ddcal. The only other difference is that on ilifu I am using Python 3.8.3, while on the meergas cluster it is 3.8.13.

KshitijT commented 8 months ago

Might this be due to the limit in /proc/sys/fs/file-max? Dane mentioned this is 8192 on the meergas cluster. @o-smirnov, what do you think?
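
For anyone debugging this later: both the system-wide limit and the current system-wide usage are readable from /proc, alongside the per-process limit. A minimal, Linux-specific sketch:

import resource

# /proc/sys/fs/file-nr holds three numbers: allocated handles,
# allocated-but-unused handles, and the fs.file-max maximum.
with open("/proc/sys/fs/file-nr") as f:
    allocated, unused, maximum = (int(x) for x in f.read().split())

soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("system-wide: %d of %d handles in use (fs.file-max)" % (allocated, maximum))
print("per-process (ulimit -n): %d" % soft)

# With fs.file-max at 8192, the kernel-wide table fills up long
# before the generous per-process limit of 1048576 is ever reached.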

o-smirnov commented 8 months ago

Yes, that's exactly it. The SSD method loves to open a huge number of shared memory files. Can the admin increase the limit?
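
To illustrate the mechanism (this is not DDFacet's actual shared-memory code, just the same underlying tmpfs behaviour, shown with Python's standard multiprocessing.shared_memory):

import os
from multiprocessing import shared_memory

# Each POSIX shared-memory segment is backed by a file in /dev/shm
# and holds an open file handle for as long as the segment exists,
# so every segment counts against fs.file-max for the whole machine.
segments = [shared_memory.SharedMemory(create=True, size=1024)
            for _ in range(100)]

# The segments appear as ordinary files on the tmpfs mount, much
# like the /dev/shm/ddf.94/DATA files in the traceback above.
print(len(os.listdir("/dev/shm")), "entries in /dev/shm")

# Clean up: release the mappings and remove the backing files.
for seg in segments:
    seg.close()
    seg.unlink()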

KshitijT commented 8 months ago

> Yes, that's exactly it. The SSD method loves to open a huge number of shared memory files. Can the admin increase the limit?

pinging @healytwin1 and @dane-kleiner .

healytwin1 commented 7 months ago

This is now sorted after an update to the /proc/sys/fs/file-max limit on the cluster. It was a cluster issue, not a ddfacet issue.