UM-Bridge (the UQ and Model Bridge) provides a unified interface for numerical models that is accessible from virtually any programming language or framework.
Summary
Fixed the issue where a job would sometimes fail to start the model server due to the port being occupied.
Details
Previously, the command lsof was used to determine whether a port is free. However, without root permissions, lsof only shows open connections belonging to the user who runs it, so ports occupied by other users on the HPC cluster were reported as free.
This is fixed by instead using a small C++ program that attempts to bind a socket to an address on the given port. If the bind fails, the port is in use, no matter which user occupies it.
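A minimal sketch of such a bind-based check follows. It is not the program from this PR: the function name `is_port_free`, the choice of TCP, and binding to all interfaces (`INADDR_ANY`) are assumptions for illustration. The key point is that `bind()` is checked by the kernel against every process on the node, so it does not depend on which user owns the conflicting socket.

```cpp
// Hypothetical sketch of a bind-based port check; not the PR's actual source.
#include <arpa/inet.h>   // htonl, htons
#include <netinet/in.h>  // sockaddr_in, INADDR_ANY
#include <sys/socket.h>  // socket, bind
#include <unistd.h>      // close

#include <cassert>
#include <cstdint>

// Returns true if a TCP socket can be bound to `port` on all local interfaces.
// Binding to INADDR_ANY is an assumption; the PR only says "an address".
bool is_port_free(int port) {
    if (port <= 0 || port > 65535) {
        return false;  // port 0 would ask the kernel for an arbitrary free port
    }
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return false;  // could not create a socket; treat the port as unusable
    }
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(static_cast<std::uint16_t>(port));
    // bind() fails (EADDRINUSE) if any process, owned by any user, holds the port.
    const bool ok =
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
    // Release the port right away so the model server can bind it. The gap
    // between this close() and the server's own bind() is where a race can occur.
    close(fd);
    return ok;
}
```

A job script could call this through a tiny command-line wrapper (also hypothetical) that exits with status 0 when the port is free, trying the next candidate port otherwise.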
[!WARNING]
There is still a race condition: another process can occupy the port in the time between the job script checking it and the model server actually binding to it. I have not encountered this during my tests yet, but fixing it would require major changes to the UM-Bridge interface used for serving models.
Related Issues
closes #83, closes #48