Running graphein with pymol in parallel

OliviaViessmann commented 1 year ago

Describe the bug
I would like to run graphein's create_mesh() function in parallel on multiple workers. I assume that I need to spin up a separate pymol session (MolViewer()) for each worker, each with a dedicated PORT. However, I am not sure how to set this from the "outside", so this might actually be a feature request.

To Reproduce
Steps to reproduce the behavior: try to run something like

Parallel(n_jobs = 8).it(create_mesh)(pdb) for pdb in pdbs

This gets stuck if run naively.

Expected behavior
I want to be able to specify a port for each worker so that each one can run its own pymol session.

OS: Ubuntu 20.04.4 LTS
Python Version: 3.8.16
Graphein Version & how it was installed: 1.5.2, via git pull + pip install

a-r-j commented 1 year ago

Hi @OliviaViessmann I'm trying to take a look at this but I'm struggling to get Pytorch3d working on my dev machine.

Did you run into any issues?

https://github.com/facebookresearch/pytorch3d/issues/1406

OliviaViessmann commented 1 year ago

Nope, I didn't. It runs fine for me. No issues with Pytorch3d on my end.

a-r-j commented 1 year ago

If you check out the PR, the port should now be configurable via ProteinMeshConfig. I suppose you could zip the configs (each with its own port) with the PDBs and pass both as args to create_mesh.
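
As a rough illustration of that suggestion, the sketch below pairs each PDB file with its own port before the pairs are handed to the workers. The helper name `pair_with_ports`, the file names, and the base port 9200 are hypothetical placeholders, not part of graphein. `ProteinMeshConfig(pymol_port=...)` and `create_mesh(pdb_file=..., config=...)` appear only in comments and follow the usage shown elsewhere in this thread.

```python
from typing import List, Sequence, Tuple

MAX_PORT = 65535


def pair_with_ports(pdb_files: Sequence[str], base_port: int = 9200) -> List[Tuple[str, int]]:
    """Give every PDB file its own port: base_port, base_port + 1, ..."""
    if base_port + len(pdb_files) - 1 > MAX_PORT:
        raise ValueError("not enough ports above base_port for all files")
    return [(pdb, base_port + i) for i, pdb in enumerate(pdb_files)]


# Hypothetical file names, used only to show the shape of the result.
jobs = pair_with_ports(["1abc.pdb", "2xyz.pdb", "3def.pdb"])
print(jobs)

# Inside each worker (assumed usage, mirroring this thread):
#   config = ProteinMeshConfig(pymol_port=port)
#   verts, faces, aux = create_mesh(pdb_file=pdb, config=config)
```

Assigning one port per file rather than one per worker avoids two concurrently running tasks sharing a port, at the cost of using more of the port range.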

OliviaViessmann commented 1 year ago

Hi a-r-j, thanks a ton for looking into this and making the adaptations. I am running on the new PR and configured a port, but I think PORT=9123 is still hard-coded somewhere. Did it work for you? Am I missing anything? Here is the code snippet I use:

pymol_commands = {"pymol_commands": ["set surface_quality, 2", "show surface"]}
pymol_config = ProteinMeshConfig(**pymol_commands, pymol_port=9999)
verts_x, faces_x, aux = create_mesh(pdb_file=pdb_file_x, config=pymol_config)

I put a print statement in get_obj_file() to double-check that the port is set, but somewhere it spins up a pymol session with the default setting, because I get:


xml-rpc server running on host localhost, port 9123
A PyMOL RPC server is already running.
xml-rpc server running on host localhost, port 9123

a-r-j commented 1 year ago

Yep, I missed a spot!

OliviaViessmann commented 1 year ago

Thanks!!

a-r-j commented 1 year ago

Has this resolved the issue @OliviaViessmann? If so, I will merge the PR shortly.

Also, if you could share a short snippet I can turn into a test that would be super helpful :)

OliviaViessmann commented 1 year ago

It is working 50/50. It now runs in parallel, but not on the ports specified; instead, the ports increment from 9123 up. Here is a minimal snippet with port printouts:

import random
import socket

def is_port_in_use(port: int) -> bool:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex(("localhost", port)) == 0

def func(pdb_file: str):
    pymol_commands = {
        "pymol_commands": [
            "show surface", ]
    }
    port = random.randint(1025, 65535)
    while not is_port_in_use(port=port):
        port = random.randint(1025, 65535)
    print(port)
    pymol_config = ProteinMeshConfig(**pymol_commands, port=port)
    verts, faces, aux = create_mesh(pdb_file=pdb_file, config=pymol_config)
    return verts

def main():
    parallel_iter = Parallel(n_jobs=8).it(
        delayed(func)(pdb_file) for pdb_file in pdb_files
    )

Here is an example printout of ports and pymol outputs:

6379
xml-rpc server running on host localhost, port 9124
xml-rpc server running on host localhost, port 9125
xml-rpc server running on host localhost, port 9126
xml-rpc server running on host localhost, port 9127
xml-rpc server could not be started
xml-rpc server could not be started
xml-rpc server could not be started
xml-rpc server could not be started
9004
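
Note that the `while not is_port_in_use(...)` loop in the snippet above stops only once it draws a port that is already in use, which may be why values such as 6379 show up in the printout. If the goal is a free port, one common approach is to bind to port 0 and let the operating system choose. The following is a minimal sketch using only the standard library; the function name `find_free_port` is a placeholder. The port can in principle be taken by another process between this call and its later use, so treat the result as best effort.

```python
import socket


def find_free_port(host: str = "localhost") -> int:
    """Ask the OS for a currently unused TCP port on the given host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Port 0 tells the OS to pick any free port.
        s.bind((host, 0))
        # getsockname() returns (address, port) for AF_INET sockets.
        return s.getsockname()[1]


port = find_free_port()
print(port)
```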

a-r-j commented 1 year ago

Thanks!! Hmm, I'll try to check it out this week. A quick heads up though: the config param added in #262 is pymol_port rather than port

OliviaViessmann commented 1 year ago

Sorry, yes, mistake on my end. I have the correct version running with pymol_port = port -- just did a crappy job at copy/pasting with manual edit... I am also printing the port inside the graphein create_mesh() function with print("pymol port: ", config.pymol_port) and it is correctly set in there, but it still ramps up servers on the 912x ports

pymol port:  34873
xml-rpc server could not be started
pymol port:  9007
xml-rpc server running on host localhost, port 9125
pymol port:  9124
xml-rpc server running on host localhost, port 9126
xml-rpc server running on host localhost, port 9125

Thanks for looking into it!

a-r-j commented 1 year ago

I did some digging and this looks like a pymol limitation, rather than a graphein limitation:

https://github.com/schrodinger/pymol-open-source/blob/d0a3380636e3d4079a0320b372a330dcf797d660/modules/pymol/rpc.py#L23

We need to be able to set the port on the pymol listener and, sadly, we don't have easy access to it. Also, the max retries limits the number of servers you can run.
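
To make the observed behaviour concrete, here is a toy model of a listener that starts at a fixed base port and tries a limited number of consecutive ports. It is not PyMOL's code: the function name and the retry count of 5 are assumptions chosen for illustration, and only the base port 9123 comes from this thread.

```python
from typing import List, Optional


def simulate_listeners(
    n_workers: int, base_port: int = 9123, n_to_try: int = 5
) -> List[Optional[int]]:
    """Return the port each worker's server ends up on, or None if every try fails."""
    taken = set()
    assigned: List[Optional[int]] = []
    for _ in range(n_workers):
        # Each server ignores any requested port and scans upward from base_port.
        port = next(
            (p for p in range(base_port, base_port + n_to_try) if p not in taken),
            None,
        )
        if port is not None:
            taken.add(port)
        assigned.append(port)
    return assigned


print(simulate_listeners(8))
# → [9123, 9124, 9125, 9126, 9127, None, None, None]
```

In this model, with eight workers the first five find a port and the remaining three cannot start, whatever port the config asked for. That resembles the mix of "running on host localhost, port 912x" and "could not be started" messages reported above.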

I suppose one way to go is to patch your local pymol install. You could for example set the port via an env var that pymol would read instead of the hardcoded 9123 and make the following modification to the Graphein viewer class:

class MolViewer(object):
    def __init__(self, host=HOST, port=PORT):
        self.host = host
        self.port = int(port)
        self._process = None

    def __del__(self):
        self.stop()

    def __getattr__(self, key):
        if not self._process_is_running():
            self.start(["-cKQ"])

        return getattr(self._server, key)

    def _process_is_running(self):
        return self._process is not None and self._process.poll() is None

    def start(self, args=("-Q",), exe="pymol"):
        """Start the PyMOL RPC server and connect to it
        Start simple GUI (-xi), suppress all output (-Q):
            >>> viewer.start(["-xiQ"])
        Start headless (-cK), with some output (-q):
            >>> viewer.start(["-cKq"])
        """
        if self._process_is_running():
            print("A PyMOL RPC server is already running.")
            return

        assert isinstance(args, (list, tuple))

        ########################## CHANGE HERE

        # Pass the desired port to the (patched) PyMOL process via an env var
        env = os.environ.copy()
        env["PYMOL_XMLRPC_PORT"] = str(self.port)
        self._process = subprocess.Popen([exe, "-R"] + list(args), env=env)

        ########################## END CHANGE

        self._server = Server(uri="http://%s:%d/RPC2" % (self.host, self.port))

        # wait for the server
        while True:
            try:
                self._server.bg_color("white")
                break
            except IOError:
                time.sleep(0.1)

    def stop(self):
        if self._process_is_running():
            self._process.terminate()

    def display(self, width=0, height=0, ray=False, timeout=120):
        """Display PyMol session
        :param width: width in pixels (0 uses current viewport)
        :param height: height in pixels (0 uses current viewport)
        :param ray: use ray tracing (if running PyMOL headless, this parameter
        has no effect and ray tracing is always used)
        :param timeout: timeout in seconds
        Returns
        -------
        fig : IPython.display.Image
        """
        from IPython.display import Image, display
        from ipywidgets import IntProgress

        progress_max = int((timeout * 20) ** 0.5)
        progress = None
        filename = tempfile.mktemp(".png")

        try:
            self._server.png(filename, width, height, -1, int(ray))

            for i in range(1, progress_max):
                if os.path.exists(filename):
                    break

                if progress is None:
                    progress = IntProgress(min=0, max=progress_max)
                    display(progress)

                progress.value += 1
                time.sleep(i / 10.0)

            if not os.path.exists(filename):
                raise RuntimeError("timeout exceeded")

            return Image(filename)
        finally:
            if progress is not None:
                progress.close()

            try:
                os.unlink(filename)
            except OSError:
                # The file may never have been written; nothing to clean up
                pass
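
For the PyMOL side of that patch, the change could be as small as resolving the port from the environment at the point where the constant is currently defined. This is a minimal sketch, assuming the variable name `PYMOL_XMLRPC_PORT` used in the viewer code above. `resolve_rpc_port` is a hypothetical helper and is not part of PyMOL, and where exactly it would be wired into `rpc.py` is left open.

```python
import os
from typing import Mapping

DEFAULT_PORT = 9123  # the hardcoded default mentioned in this thread


def resolve_rpc_port(
    environ: Mapping[str, str] = os.environ, default: int = DEFAULT_PORT
) -> int:
    """Read the RPC port from PYMOL_XMLRPC_PORT, falling back to the default."""
    raw = environ.get("PYMOL_XMLRPC_PORT")
    if raw is None:
        return default
    port = int(raw)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


print(resolve_rpc_port({}))                             # falls back to the default
print(resolve_rpc_port({"PYMOL_XMLRPC_PORT": "9999"}))  # uses the env var
```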

Alternatively, I also came across this which seems to be a similar RPC component that reads from an env var.

OliviaViessmann commented 1 year ago

Ahhhh, the lines you sent totally explain why the ports end up between 9123 and 9128. OK, I might try out the local pymol patch. Or give up and manually throw this on a bunch of batch machines. Not sure which will end up being faster :) Thanks for the workaround suggestion. If I end up trying it, I will report back!

Want me to close this as "not planned"?

a-r-j commented 1 year ago

Keen to hear how it goes :)

a-r-j / graphein

Running graphein with pymol in parallel #261