Investigate docker commands that do not currently work with virtualization and make them work

dg1223 commented 8 months ago

Related issue and PR: #2300, #2308

The following docker command patterns throw TypeError in Ganga.

j.virtualization = Docker("docker://image:tag") and j.virtualization = Docker("image:tag")

The documentation for virtualization Virtualization.rst has been updated with the command that currently works: j.virtualization = Docker(image="image:tag")

We need to see if we can fix the code such that Docker('image:tag') would work instead of Docker(image='image:tag').

Commands to reproduce issue

j = Job(name='dockertest1')
j.virtualization = Docker("docker://fedora:latest")

j = Job(name='dockertest2')
j.virtualization = Docker("fedora:latest")

Error

ERROR    !!Unknown/Unexpected ERROR!!
ERROR    If you're able to reproduce this please report this to the Ganga developers!
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[30], line 1
----> 1 j.virtualization = Docker("docker://fedora:latest")

File ~/Documents/GitHub/gsoc-2024/ganga/gangaenv/lib/python3.8/site-packages/ganga/GangaCore/GPIDev/Base/Proxy.py:883, in GPIProxyClassFactory.<locals>._init(self, *args, **kwds)
    881 elif arg_len < len(getfullargspec(pluginclass.__init__)[0]):
    882     clean_args = (stripProxy(arg) for arg in args)
--> 883     instance = pluginclass(*clean_args)
    884 else:
    885     # In the future we will just pass the args to the classes directly and throw excepions, but for now we're trying to maintain old behavior
    886     logger.warning("Cannot use arguments: '%s' for constructing class type '%s'. Ignoring." %
    887                    (args, getName(pluginclass)))

TypeError: __init__() missing 1 required positional argument: 'mode'

egede commented 8 months ago

The problem is with the __init__ function in Docker.py. Simply removing mode from the list of arguments and the the subsequent assignment line, I think will resolve the issue.

egede commented 8 months ago

It would be good to write a few test cases that actually show that the instatiation of the class works correctly in a few different cases. So

d = Docker()
d = Docker('fedora:latest')
d = Docker(image='fedora:latest')
d = Docker(image='fedora:latest', mode='P1')

should all work and result in the expected assignments (including default values) of the schema.

dg1223 commented 8 months ago

Thanks for the hint. Yes, it looks like __init__ function doesn't need the mode parameter. It's not used anywhere other than getting assigned as an instance variable. OR maybe we can just add 'P1' as its default value and leave it be in case we decide to do something with it later.

I'll write some tests too. I'm currently working on the GSoC challenge. As soon as I am done, I'll fix this issue.

egede commented 8 months ago

Thanks for the hint. Yes, it looks like __init__ function doesn't need the mode parameter. It's not used anywhere other than getting assigned as an instance variable. OR maybe we can just add 'P1' as its default value and leave it be in case we decide to do something with it later.

I'll write some tests too. I'm currently working on the GSoC challenge. As soon as I am done, I'll fix this issue.

As the mode parameter is pat of the schema for the class, it will get assigned a default value anyway. As this will make the init method identical to the one in the base class, I believe that we can actually just completely remove the __init__ function from Docker.py. This is in fact what is done in the corresponding Singularity.py.

dg1223 commented 8 months ago

Yes, just checked Singularity.py. I see what you mean. Sounds good, I'll remove it then.

dg1223 commented 8 months ago

I have started working on it. All of the Docker commands you listed above work now on my local. Writing the tests now.

dg1223 commented 8 months ago

I have been having some issues lately with initializing jobs from inside the Ganga environment vs initializing it from a separate Python script outside of the environment.

Inside the Ganga IPython environment, when I run the following commands:

j = Job(name="dockertest")
j.name

I get the proper job name as output: dockertest

However, when I run the following script from outside the Ganga environment:

from ganga.GangaCore.Lib.Virtualization import Docker
from ganga.GangaCore.GPIDev.Lib.Job.Job import Job

j = Job(name="dockertest")
print(f"{j.name = }")

I get an empty string as an output: j.name = ''

Modules are getting imported correctly because I do not see any ImportError.

I installed Ganga locally through the setup.py file. I did not pip install ganga

mesmith75 commented 8 months ago

You haven't imported ganga correctly. It should be more like:

import ganga.ganga
j = ganga.Job(name="mytest")

dg1223 commented 8 months ago

You haven't imported ganga correctly. It should be more like:
import ganga.ganga
j = ganga.Job(name="mytest")

Thanks. Importing ganga like this works. However, what I see is existing tests and other modules in the source code don't import the entire Ganga environment. They rather import the required modules (e.g. Job, Docker) separately.

I was able to make it work by importing Job and Docker like this:

from GangaCore.testlib.GangaUnitTest import GangaUnitTest
from GangaCore.Lib.Virtualization import Docker

class TestDocker(GangaUnitTest):
    def test_DockerNoImageArg(self):
        from GangaCore.GPI import Job
        j = Job(name="dockertest")
        ...

I am still not clear why this works but importing the Job module directly from GPIDev doesn't work. Not sure if it has something to do with this huge __init__.py file in GangaCore.

dg1223 commented 8 months ago

Thanks for the hint. Yes, it looks like __init__ function doesn't need the mode parameter. It's not used anywhere other than getting assigned as an instance variable. OR maybe we can just add 'P1' as its default value and leave it be in case we decide to do something with it later. I'll write some tests too. I'm currently working on the GSoC challenge. As soon as I am done, I'll fix this issue.

As the mode parameter is pat of the schema for the class, it will get assigned a default value anyway. As this will make the init method identical to the one in the base class, I believe that we can actually just completely remove the __init__ function from Docker.py. This is in fact what is done in the corresponding Singularity.py.

@egede

I am able to make the following commands work by setting default values to the image and mode parameters in Docker,py:

d = Docker()
d = Docker('fedora:latest')
d = Docker(image='fedora:latest')
d = Docker(image='fedora:latest', mode='P1')

Docker.py


    def __init__(self, image='', mode='P1'):
        super().__init__(image)
        self.mode = mode

However, removing the __init__ function entirely keeps giving me the same error: TypeError: __init__() missing 1 required positional argument.... I have tried setting the defaultvalue (defvalue) through the _schema in Docker,py but that didn't work either. Maybe I still lack proper understanding of the basic source code.

Since the issue itself is resolved, please let me know if setting default parameters in the __init__ function is acceptable or the function must be entirely removed.

Opened a PR (#2314)

mesmith75 commented 7 months ago

Fixed by #2314

ganga-devs / ganga

Investigate docker commands that do not currently work with virtualization and make them work #2309