gama-platform / gama.old

Main repository for developing the 1.x versions of GAMA
GNU General Public License v3.0

headless not working on HPC (Linux) #3656

Closed chapuisk closed 1 year ago

chapuisk commented 1 year ago

Describe the bug: When trying to use headless batch mode on an HPC, Gama cannot launch correctly because it cannot write its metadata to the default locations. See the attached log file.

To Reproduce: Difficult to reproduce, as it depends heavily on the HPC environment. However, programs launched on an HPC usually face strong restrictions on where they can write on the local file system. Hence it might be an issue for other HPC uses of the platform.

Expected behavior: Being able to execute the platform seamlessly :) At least, being able to specify a directory in which to store/access/write the metadata (for which a symbolic link could be created, for instance).

[EDIT NOTE]: The model I want to run executes correctly on my local machine, using headless batch mode.

AlexisDrogoul commented 1 year ago

Have you tried doing what is proposed in the log? "By default the platform writes its content under the current working directory when the platform is launched. Use the -data parameter to specify a different content area for the platform."
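
As an illustration of the log's advice, here is a minimal Python sketch that picks a writable directory and builds a launch command around the -data option. Only -data comes from the log message. The helper names (pick_workspace, build_command), the SCRATCH environment variable, the gama-workspace folder name and the ./gama-headless.sh launcher path are assumptions for illustration, and this thread does not confirm whether the headless wrapper forwards -data unchanged.

```python
import os
import tempfile
from pathlib import Path


def pick_workspace(candidates):
    """Return the first candidate directory that exists (or can be created) and is writable."""
    for candidate in candidates:
        path = Path(candidate).expanduser()
        try:
            path.mkdir(parents=True, exist_ok=True)
        except OSError:
            continue  # cannot create it here (read-only file system, permissions, ...)
        if os.access(path, os.W_OK | os.X_OK):
            return path
    raise RuntimeError("no writable workspace directory among the candidates")


def build_command(launcher, workspace, extra_args=()):
    """Build the argument list; -data is the option named in the log message."""
    return [str(launcher), "-data", str(workspace), *extra_args]


if __name__ == "__main__":
    # SCRATCH is an assumption: many clusters define it, but not all of them do.
    scratch = os.environ.get("SCRATCH", tempfile.gettempdir())
    workspace = pick_workspace([Path(scratch) / "gama-workspace"])
    # "./gama-headless.sh" is a hypothetical launcher path.
    print(build_command("./gama-headless.sh", workspace))
```

Listing several candidates (for example a scratch area first, then a home directory) lets the same job script run on clusters with different write restrictions.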

chapuisk commented 1 year ago

Hey @AlexisDrogoul, it is unclear how this parameter should be used. I can modify the Gama.ini file, but there is no specific guidance on how to proceed. I'll try several things and come back here... If the problem is general to any HPC deployment of the platform, integrating this parameter into the headless wrapper could be a better solution than directly changing the configuration file.

AlexisDrogoul commented 1 year ago

There is a chance that this configuration (i.e. a place one can write to) is specific to each HPC implementation, and then needs to be passed as a parameter when executing GAMA. One can certainly pass parameters to gama-headless, right? (Otherwise I do not see the point of having a command-line version...)

chapuisk commented 1 year ago

Update: I managed to set up a proper symbolic link to the folder that Gama needs to write to in order to load properly (I created a .workspace1 folder with a symbolic link to resources accessible from the nodes themselves). But now I face another issue (see the corresponding log file). Any ideas, @AlexisDrogoul?

AlexisDrogoul commented 1 year ago

I added a log in https://github.com/gama-platform/gama/commit/dc7356abab3f4e545f55db481e9253e8e3ed7805 so that it is possible to see which URI is failing to resolve against the baseURI (and what is the baseURI in your case).

One thing to look at is:

(from the EMF FAQ, EMF being the backend of Xtext, itself the backend of GAML)

EMF recognizes a couple of well known archive schemes, such as "zip" and "jar", 
so everything will work fine if the URI of an archived resource starts, for 
example, with "zip:". Although most applications probably use one of these 
recognized schemes, you may find that certain runtimes use a different one. For 
example, when running in the IBM WebSphere Application Server, 
Class.getResource(String) may return a URL with a "wsjar" scheme, like 
"wsjar:file:/C:/dev/ws/default/sample_app/xsd.resources.jar!/org/eclipse/xsd/plugin.properties". 
By default, EMF will not recognize this as an archive URI and will fail to 
handle it correctly, probably resulting in multiple unregistered resources 
errors and/or null pointer exceptions. If you are facing this problem, you will 
need to change EMF's set of recognized archive schemes. This is done by defining
a property named "org.eclipse.emf.common.util.URI.archiveSchemes", whose value 
is the desired space-separated list of archive schemes. Here's an example of how
you could define it when invoking your Java application: 

java MyApp -Dorg.eclipse.emf.common.util.URI.archiveSchemes="wsjar wszip jar zip"
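
One detail worth checking when applying this advice: the java launcher treats only the options placed before the main class (or before -jar) as JVM options, and anything after the main class is handed to the application as a plain argument. The minimal Python sketch below therefore puts the -D option first, which differs from the order shown in the excerpt above. The helper name java_command is hypothetical, and MyApp is the placeholder class name from the FAQ.

```python
def java_command(main_class, system_properties, app_args=()):
    """Place -Dkey=value options before the main class so the JVM sees them."""
    jvm_options = [f"-D{key}={value}" for key, value in system_properties.items()]
    return ["java", *jvm_options, main_class, *app_args]


cmd = java_command(
    "MyApp",  # placeholder class name taken from the FAQ excerpt
    {"org.eclipse.emf.common.util.URI.archiveSchemes": "wsjar wszip jar zip"},
)
print(cmd)
```

Building the command as a list of arguments, rather than as a single string, also avoids shell quoting problems with the spaces inside the property value.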

How is GAMA installed and distributed on the machines composing the HPC environment?

lesquoyb commented 1 year ago

Hello @chapuisk, judging from your log, it looks a bit like the error message you can get with gama-server when you don't have write permission on the Gama installation folder. It is documented here. The methods are a bit different because it's in server mode, but it also happens at the linking phase when trying to build the model. So maybe the symlink is not set up properly, or Gama tries to write the workspace next to the headless script anyway. What happens if you run headless with a script located in a directory where you have full write permissions?
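
A quick way to test the symlink hypothesis is to check what the workspace path actually resolves to on a compute node. This is a minimal Python sketch, not part of Gama: the function name describe_workspace and the default path .workspace1 (the folder name mentioned earlier in this thread) are used for illustration only.

```python
import os
from pathlib import Path


def describe_workspace(path):
    """Report whether a (possibly symlinked) workspace resolves to a writable directory."""
    p = Path(path)
    target = p.resolve()  # follows symbolic links
    return {
        "is_symlink": p.is_symlink(),
        "target": str(target),
        "target_is_dir": target.is_dir(),
        "writable": target.is_dir() and os.access(target, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Run this from the directory that contains the headless script.
    print(describe_workspace(".workspace1"))
```

Running it inside the batch job itself, rather than on the login node, matters because the two may not see the same file systems or permissions.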

chapuisk commented 1 year ago

Hey @AlexisDrogoul and @lesquoyb, thanks a lot for looking at the issue. I've tried adding the new "-Dorg.eclipse..." option when launching Gama, but nothing changed. @lesquoyb It seems that Gama is able to write to the symlinked folder, because it is full of files and folders. Moreover, I tested with Gama 1.8.2 RC2 and it works.

AlexisDrogoul commented 1 year ago

@chapuisk Have you had a chance to spot which URI is failing (if it is still the problem)?

chapuisk commented 1 year ago

I'll upload the alpha version built with commit https://github.com/gama-platform/gama/commit/dc7356abab3f4e545f55db481e9253e8e3ed7805 to see if I can identify the URI that prevents Gama from running... Once again, 1.8.2 RC2 was running smoothly on the exact same configuration (and it still is, as I tested today). Getting back here asap.

AlexisDrogoul commented 1 year ago

@RoiArthurB Can we have a list of all the commits / changes made to the specific plugin msi.gama.headless since GAMA 1.8.2RC2?

I have for myself identified three commits that are directly related to the class (GamlResourceIndexer) involved, which may or may not have had an impact on how URIs are computed: https://github.com/gama-platform/gama/commit/8b4bd63248c38083bf112aeef061d1ed3093bce7 / https://github.com/gama-platform/gama/commit/b8f3774d961a2c7e3af7ebc04133ad04d2f041a1 / https://github.com/gama-platform/gama/commit/8e860d138b6c95c4a4420d11eb927bba3b26deef .

However, w/o a clear idea of the type of URI the program is dealing with, I cannot really investigate @chapuisk

RoiArthurB commented 1 year ago

@AlexisDrogoul here's a list of every commit in msi.gama.headless from 1.8.2-RC2 up to the current master, with a quick overview of file modifications per commit: https://pastebin.com/TvNvkv18

(PS: The command I ran is git log --stat 1.8.2-RC2..GAMA_1.9.0 msi.gama.headless from the root of this repository)

chapuisk commented 1 year ago

I don't know exactly how, but I managed to get my model running on the HPC. I did not change the Gama version, but I added a first run with a very simple batch experiment, added to the samples model folder inside Gama headless. It may have had something to do with the .workspace... Anyway, I do have other problems (memory and allocation of cores to Gama batches), but this one is no longer an issue. Closed.