fiji-hpc / parallel-macro


The output directory `.scijava-parallel` is not automatically created #3

Closed velissarious closed 2 years ago

velissarious commented 3 years ago

Thanks for the fast and detailed answer! It is indeed better, but then the job fails. I have this error message in the console: FijiConsole_failedJob.txt I checked manually and the .scijava-parallel folder indeed doesn't exist. Maybe our cluster structure is different? Is there any other place I could look for it or copy it from?

If it's of any use, there is a bash script created on the cluster side that I can't attach here (unsupported file type).

Thanks!

Originally posted by @sebherbert in https://github.com/fiji-hpc/parallel-macro/issues/2#issuecomment-949584855

velissarious commented 3 years ago

The directory .scijava-parallel contains the redirected standard output and standard error output from all jobs executed using HPC Workflow Manager.

This directory should be automatically created by HPC Workflow Manager upon connecting to the cluster if the directory does not already exist.

Can you print the $HOME environment variable of the remote cluster you are using? You can use the following command on the remote cluster:

echo $HOME

Can you also attempt to manually create the directory on the remote cluster (to rule out an access rights issue)? Use this command:

mkdir -p $HOME/.scijava-parallel/

Can you provide some information about the remote cluster system you are using?

The name of the OS: uname

More detailed information on the distribution: lsb_release -a

Which shell is the remote cluster using? echo $SHELL
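
If it is easier, all of the checks above can be run in one go; here is a minimal sketch to adapt as needed (the fallback to /etc/os-release is only there in case lsb_release is unavailable):

# Sketch: collect the requested diagnostics in one pass on the remote cluster.
echo "HOME  : $HOME"
echo "SHELL : $SHELL"
echo "Kernel: $(uname -a)"
lsb_release -a 2>/dev/null || cat /etc/os-release      # fall back if lsb_release is missing
ls -ld "$HOME/.scijava-parallel" 2>/dev/null || echo ".scijava-parallel does not exist yet"
mkdir -p "$HOME/.scijava-parallel" && echo "directory can be created by this user"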

sebherbert commented 3 years ago

Thanks for the new issue and feedback:

$ echo $HOME
/scicore/home/biehlmai/herber0000

mkdir -p $HOME/.scijava-parallel/ runs without issue and creates the folder

$ uname
Linux
$ lsb_release -a
-bash: lsb_release: command not found

Since the previous command didn't run and I can't install it, I've run this instead for more details (let me know if I should run anything more):

$ cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
$ echo $SHELL
/bin/bash

Since I could create this .scijava-parallel folder manually, I reran the jobs afterwards. No crash anymore, but the results in the local WD consisted of only 2 .ijm files after download (which is different from the output I saw in your guide video), and the remote WD only contains the same 2 .ijm files plus a JobInfo.ini. So I checked the job dashboard and found this:

Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option PermSize=128m; support was removed in 8.0
Java HotSpot(TM) 64-Bit Server VM warning: Using incremental CMS is deprecated and will likely be removed in a future release
Macro Error: Could not load class cz.it4i.fiji.parallel_macro.ParallelMacro in line 130
        (called from line 26)

ret = call ( "cz.it4i.fiji.parallel_macro.ParallelMacro.initialise" <)> ; 
Macro Error: Could not load class cz.it4i.fiji.parallel_macro.ParallelMacro in line 130
        (called from line 26)

ret = call ( "cz.it4i.fiji.parallel_macro.ParallelMacro.initialise" <)> ; 
Macro Error: Could not load class cz.it4i.fiji.parallel_macro.ParallelMacro in line 130
        (called from line 26)

ret = call ( "cz.it4i.fiji.parallel_macro.ParallelMacro.initialise" <)> ; 

I don't know if it is a new issue or a side effect of the previous one.

velissarious commented 3 years ago

It looks like Parallel Macro is missing from the Fiji installation on the remote cluster.

Please follow this part of the guide for the remote cluster Fiji configuration. https://github.com/fiji-hpc/parallel-macro/wiki/How-to-install-Parallel-Macro

If you also want to use OpenMPI Ops, you can follow the last section; if not, you may skip it. You can install it later as well if you wish.
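
Purely as an illustration of one common way to install a Fiji plugin from a headless cluster shell using the ImageJ updater's command line (the update-site name and URL below are placeholders, not necessarily the ones the wiki uses; the wiki page above is authoritative):

# Sketch only: enable an update site in a headless Fiji on the cluster and apply updates.
# The site name and URL are placeholders; use the ones given in the wiki.
cd "$HOME/Fiji.app"
./ImageJ-linux64 --update add-update-site "ParallelMacro" "https://sites.imagej.net/ParallelMacro/"
./ImageJ-linux64 --update update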

sebherbert commented 3 years ago

Thanks, I had indeed missed this part. I'm sorry, I am a bit confused by the guide ordering; I thought I was following the right order by running through the short guide first, as indicated on the home page.

I followed the instructions on the page you provided (OpenMPI Ops included, just to be sure). I now have an additional progress_-1.plog file in my results, but that's all. The error message in the job dashboard / error output has changed to this: jobDashboard_errorOutput.txt

velissarious commented 3 years ago

The Open MPI module used must contain the Java bindings, as they are necessary in order to use Open MPI from Java.

HPC Workflow Manager and Parallel Macro are plugins for Fiji (an ImageJ distribution); Fiji is written in Java 8 and its plugins have to be as well.

In order to install the Java bindings you will have to compile Open MPI configured to enable MPI Java:

./configure --enable-mpi-java ...
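
For illustration, here is a minimal end-to-end build sketch; the version, download URL, and install prefix are assumptions and should be adapted to your cluster (and JAVA_HOME must point to a Java 8 JDK when configuring):

# Sketch: build Open MPI with the Java bindings into the user's home directory.
cd "$HOME"
wget https://download.open-mpi.org/release/open-mpi/v4.1/openmpi-4.1.1.tar.gz
tar xzf openmpi-4.1.1.tar.gz
cd openmpi-4.1.1
./configure --enable-mpi-java --prefix="$HOME/openmpi-4.1.1-install"
make -j 4 && make install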

For more information on Open MPI's Java bindings, as well as installation instructions, visit the following link.

For example, at IT4Innovations on the Salomon supercomputer we used Open MPI 4.0.0 with the Java flag enabled during configuration, compiled with GCC 6.3.0-2.27 and Java 8.

I suggest using the latest GNU compiler and the latest Open MPI version (4.1.1), but make sure to use Java 8 or it will not work with HPC Workflow Manager and Parallel Macro.
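
A quick way to confirm the toolchain before building, as a small sketch:

# Sketch: check the compiler and Java versions available on the cluster.
gcc --version | head -n 1      # compiler that will build Open MPI
java -version                  # should report 1.8.x (Java 8) for Fiji plugins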

The Java bindings are provided by Open MPI itself, which is associated with the Software in the Public Interest (SPI) non-profit.

Open MPI is open source (here is the license); it can be downloaded and installed freely.

This article can be useful for testing the installation of Open MPI with the Java bindings.
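
As a small smoke test of the Java bindings (the class name and launch options below are just an example), you can compile and run a tiny program with the mpijavac wrapper shipped by Open MPI:

# Sketch: smoke-test the Open MPI Java bindings with a minimal "hello" program.
# Assumes the bin directory of the Open MPI build above is on PATH.
cat > HelloMPI.java <<'EOF'
import mpi.MPI;

public class HelloMPI {
    public static void main(String[] args) throws Exception {
        MPI.Init(args);                       // start the MPI runtime
        int rank = MPI.COMM_WORLD.getRank();  // rank of this process
        int size = MPI.COMM_WORLD.getSize();  // total number of ranks
        System.out.println("Hello from rank " + rank + " of " + size);
        MPI.Finalize();                       // shut the runtime down
    }
}
EOF
mpijavac HelloMPI.java
mpirun -np 4 java HelloMPI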

velissarious commented 2 years ago

Since this issue was posted, newer (and now current) versions of Parallel Macro no longer require an installation of the Java bindings, as Parallel Macro now uses JNA instead of JNI to call Open MPI functions.

If there are segmentation faults when running a job, you may need to set up a custom Open MPI module in your home directory on the remote HPC cluster, as a regular user of the cluster. A detailed guide can be found here: How to Create a Custom Open MPI module.

If the custom Open MPI module is not automatically detected by name, then you need to set this custom module manually in the advanced settings of the paradigm, as this guide describes.
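
As a rough illustration of what such a user-level module can look like (the module name, version, and paths below are assumptions; the linked guide is authoritative):

# Sketch: expose a home-directory Open MPI build as a user module (Tcl modulefile).
mkdir -p "$HOME/modulefiles/openmpi"
cat > "$HOME/modulefiles/openmpi/4.1.1" <<'EOF'
#%Module1.0
prepend-path PATH            $env(HOME)/openmpi-4.1.1-install/bin
prepend-path LD_LIBRARY_PATH $env(HOME)/openmpi-4.1.1-install/lib
EOF
module use "$HOME/modulefiles"
module load openmpi/4.1.1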