problem to compile the CLARA2 program

QJohn2017 commented 6 years ago

Hello, I'm trying to calculate Thomson and Compton scattering, so I found the program CLARA2 on GitHub. But when I compiled the program at a supercomputer center by following the tutorial on the website, the following problems occurred:

siom003@login3 ~/qzy/CLARA2/clara2-0.1.0/src> ./prepare_job.sh
for MPI enter 1
for PBS-Array jobs enter 2
1
MPI choosen
clara2_hypnos.modules: line 1: /etc/profile.modules: No such file or directory
ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'gcc/4.6.2'
ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'infiniband/1.0.0'
ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'openmpi/1.6.0'
ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'fftw/3.3.4'
ModuleCmd_Load.c(204):ERROR:105: Unable to locate a modulefile for 'editor/emacs'
make -C ./include/
make[1]: Entering directory `/public/home/users/siom003/qzy/CLARA2/clara2-0.1.0/src/include'
g++ -Wall -O3 -c detector_e_field.cpp
g++ -Wall -O3 -c detector_dft.cpp
g++ -Wall -O3 -c -lfftw3 -lm detector_fft.cpp
In file included from detector_fft.cpp:27:
ned_fft.hpp: In member function ‘void ned_FFT<A, T>::fft(T*, long unsigned int)’:
ned_fft.hpp:103: error: ‘input’ was not declared in this scope
ned_fft.hpp:104: error: ‘output’ was not declared in this scope
ned_fft.hpp: In member function ‘void ned_FFT<A, T>::fft(T*, long unsigned int) [with A = double, T = Vector<double, 3u>]’:
ned_fft.hpp:71:   instantiated from ‘ned_FFT<A, T>::ned_FFT(unsigned int, A*, T*) [with A = double, T = Vector<double, 3u>]’
detector_fft.cpp:155:   instantiated from here
ned_fft.hpp:102: warning: unused variable ‘in’
ned_fft.hpp:102: warning: unused variable ‘out’
make[1]: *** [detector_fft.o] Error 1
make[1]: Leaving directory `/public/home/users/siom003/qzy/CLARA2/clara2-0.1.0/src/include'
make: *** [subsystem] Error 2
mpic++ -Wall -O3 -lfftw3 -lm -D__PARALLEL_SETTING__=1 -c -fopenmp -lz main.cpp
icpc: warning #10315: specifying -lm before files may supersede the Intel(R) math library and affect performance
g++ -Wall -O3 -lfftw3 -lm -c -fopenmp -lz -I./include/ all_directions.cpp
make: *** No rule to make target `include/libDetector.a', needed by `MPI'.  Stop.
siom003@login3 ~/qzy/CLARA2/clara2-0.1.0/src> cd include/
siom003@login3 ~/qzy/CLARA2/clara2-0.1.0/src/include> vi ned_fft.hpp
siom003@login3 ~/qzy/CLARA2/clara2-0.1.0/src/include> cd ..
siom003@login3 ~/qzy/CLARA2/clara2-0.1.0/src> make clean
rm -f *o
rm executable
rm: cannot remove `executable': No such file or directory
make: *** [clean] Error 1

Can you help me to solve this problem? Thank you very much!

PrometheusPi commented 6 years ago

Hi @QJohn2017,

thank you for posting your question.
The tutorial assumes you are running Clara2 on the hypnos cluster at HZDR. The ./prepare_job.sh script you executed loads various modules by sourcing clara2_hypnos.modules. The compilers and libraries loaded there are the ones provided by the hypnos cluster admins. Judging from the errors, I assume you are not running Clara2 on hypnos. Is this correct?

If this is the case, we just have to adjust the loaded modules accordingly and Clara2 should compile. So the question is: does the cluster you run on have a module system which allows you to load:

gcc (seems to be installed)
fftw (was not available at compile time)
openmpi (probably available, but not certain based on the error message)

If this is the case, just load these modules and run make MPI. If the cluster does not provide a module system, you have to install these libraries on your own (or use the installed ones).
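As a rough sketch of the step above: the helper below loads whatever module names it is given, provided a module command exists in the current shell. The function name load_prereqs is made up for illustration, and the module names in the usage line are guesses; the exact names and versions are site specific, so check the output of module avail first.

```shell
# Hypothetical helper: load each named module if a `module` command exists.
# Module names differ between clusters; list them with `module avail`.
load_prereqs() {
    if ! command -v module >/dev/null 2>&1; then
        echo "no module command found; install or locate the libraries manually" >&2
        return 1
    fi
    for name in "$@"; do
        module load "$name"
    done
}

# Possible usage (module names are assumptions, adjust to your site):
#   load_prereqs gcc fftw openmpi && make MPI
```

After loading, module list shows what is actually active, which is worth checking because a failed load is easy to miss in the build output.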

If the compilation works fine, we can further discuss how to submit an MPI job on your cluster.

I hope this answer helped you. I am looking forward to your reply.

Just out of curiosity: at what cluster do you plan to run Clara2?

Best, @PrometheusPi

schluenz commented 6 years ago

Hi,

just ran into the same issue. plain make fails on undeclared input/output in ned_fft.hpp. I guess

[src]$ diff include/ned_fft.hpp~ include/ned_fft.hpp
102c102
<     fftw_complex *in, *out;
---
>     fftw_complex *input, *output;

would fix it. Cheers, Frank.

PrometheusPi commented 6 years ago

Hi @schluenz,

thanks for posting. Good point - this must have slipped in despite my checks when uploading. (I probably did not remove the library file created when checking.) Please excuse this inconvenience.

I will provide a fix in a minute.

PrometheusPi commented 6 years ago

Your issue should be fixed with #90. If you check out the current dev branch, the fix is already included. The master branch will be updated soon.

PrometheusPi commented 6 years ago

@QJohn2017 Does this fix your problem?

QJohn2017 commented 6 years ago

Hi @PrometheusPi, I have tried the revised version that you uploaded 3 days ago. The gcc version at my supercomputer center is gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4). The mpic++ version is icpc (ICC) 11.1 20091012. I did not find an installation of fftw, so I copied _**fftw3.h**_ to clara2/src/include/ and changed the first line of code in the file _**ned_fft.hpp**_ to # include "fftw3.h". Then I entered the command "make MPI", and the output is as follows:

[ac_siom_jsliu_1@ln0%tianhe2-C src]$ make MPI
make -C ./include/
make[1]: Entering directory `/WORK/ac_siom_jsliu_1/qzy/CLARA2/clara2-dev/src/include'
g++ -Wall -O3 -c detector_e_field.cpp
g++ -Wall -O3 -c detector_dft.cpp
g++ -Wall -O3 -c -lfftw3 -lm detector_fft.cpp
rm -f libDetector.a
ar cr libDetector.a detector_e_field.o detector_dft.o detector_fft.o
g++ -Wall -O3 -c fileExists.cpp
make[1]: Leaving directory `/WORK/ac_siom_jsliu_1/qzy/CLARA2/clara2-dev/src/include'
mpic++ -Wall -O3 -lfftw3 -lm -D__PARALLEL_SETTING__=1 -c -fopenmp -lz main.cpp
icpc: warning #10314: specifying -lm before object files may supercede the Intel(R) math library and affect performance
icpc: command line warning #10006: ignoring unknown option '-fopenmp'
parallel_jobs.h(22): remark #1782: #pragma once is obsolete. Use #ifndef guard instead.
  #pragma once
              ^

parallel_jobs.h(54): remark #1418: external function definition with no prior declaration
  int start_array(int* numtasks,
      ^

parallel_jobs.h(107): remark #1418: external function definition with no prior declaration
  int end_array(void)
      ^

parallel_jobs.h(130): remark #1418: external function definition with no prior declaration
  int check_break(void)
      ^

all_directions.hpp(21): remark #1782: #pragma once is obsolete. Use #ifndef guard instead.
  #pragma once
              ^

main.cpp(71): remark #181: argument is incompatible with corresponding format string conversion
      printf("this is job %5d of %5d jobs in the array (on %s = rank: %d)\n", i, N_max, pHost, rank);
                                                                              ^

main.cpp(71): remark #181: argument is incompatible with corresponding format string conversion
      printf("this is job %5d of %5d jobs in the array (on %s = rank: %d)\n", i, N_max, pHost, rank);
                                                                                                     ^

g++ -Wall -O3 -lfftw3 -lm -c -fopenmp -lz -I./include/ all_directions.cpp
g++ -Wall -O3 -lfftw3 -lm -c  -fopenmp -I./include/ single_direction.cpp
In file included from single_direction.cpp:35:
run_through_data.hpp: In function ‘void run_through_data(const one_line*, unsigned int, DET) [with DET = Detector_fft*]’:
run_through_data.hpp:56: warning: ‘time_fill.Discrete<double>::future’ may be used uninitialized in this function
run_through_data.hpp:56: warning: ‘time_fill.Discrete<double>::now’ may be used uninitialized in this function
run_through_data.hpp:56: warning: ‘time_fill.Discrete<double>::old’ may be used uninitialized in this function
mpic++ -Wall -O3 -lfftw3 -lm -fopenmp -lz main.o all_directions.o single_direction.o ./include/libDetector.a ./include/fileExists.o -o executable
icpc: warning #10314: specifying -lm before object files may supercede the Intel (R) math library and affect performance
icpc: command line warning #10006: ignoring unknown option '-fopenmp'
[ac_siom_jsliu_1@ln0%tianhe2-C src]$ ^C
[ac_siom_jsliu_1@ln0%tianhe2-C src]$

I don't know how to solve this problem. Can you give me some suggestions? Thank you very much!

QJohn2017 commented 6 years ago

Hi @PrometheusPi, sorry, I left out some words in the comment above: I copied the "fftw3.h" file from the compiled code at: to "_**clara2/src/include/**_".

QJohn2017 commented 6 years ago

Sorry @PrometheusPi, I don't know why the website link is not displayed here.

PrometheusPi commented 6 years ago

Hi @QJohn2017, thanks for updating the code. The new code seems to have solved the initial issue you encountered. The FFT-based Fourier transform seems to compile with fftw (g++ -Wall -O3 -c -lfftw3 -lm detector_fft.cpp was successful). I assume you added the fftw library to your LD_LIBRARY_PATH (or the installer did this for you). So copying the fftw3.h should not be necessary.
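Instead of copying the header, one option is to look for an existing fftw3.h and point the compiler at it. The sketch below is only an illustration: find_header is a made-up helper, and the directories in the usage line are common defaults that may not exist on a given cluster. Note that a -c step only compiles, so its success shows the header was found; the library itself is needed at the final link step and, for shared libraries, at run time.

```shell
# Hypothetical helper: print the first directory entry that contains the
# given header. $1 = header name, remaining arguments = directories to try.
find_header() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$name" ]; then
            echo "$dir/$name"
            return 0
        fi
    done
    return 1
}

# Possible usage (directory list is an assumption):
#   find_header fftw3.h /usr/include /usr/local/include "$HOME/fftw/include"
```

If the header turns up in a non-default location, passing -I with that include directory and -L with the matching library directory to the compiler avoids keeping a private copy in src/include/.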

Right now, you encounter various warnings:

The warning on using the math library -lm is Intel compiler specific. If anything, it is just a slight performance issue from not using the compiler-specific optimized library.
The icpc compiler does not seem to be OpenMP aware. This would prevent using OpenMP parallelization, which is supported but not active right now (so no need to worry about this warning).
Another warning is caused by the unknown #pragma once. This might cause compile errors, but since icpc is used only in the final step, everything appears to be fine. However, this might cause errors in the future. Updating your icpc to a newer version or using another (supported) compiler might help.
The final warning originates from run_through_data.hpp line 56. The variable is definitely not initialized, but is used correctly in the code.
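For the -fopenmp and -lm messages, a possible workaround can be sketched as follows. It assumes that older Intel compilers, such as the 11.1 release reported in this thread, spell the OpenMP option -openmp rather than the GCC-style -fopenmp (newer Intel releases use -qopenmp), and that the -lm warning goes away when libraries are listed after the object files. The helper name openmp_flag_for is made up for illustration and is not part of Clara2.

```shell
# Hypothetical helper: pick an OpenMP flag from the first line of
# `<compiler> --version` (for example: mpic++ --version | head -n 1).
openmp_flag_for() {
    case "$1" in
        icpc*|icc*) echo "-openmp" ;;   # spelling used by older Intel releases
        *)          echo "-fopenmp" ;;  # GCC spelling, as in the Makefile
    esac
}

# Example link line with the libraries moved after the object files:
#   mpic++ -O3 $(openmp_flag_for "$(mpic++ --version | head -n 1)") \
#       main.o all_directions.o single_direction.o \
#       ./include/libDetector.a ./include/fileExists.o \
#       -lfftw3 -lz -lm -o executable
```

This is only a sketch; whether the flag matters depends on whether the OpenMP code paths are enabled in the build at all.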

These are all just warnings and do not prevent compilation. So in the end, mpic++ -Wall -O3 -lfftw3 -lm -fopenmp -lz main.o all_directions.o single_direction.o ./include/libDetector.a ./include/fileExists.o -o executable appears to have succeeded, and a program with the name executable should have been created.

This is a working Clara2 program. Please be aware that it will try to analyze the files you specified in settings.hpp and will also use the directions and frequencies specified there.

Was there a file called executable created?

If I read the output you provided correctly, you are running Clara2 on tianhe2. If so, the PBS-based script generated by ./prepare_job.sh will not work on your system, since tianhe2 uses slurm instead of PBS. However, I have some experience with other slurm systems, so I could provide a ./prepare_job_tianhe2.sh script that generates a submit script for slurm.
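To give an idea of what such a generated submit script could look like, here is a minimal sketch. It is not the actual ./prepare_job_tianhe2.sh; the function name write_slurm_script, the job name, and the choice of options are assumptions, and a site may require extra options (partition, account) or provide its own wrappers around the standard slurm commands.

```shell
# Hypothetical generator: print a minimal slurm batch script to stdout.
# $1 = number of MPI tasks, $2 = wall time limit (HH:MM:SS)
write_slurm_script() {
    cat <<EOF
#!/bin/bash
#SBATCH --job-name=clara2
#SBATCH --ntasks=$1
#SBATCH --time=$2
srun ./executable
EOF
}

# Possible usage:
#   write_slurm_script 16 02:00:00 > submit.sh
#   sbatch submit.sh
```

The task count and time limit here are placeholders; suitable values depend on the number of directions and trajectories configured in settings.hpp.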

Are you running on a SLURM system?

QJohn2017 commented 6 years ago

Hi @PrometheusPi, thank you very much for helping me analyze the compile errors. Yes, my supercomputer center is tianhe2. How did you know? Also, when I compiled the program, an executable file named executable was created, and its size is 1941 kb. Is this executable correct? Furthermore, if I want to eliminate the warnings and other errors when compiling, how can I do it? Do I just replace the compiler icpc with OpenMPI? Lastly, can you provide the ./prepare_job_tianhe2.sh script for me? Thank you very much!

PrometheusPi commented 6 years ago

@QJohn2017 Great to hear that the executable was produced. The actual size depends on the compiler and the libraries used. On our local cluster, the size is for example smaller than 256K. But don't worry, this is probably just the compiler.

The first line of the output in your second message contained the name of the node you were running on (ln0%tianhe2-C). Since I am familiar with the top HPC clusters in the world due to our main code PIConGPU, I recognized the name.

I opened two new issues regarding the submit script for tianhe2 #93 and the warnings in the icpc #94 to keep the discussion "clean" in separate issues. I posted some questions there for you. Feel free to answer them there.

The update seems to have resolved the compile bug, thus I will close this issue.

ComputationalRadiationPhysics / clara2

problem to compile the CLARA2 program #89