Closed yang5891 closed 1 week ago
Can you share one of the compile commands from the output? It seems that your libraries are not being found during the link phase.
Using mpifort will require "-mkl=cluster" for ScaLAPACK and BLAS support, plus an Intel-built netcdff library and module file.
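As a rough illustration of the flags mentioned above, the sketch below assembles a possible link line. It assumes mpifort wraps the Intel Fortran compiler (where -mkl=cluster is available). NETCDF_DIR and aorsa_link_flags are placeholder names for this example and do not come from the AORSA makefile.

```shell
# Hypothetical install prefix of an Intel-built netCDF-Fortran; adjust for your system.
NETCDF_DIR="${NETCDF_DIR:-/path/to/netcdf-fortran}"

# Print the extra flags discussed above:
#   -mkl=cluster  asks the Intel compiler to link MKL including ScaLAPACK/BLACS/BLAS
#   -I.../include lets the compiler find the netcdf module file
#   -lnetcdff     is the netCDF Fortran library
aorsa_link_flags() {
    printf '%s\n' "-mkl=cluster -I${NETCDF_DIR}/include -L${NETCDF_DIR}/lib -lnetcdff"
}

# Show the resulting command instead of running it.
echo "mpifort $(aorsa_link_flags)"
```

If nf-config is installed alongside that netCDF-Fortran build, `nf-config --fflags` and `nf-config --flibs` report the actual include and library flags to use in place of the placeholder paths.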
Hello teacher. The problems encountered with make have now been reduced, but there are a few libraries that are not recognized.
/public/software/compiler/intel-compiler/2021.3.0/bin/intel64/../../compiler/lib/intel64_lin/for_main.o: In function `main':
for_main.c:(.text+0x2e): undefined reference to `MAIN__'
obj/current.o: In function `current_orbit_':
current.f:(.text+0x1b0a): undefined reference to `fftn2_'
obj/rf2x_setup2.o: In function `run_rf2x_':
rf2x_setup2.f:(.text+0x15c3): undefined reference to `rhograte_'
rf2x_setup2.f:(.text+0x15f0): undefined reference to `rhograte_'
rf2x_setup2.f:(.text+0x161d): undefined reference to `rhograte_'
rf2x_setup2.f:(.text+0x164a): undefined reference to `rhograte_'
rf2x_setup2.f:(.text+0x1677): undefined reference to `rhograte_'
obj/rf2x_setup2.o:rf2x_setup2.f:(.text+0x16a4): more undefined references to `rhograte_' follow
make: *** [xaorsa2d] Error 1
Those are all routines in the main program. Try
make clean ; make
and post the compile command for aorsa2dmain.o
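One way to capture the requested compile command is to save the build output and search it. This is a minimal sketch: build.log and find_compile_cmd are names chosen for this example, and the commented commands assume you run them from the AORSA source directory.

```shell
# Print every line of a saved build log that mentions the given name,
# prefixed with its line number in the log.
find_compile_cmd() {
    grep -n -- "$2" "$1"
}

# Typical use (commented out so the sketch has no side effects):
#   make clean
#   make 2>&1 | tee build.log
#   find_compile_cmd build.log aorsa2dmain
```

An undefined reference to MAIN__ at link time usually means that no object file containing the Fortran main program was linked, so an empty result here would itself be informative.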
-- -john Principal Research Scientist John Wright Office 617-253-9612 zoom: https://mit.zoom.us/my/jcwright
Thank you teacher, I can now compile and generate the xaorsa2d executable. Since I don't need pgplot, I commented out the relevant statements. But now I get an error when executing xaorsa2d:
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
xaorsa2d 00000000008181DA forsignal_handl Unknown Unknown
libpthread-2.17.s 00002B448DA74630 Unknown Unknown Unknown
libmpi.so.20.10.2 00002B448D49EAC5 MPI_Comm_size Unknown Unknown
libmkl_blacs_inte 00002B44864E7A39 MKLMPI_Comm_size Unknown Unknown
libmkl_blacs_inte 00002B44864E5D31 mkl_blacs_init Unknown Unknown
libmkl_blacs_inte 00002B44864D6898 blacs_pinfo Unknown Unknown
xaorsa2d 00000000005FB088 Unknown Unknown Unknown
xaorsa2d 0000000000412E92 Unknown Unknown Unknown
libc-2.17.so 00002B448DCA3555 __libc_start_main Unknown Unknown
xaorsa2d 0000000000412DA9 Unknown Unknown Unknown
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[55252,1],0] Exit code: 174
Your libraries aren't being found at runtime. Is your LD_LIBRARY_PATH set?
-john
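The LD_LIBRARY_PATH question can be checked along these lines. The sketch prints the current search path and, if the executable is present, lists any shared libraries that ldd cannot resolve. missing_libs is a helper name invented for this example.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it also works on saved output.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Show the run-time search path, one directory per line (blank if unset).
printf '%s\n' "${LD_LIBRARY_PATH:-}" | tr ':' '\n'

# Only inspect the executable if it exists in the current directory.
if [ -x ./xaorsa2d ]; then
    ldd ./xaorsa2d | missing_libs
fi
```

No output from the last step means every library was located. A crash can then still come from a library that was found but is the wrong build for the MPI in use.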
Excuse me sir, I am using the Intel build of OpenMPI 4.0.3 here. The MKL library used is from Intel 2021.3.0, as shown in the file. It is possible to compile and generate xaorsa2d, but the execution reports an error.
You will have to share the error message.
-john
Teacher, this is the problem I'm having when running xaorsa2d.
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x440000f8)
==== backtrace (tid: 11614) ====
0 0x000000000006e600 opal_mutex_unlock() /tmp/clussoft.20240516134350/openmpi-4.0.3/ompi/mpi/c/profile/../../../../opal/threads/mutex_unix.h:158
1 0x000000000006e600 PMPI_Comm_size() /tmp/clussoft.20240516134350/openmpi-4.0.3/ompi/mpi/c/profile/pcomm_size.c:63
2 0x0000000000029a39 MKLMPI_Comm_size() ???:0
3 0x0000000000027d31 mkl_blacs_init() ???:0
4 0x0000000000018898 blacs_pinfo_() ???:0
5 0x0000000000605248 MAIN__() ???:0
6 0x0000000000414b62 main() ???:0
7 0x0000000000022555 __libc_start_main() ???:0
8 0x0000000000414a69 _start() ???:0
=================================
forrtl: severe (174): SIGSEGV, segmentation fault occurred
Image PC Routine Line Source
xaorsa2d 000000000083F74A forsignal_handl Unknown Unknown
libpthread-2.17.s 00007F481CAAE630 Unknown Unknown Unknown
libmpi.so.40.20.3 00007F481D02B600 MPI_Comm_size Unknown Unknown
libmkl_blacs_inte 00007F482403DA39 MKLMPI_Comm_size Unknown Unknown
libmkl_blacs_inte 00007F482403BD31 mkl_blacs_init Unknown Unknown
libmkl_blacs_inte 00007F482402C898 blacs_pinfo Unknown Unknown
xaorsa2d 0000000000605248 Unknown Unknown Unknown
xaorsa2d 0000000000414B62 Unknown Unknown Unknown
libc-2.17.so 00007F481C6F3555 __libc_start_main Unknown Unknown
xaorsa2d 0000000000414A69 Unknown Unknown Unknown
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[55098,1],0] Exit code: 174
--------------------------------------------------------------------------
What's your mpirun command? What does 'mpirun hostname' return? Which test case are you running? They need one core.
-john
Teacher, I can run the program now with a single core. But when I use multi-core, I get an error. Can this program only be run on a single core?
It seems like you are having issues running with MPI. This error has to do with starting up and executing under mpirun, not with aorsa itself. I suggest you verify that you can compile and run a simple MPI program. Assuming you are using the Intel compilers, grab cpi.c from https://gist.github.com/jcwright77/a5e1d66886bc17b0f7936466739cc287
mpiicc cpi.c -o cpi
mpirun -np 4 ./cpi
Other things:
Verify you are using the correct mpirun: 'which mpirun' should show the mpirun in the Intel distribution.
Try making from scratch:
make clean; make
Ask a colleague who is familiar with parallel programs on your system for help. Your issues seem to be outside of aorsa and have to do with basic compilation and execution of parallel programs.
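The 'which mpirun' check can be extended slightly to compare the launcher with the compiler wrapper. The sketch below uses a rough heuristic, which is an assumption of this example and not an AORSA rule: a launcher and a wrapper from the same MPI install normally sit in the same bin directory. same_bin_dir is a name invented here, and symlinked installs can defeat the comparison.

```shell
# Succeed when two executables live in the same directory.
same_bin_dir() {
    [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# Locate the launcher and a C compiler wrapper, if they are on PATH.
# mpiicc is the Intel MPI wrapper; mpicc is tried as a fallback.
launcher=$(command -v mpirun || true)
wrapper=$(command -v mpiicc || command -v mpicc || true)

if [ -n "$launcher" ] && [ -n "$wrapper" ]; then
    if same_bin_dir "$launcher" "$wrapper"; then
        echo "mpirun and the compiler wrapper share $(dirname "$launcher")"
    else
        echo "possible mismatch: $launcher vs $wrapper"
    fi
fi
```

Both traces in this thread fail in MPI_Comm_size called from MKL's BLACS layer, so it may also be worth running `ldd ./xaorsa2d | grep -E 'libmpi|blacs'` to see which MPI and BLACS libraries the executable actually resolves. MKL ships separate BLACS libraries for Intel MPI and Open MPI, and the one linked should match the MPI that mpirun belongs to.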
Thank you, teacher, for your patience these days. My brother in my research group helped me find the relevant parameters for setting the number of cores (nprow x npcol = nproc). But is it enough that the product of the two equals the number of cores needed for the calculation? For example, if I use four cores, is there a difference between 2x2 and 1x4?
It affects the decomposition of the matrix in the code. 2 x 2 is better than 1 x 4. For the test cases you should not need to change anything; they just use 1 x 1.
-john
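To make the 2 x 2 versus 1 x 4 point concrete, the sketch below picks the most nearly square nprow x npcol pair whose product equals a given core count. Extending "2 x 2 is better than 1 x 4" to "prefer the squarest grid" for every core count is an assumption of this example, and square_grid is a helper name invented here.

```shell
# Print "nprow npcol" for the most nearly square factorization of $1,
# with nprow <= npcol and nprow * npcol equal to the requested core count.
square_grid() {
    nproc=$1
    nprow=1
    i=1
    # Keep the largest divisor that does not exceed sqrt(nproc).
    while [ $((i * i)) -le "$nproc" ]; do
        if [ $((nproc % i)) -eq 0 ]; then
            nprow=$i
        fi
        i=$((i + 1))
    done
    echo "$nprow $((nproc / nprow))"
}

square_grid 4
square_grid 32
```

For 4 cores this yields a 2 x 2 grid, and for 32 cores a 4 x 8 grid, since 32 has no exact square factorization.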
Thank you, teacher, but the number of cores in our group is 32, and I need to set more points and consider different ions and concentrations in the future. How should I set these two parameters?
First you need to verify that the test case works. So far you have shown me that you get errors for that.
-john
Yes teacher. I can now run every example normally. For example, the final output of DIIID-helion is as follows (here I used nprow=4, npcol=8):
time to do plots = 0.039 min
1
total cpu time used = 0.097 min
You can look at the namelist md file and the tests for guidance. For problem size, set nprow=npcol=8 or greater. If you only have 32 cores, use 4,4.
Ideally nmodesx=nmodesy=128, but that requires several nodes and significant memory. You might try 32,32, but the case will be severely under-resolved depending on the scales involved.
Good luck.
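A back-of-the-envelope estimate shows why the larger mode count needs several nodes. The sketch rests on two assumptions that are not stated in this thread: that the dense matrix has order 3 x nmodesx x nmodesy, and that each entry takes 16 bytes (double-precision complex). matrix_mib is a name invented for this example.

```shell
# Estimate total dense-matrix storage in MiB for a given mode count.
# ASSUMPTIONS (not from the thread): order = 3 * nmodesx * nmodesy,
# 16 bytes per matrix entry.
matrix_mib() {
    nx=$1
    ny=$2
    order=$((3 * nx * ny))
    echo $((order * order * 16 / 1048576))
}

matrix_mib 32 32     # total MiB at 32 x 32 modes
matrix_mib 128 128   # total MiB at 128 x 128 modes
```

Under these assumptions the matrix alone takes about 144 MiB at 32 x 32 modes and about 36 GiB at 128 x 128, shared across the nprow x npcol processes. Storage grows with the fourth power of the modes per dimension, so doubling the resolution costs sixteen times the memory.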
On 2024-06-04 09:43, yang5891 wrote:
Yes teacher. I can now run every example normally. For example, the final output of DIIID-helion is as follows(Here I used nprow=4, npcol=8)
time to do plots = 0.039 min 1 total cpu time used = 0.097 min
---- Replied Message ---- | From | John C. @.> | | Date | 6/4/2024 21:16 | | To | @.> | | Cc | @.>, @.> | | Subject | Re: [ORNL-Fusion/aorsa] How to run properly on Centos server (Issue #49) | First you need to verify the test case works so far you show me that you get errors for that-johnOn Jun 4, 2024, at 8:00 AM, yang5891 @.***> wrote: Thank you, teacher, but the number of cores in our group is 32, and I need to set more points and consider different ions and concentrations in the future. How should I set these two parameters?
---- Replied Message ---- | From | John C. @.> | | Date | 6/4/2024 19:54 | | To | @.> | | Cc | @.>, @.> | | Subject | Re: [ORNL-Fusion/aorsa] How to run properly on Centos server (Issue #49) | It affects the decomposition of the matrix in the code. 2 x 2 is better than one by four. For the test cases you should not need to change anything. They just use one by one.-johnOn Jun 4, 2024, at 7:52 AM, yang5891 @.***> wrote: Thank you, teacher, for your patience these days. My brother in my research group helped me find the relevant parameters for calculating the number of cores (nprow x npcol = nproc).But is that okay as long as the product of the two is equal to the number of cores needed for the calculation? For example, if I use quad-core computing, is there a difference between 2x2 and 1x4?
---- Replied Message ---- | From | John C. @.> | | Date | 6/4/2024 00:48 | | To | @.> | | Cc | @.>, @.> | | Subject | Re: [ORNL-Fusion/aorsa] How to run properly on Centos server (Issue #49) |
It seems like you are having issues running with mpi. This error has to do with starting up and executing under mpi run, not with aorsa itself. I suggest you verify you can compile and run a simple mpi program. Assuming you are using intel compile, grab cpi.c from https://gist.github.com/jcwright77/a5e1d66886bc17b0f7936466739cc287
mpiicc cpi.c -o cpi mpirun -np 4 ./cpi
other things, verify you are using the correct mpirun, 'which mpirun' should show mpirun in the intel distribution Try making from scratch: make clean; make
Ask a college who is familiar with parallel programs on your system for help. You issues seem to be outside of aorsa and have to do with basic compilation and execution of parallel programs.
-- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>
--Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>
-- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>
--Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you commented.Message ID: @.***>
-- Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you authored the thread.Message ID: @.***>
-- Reply to this email directly, view it on GitHub [1], or unsubscribe [2]. You are receiving this because you commented.Message ID: @.***>
-- -john Principal Research Scientist John Wright Office 617-253-9612 zoom: https://mit.zoom.us/my/jcwright
[1] https://github.com/ORNL-Fusion/aorsa/issues/49#issuecomment-2147574133 [2] https://github.com/notifications/unsubscribe-auth/AB7SLTL4QXX2DP5W5HN3KEDZFXAAXAVCNFSM6AAAAABILFD6TWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNBXGU3TIMJTGM --=_bdd64717b3167e2d2780a239b48ebf89 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=UTF-8
you can look at the namelist md file and the tests for guidance. For pro= blem size set nprow=3Dnpcol=3D8 or greater. If you only have 32 cores, use = 4,4
ideally nmodesx=3Dnmodesy=3D128 but that requires several nodes and sign= ificant meory. You might try 32,32 but the case will be severely under reso= lved depending on the scales involved.
good luck
On 2024-06-04 09:43, yang5891 wrote:
Yes teacher. I can now run every example normally. For example, the final o= utput of DIIID-helion is as follows=EF=BC=88Here I used nprow=3D4, npcol=3D= 8=EF=BC=89
time to do plots =3D 0.039 min
1
total cpu time used =3D 0.097 min
Hi teacher, I'm having a small problem modifying the example case. I have changed it to a 1:1 ratio of D and T, with Li as a third (impurity) species at a concentration of 0.1%. With TORIC I calculated that the absorption by Li can reach about 80%, but with AORSA the electron absorption is negative (-3.5018 %) and the absorption by Li is only 2.1831 %.
---- Replied Message ----
| From | John C. @.> |
| Date | 6/4/2024 21:16 |
| To | @.> |
| Cc | @.>, @.> |
| Subject | Re: [ORNL-Fusion/aorsa] How to run properly on Centos server (Issue #49) |
First you need to verify that the test case works. So far you have shown me that you get errors for that.
-john
On Jun 4, 2024, at 8:00 AM, yang5891 @.***> wrote:
Thank you, teacher, but the number of cores in our group is 32, and I need to set more points and consider different ions and concentrations in the future. How should I set these two parameters?
---- Replied Message ----
| From | John C. @.> |
| Date | 6/4/2024 19:54 |
| To | @.> |
| Cc | @.>, @.> |
| Subject | Re: [ORNL-Fusion/aorsa] How to run properly on Centos server (Issue #49) |
It affects the decomposition of the matrix in the code. 2 x 2 is better than 1 x 4. For the test cases you should not need to change anything. They just use 1 x 1.
-john
On Jun 4, 2024, at 7:52 AM, yang5891 @.***> wrote:
Thank you, teacher, for your patience these days. A senior member of my research group helped me find the relevant parameters for setting the number of cores (nprow x npcol = nproc). But is it okay as long as the product of the two equals the number of cores needed for the calculation? For example, if I use four cores, is there a difference between 2x2 and 1x4?
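A minimal sketch of one way to choose nprow and npcol for a given process count, following the "2 x 2 is better than 1 x 4" advice: pick the factor pair closest to a square. The function name `process_grid` is hypothetical, and treating "most square" as the preferred layout is an assumption drawn from that reply, not a rule taken from the aorsa documentation.

```python
import math


def process_grid(nproc):
    """Return (nprow, npcol) with nprow * npcol == nproc and nprow <= npcol,
    choosing the factor pair closest to a square."""
    if nproc < 1:
        raise ValueError("nproc must be a positive integer")
    nprow = math.isqrt(nproc)
    # Walk down from floor(sqrt(nproc)) to the nearest divisor; 1 always divides.
    while nproc % nprow != 0:
        nprow -= 1
    return nprow, nproc // nprow


if __name__ == "__main__":
    for n in (1, 4, 16, 32, 64):
        print(n, process_grid(n))
```

For a prime process count the only factorization is 1 x nproc, so it may be worth leaving a core idle to get a squarer grid.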
---- Replied Message ----
| From | John C. @.> |
| Date | 6/4/2024 00:48 |
| To | @.> |
| Cc | @.>, @.> |
| Subject | Re: [ORNL-Fusion/aorsa] How to run properly on Centos server (Issue #49) |
It seems like you are having issues running with MPI. This error has to do with starting up and executing under mpirun, not with aorsa itself. I suggest you verify that you can compile and run a simple MPI program. Assuming you are using the Intel compilers, grab cpi.c from https://gist.github.com/jcwright77/a5e1d66886bc17b0f7936466739cc287
mpiicc cpi.c -o cpi
mpirun -np 4 ./cpi
Other things:
Verify you are using the correct mpirun; 'which mpirun' should show the mpirun in the Intel distribution.
Try making from scratch: make clean; make
Ask a colleague who is familiar with parallel programs on your system for help. Your issues seem to be outside of aorsa and have to do with basic compilation and execution of parallel programs.
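The 'which mpirun' check can be scripted so that the launcher and the compiler wrappers are inspected together. This is only a sketch: the helper names are hypothetical, and the idea that tools from one MPI installation sit in the same directory is an assumption that may not hold on systems that use symlinks or wrapper modules.

```python
import os
import shutil


def locate_tools(names=("mpirun", "mpiicc", "mpifort")):
    """Map each tool name to the path found on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}


def same_directory(found):
    """True if every tool that was found lives in one directory.

    Assumption: a launcher and compiler wrappers from the same MPI
    installation are installed side by side.
    """
    dirs = {os.path.dirname(path) for path in found.values() if path}
    return len(dirs) <= 1


if __name__ == "__main__":
    found = locate_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'not found on PATH'}")
    if not same_directory(found):
        print("Warning: tools come from different directories; "
              "mpirun may not match the compiler wrappers.")
```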
Hello, authors. I need to run this software on CentOS, but I'm having some problems with it and would like to ask for your advice.
Here I made some changes to the makefile.
`ifeq ($(shell cat /etc/os-release | grep -qEi 'centos'; echo $$?),0) # Centos
ifeq ($(UNAME_R),18.7.0)
endif
endif`
The corresponding makeopts.centos is
`include compileropts.gnu
FC = mpif90
# pgplot
LIBS += -L/usr/lib/ -lpgplot
# netcdf
NETCDF_DIR = /public/home/zhangzl/software/netcdf4-needed/lib\
LIBS += $(NETCDF_DIR)/libnetcdff.a -L $(NETCDF_DIR) -lnetcdf
INCLUDE_DIRS += -I /public/home/zhangzl/software/netcdf4-needed/include
# scalapack
LIBS += -L /public/home/zhangzl/software/scalapack-2.2.0 \
/public/home/zhangzl/software/BLAS-3.12.0/lib/libblas.a \
/public/home/zhangzl/software/BLACS/LIB/blacs.a \
/public/home/zhangzl/software/BLACS/LIB/blacsF77init_MPI-LINUX-0.a \
/public/home/zhangzl/software/BLACS/LIB/blacsCinit_MPI-LINUX-0.a \
/public/home/zhangzl/software/BLACS/LIB/blacsF77.a \
/public/home/zhangzl/software/BLACS/LIB/blacs_MPI-LINUX-0.a`
The compileropts.centos is
COMMON_OPTION = -save -r8 #-i8
COMMON_OPTION2 = -r8 #-i8
COMMON_OPTION3 =
COMMON_OPTION4 = -r8 #-i4
MOD_DIR_FLAG = -module $(MOD_DIR)
Now when I run make, I run into some problems.
① With mpif90:
(.text+0x20): undefined reference to `main'
obj/ql_myra.o: In function `__ql_myra_mod_MOD_ql_myra_write':
ql_myra.f:(.text+0x11744): undefined reference to `blacs_barrier_'
obj/wdot_test.o: In function `__wdot_mod_MOD_wdot_new_maxwellian':
wdot_test.f90:(.text+0xa6c0): undefined reference to `blacs_barrier_'
obj/current.o: In function `current_orbit_':
current.f:(.text+0x17c4f): undefined reference to `fftn2_'
current.f:(.text+0x245c2): undefined reference to `blacs_barrier_'
obj/current.o: In function `ntilda':
current.f:(.text+0x2b22e): undefined reference to `blacs_barrier_'
obj/current.o: In function `current_':
current.f:(.text+0x33792): undefined reference to `blacs_barrier_'
obj/current.o: In function `current1':
current.f:(.text+0x3d416): undefined reference to `blacs_barrier_'
obj/current.o: In function `current2':
current.f:(.text+0x459f2): undefined reference to `blacs_barrier_'
obj/current.o:current.f:(.text+0x49f11): more undefined references to `blacs_barrier_' follow
obj/setupblacs.o: In function `setupblacs_':
setupblacs.f:(.text+0x5f): undefined reference to `blacs_pinfo_'
setupblacs.f:(.text+0x8a): undefined reference to `blacs_setup_'
setupblacs.f:(.text+0xa5): undefined reference to `blacs_get_'
setupblacs.f:(.text+0xf9): undefined reference to `blacs_gridinit_'
setupblacs.f:(.text+0x38d): undefined reference to `blacs_gridexit_'
setupblacs.f:(.text+0x4fd): undefined reference to `blacs_get_'
setupblacs.f:(.text+0x521): undefined reference to `blacs_gridinit_'
setupblacs.f:(.text+0x545): undefined reference to `blacs_gridinfo_'
obj/rf2x_setup2.o: In function `run_rf2x_':
rf2x_setup2.f:(.text+0x3f25): undefined reference to `rhograte_'
rf2x_setup2.f:(.text+0x3f6a): undefined reference to `rhograte_'
rf2x_setup2.f:(.text+0x3faf): undefined reference to `rhograte_'
rf2x_setup2.f:(.text+0x3ff4): undefined reference to `rhograte_'
rf2x_setup2.f:(.text+0x4039): undefined reference to `rhograte_'
obj/rf2x_setup2.o:rf2x_setup2.f:(.text+0x407e): more undefined references to `rhograte_' follow
obj/read_cql3d.o: In function `__read_cql3d_MOD_netcdfr3d':
read_cql3d.f90:(.text+0x205): undefined reference to `__netcdf_MOD_nf90_open'
read_cql3d.f90:(.text+0x347): undefined reference to `__netcdf_MOD_nf90_inq_dimid'
read_cql3d.f90:(.text+0x411): undefined reference to `__netcdf_MOD_nf90_inq_dimid'
read_cql3d.f90:(.text+0x4db): undefined reference to `__netcdf_MOD_nf90_inq_dimid'
read_cql3d.f90:(.text+0x5a5): undefined reference to `__netcdf_MOD_nf90_inq_dimid'
read_cql3d.f90:(.text+0x66f): undefined reference to `__netcdf_MOD_nf90_inq_dimid'
read_cql3d.f90:(.text+0x73f): undefined reference to `__netcdf_MOD_nf90_inquire_dimension'
read_cql3d.f90:(.text+0x81a): undefined reference to `ncdinq_'
read_cql3d.f90:(.text+0x844): undefined reference to `ncdinq_'
read_cql3d.f90:(.text+0x86e): undefined reference to `ncdinq_'
read_cql3d.f90:(.text+0x898): undefined reference to `ncdinq_'
read_cql3d.f90:(.text+0x19cc): undefined reference to `__netcdf_MOD_nf90_inq_varid'
read_cql3d.f90:(.text+0x19eb): undefined reference to `__netcdf_MOD_nf90_get_var_eightbytereal'
read_cql3d.f90:(.text+0x1a83): undefined reference to `__netcdf_MOD_nf90_inq_varid'
read_cql3d.f90:(.text+0x1ab6): undefined reference to `__netcdf_MOD_nf90_get_var_1d_eightbytereal'
read_cql3d.f90:(.text+0x1ad5): undefined reference to `__netcdf_MOD_nf90_inq_varid'
read_cql3d.f90:(.text+0x1b08): undefined reference to `__netcdf_MOD_nf90_get_var_1d_fourbyteint'
read_cql3d.f90:(.text+0x1b3f): undefined reference to `__netcdf_MOD_nf90_inq_varid'
read_cql3d.f90:(.text+0x1b72): undefined reference to `__netcdf_MOD_nf90_get_var_2d_eightbytereal'
read_cql3d.f90:(.text+0x1b91): undefined reference to `__netcdf_MOD_nf90_inq_varid'
read_cql3d.f90:(.text+0x1bc4): undefined reference to `__netcdf_MOD_nf90_get_var_1d_eightbytereal'
read_cql3d.f90:(.text+0x1be3): undefined reference to `__netcdf_MOD_nf90_inq_varid'
read_cql3d.f90:(.text+0x1c16): undefined reference to `__netcdf_MOD_nf90_get_var_3d_eightbytereal'
read_cql3d.f90:(.text+0x1d76): undefined reference to `__netcdf_MOD_nf90_inq_varid'
read_cql3d.f90:(.text+0x1da9): undefined reference to `__netcdf_MOD_nf90_get_var_2d_eightbytereal'
read_cql3d.f90:(.text+0x215b): undefined reference to `__netcdf_MOD_nf90_inq_varid'
read_cql3d.f90:(.text+0x218e): undefined reference to `__netcdf_MOD_nf90_get_var_2d_eightbytereal'
read_cql3d.f90:(.text+0x23f0): undefined reference to `__netcdf_MOD_nf90_close'
collect2: error: ld returned 1 exit status
make: *** [xaorsa2d] Error 1
② If I compile with mpifort, the error is reported as
src/CQL3D_SETUP/read_cql3d.f90(35): error #7013: This module file was not generated by any release of this compiler. [NETCDF]
use netcdf
------^
src/CQL3D_SETUP/read_cql3d.f90(432): internal error: Please visit 'http://www.intel.com/software/products/support' for assistance.
if ( iret .ne. NF90_NOERR) then
[ Aborting due to internal error. ]
compilation aborted for src/CQL3D_SETUP/read_cql3d.f90 (code 1)
make: *** [obj/read_cql3d.o] Error 1
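The linker output under ① is long but mostly repeats a few symbols. As a triage aid, here is a small sketch that extracts the unique undefined symbols from a saved `make` log and groups them by prefix, so it is easier to see which library is missing from the link line. The prefix table is a hypothetical starting point, not an authoritative mapping; the sample lines are taken from the log above.

```python
import re
from collections import defaultdict

# Matches both "undefined reference to `sym'" and
# "more undefined references to `sym' follow".
UNDEF = re.compile(r"undefined references? to [`'](?P<sym>[^'`\s]+)'")

# Hypothetical prefix-to-group table; extend it for your own build.
GROUPS = [
    ("blacs_", "BLACS / ScaLAPACK"),
    ("__netcdf_MOD_", "netCDF Fortran module"),
]


def group_undefined(log_text):
    """Return {group label: sorted unique undefined symbols} for a linker log."""
    groups = defaultdict(set)
    for match in UNDEF.finditer(log_text):
        sym = match.group("sym")
        label = next(
            (name for prefix, name in GROUPS if sym.startswith(prefix)),
            "other",
        )
        groups[label].add(sym)
    return {label: sorted(syms) for label, syms in groups.items()}


SAMPLE = """\
ql_myra.f:(.text+0x11744): undefined reference to `blacs_barrier_'
setupblacs.f:(.text+0x545): undefined reference to `blacs_gridinfo_'
obj/current.o:current.f:(.text+0x49f11): more undefined references to `blacs_barrier_' follow
read_cql3d.f90:(.text+0x23f0): undefined reference to `__netcdf_MOD_nf90_close'
current.f:(.text+0x17c4f): undefined reference to `fftn2_'
"""

if __name__ == "__main__":
    for label, syms in sorted(group_undefined(SAMPLE).items()):
        print(f"{label}: {', '.join(syms)}")
```

Symbols that land in "other" (such as `fftn2_` here) are not matched by any prefix and need to be traced by hand, for example to a source file that was not compiled into the link.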