SPECFEM / specfem2d

SPECFEM2D simulates forward and adjoint seismic wave propagation in two-dimensional acoustic, (an)elastic, poroelastic or coupled acoustic-(an)elastic-poroelastic media, with Convolution PML absorbing conditions.
https://specfem.org
GNU General Public License v3.0

Failed job while using mpi parallel #1170

Open · rashi-13 opened this issue 1 year ago

rashi-13 commented 1 year ago

When I set NUMBER_OF_SIMULTANEOUS_RUNS = 3 in the Par_file and ran with MPI using 3 processes, the job failed with the error message: "must not have IMAIN == ISTANDARD_OUTPUT when NUMBER_OF_SIMULTANEOUS_RUNS > 1 otherwise output to screen is mingled. Change this in specfem/setup/constant.h.in and recompile. Error detected, aborting MPI... proc 0"

This was run on an HPC cluster with 3 nodes. I want to generate results for 39 source positions simultaneously. Kindly help me figure this out.

homnath commented 1 year ago

It seems that IMAIN is currently set to ISTANDARD_OUTPUT. You can change that by editing the file setup/constants.h.in, from

```fortran
! uncomment this to write to standard output (i.e. to the screen)
  integer, parameter :: IMAIN = ISTANDARD_OUTPUT
! uncomment this to write messages to a text file
! integer, parameter :: IMAIN = 42
```

to

```fortran
! uncomment this to write to standard output (i.e. to the screen)
! integer, parameter :: IMAIN = ISTANDARD_OUTPUT
! uncomment this to write messages to a text file
  integer, parameter :: IMAIN = 42
```

Do not forget to configure and compile after changing the constants.h.in file.
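If editing by hand feels error-prone, the toggle can also be scripted. A minimal sketch, assuming the two IMAIN lines appear exactly as in the snippet above; it operates on a local copy named constants_snippet.f90 for illustration, whereas the real edit targets setup/constants.h.in in the source tree:

```shell
# Stand-in copy of the two relevant lines (illustrative only;
# the real file is setup/constants.h.in in the SPECFEM2D sources).
cat > constants_snippet.f90 <<'EOF'
! uncomment this to write to standard output (i.e. to the screen)
  integer, parameter :: IMAIN = ISTANDARD_OUTPUT
! uncomment this to write messages to a text file
! integer, parameter :: IMAIN = 42
EOF

# Comment out the ISTANDARD_OUTPUT definition and uncomment the
# file-unit definition (IMAIN = 42), i.e. route output to a text file.
sed -i \
  -e 's/^  integer, parameter :: IMAIN = ISTANDARD_OUTPUT/! integer, parameter :: IMAIN = ISTANDARD_OUTPUT/' \
  -e 's/^! integer, parameter :: IMAIN = 42/  integer, parameter :: IMAIN = 42/' \
  constants_snippet.f90

grep 'IMAIN' constants_snippet.f90
```

After the edit, only the IMAIN = 42 line is active; remember that configure and recompilation are still required for the change to take effect.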

Best, Hom Nath



rashi-13 commented 1 year ago

Hello Sir

Thank you very much for responding quickly. I tried running the solver after making the changes you suggested, and got the following error: "configure: error: MPI header not found; try setting MPI_INC."

Also, I want to understand what changes I need to make, and in which file, in order to run the same model for different source locations in parallel using MPI.

It would be great if you could help me out.

Thank you very much.

homnath commented 1 year ago

Make sure that the path to the MPI headers is included in LD_INCLUDE_PATH, or set the MPI include path directly in the configure command using the MPI_INC variable. For example:

```shell
./configure CC=icc CXX=icpc FC=ifort MPIFC=mpiifort MPI_INC=/scinet/intel/psxe/2020u4/compilers_and_libraries_2020.4.304/linux/mpi/intel64/include
```

Your compilers and paths may be different.
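If you are unsure where the MPI headers live, the MPI wrapper compiler's link line usually reveals it (e.g. `mpicc -show` for MPICH-based stacks, `mpicc --showme` for Open MPI). A hedged sketch that extracts the first -I path from such a line; the line and path below are illustrative samples, not a real installation:

```shell
# Sample output of an MPI wrapper's show command (illustrative only;
# on a real system capture it with: show_line=$(mpicc -show)).
show_line='gcc -I/opt/mpich/include -L/opt/mpich/lib -lmpi'

# Split the line into tokens, keep those starting with -I, take the first.
mpi_inc=$(printf '%s\n' "$show_line" | tr ' ' '\n' | sed -n 's/^-I//p' | head -n 1)
echo "MPI_INC=$mpi_inc"
# The value can then be passed to configure, e.g.:
#   ./configure FC=gfortran MPIFC=mpif90 MPI_INC="$mpi_inc"
```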

Best, Hom Nath



rashi-13 commented 1 year ago

Hello Sir

I have updated the path now. I then set NUMBER_OF_SIMULTANEOUS_RUNS = 2 and NPROC = 4 in the Par_file, since NPROC is required to be a multiple of NUMBER_OF_SIMULTANEOUS_RUNS.

I am getting the following error:

Error: the number of MPI processes 1 is not a multiple of NUMBER_OF_SIMULTANEOUS_RUNS = 2 the number of MPI processes is not a multiple of NUMBER_OF_SIMULTANEOUS_RUNS. Make sure you call meshfem2D with mpirun. Error detected, aborting MPI... proc 0

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD with errorcode 30.


I will be really grateful if you could help me figure this out.

Thank you very much.

homnath commented 1 year ago

You should run specfem2d using the total number of cores = NUMBER_OF_SIMULTANEOUS_RUNS * NPROC:

```shell
mpiexec -n 8 ./bin/xspecfem2D
```

You should also create two directories, run0001 and run0002, and put your input files in them.
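The setup above can be sketched as a small script: create one runNNNN directory per simultaneous run and launch with NUMBER_OF_SIMULTANEOUS_RUNS * NPROC ranks. The DATA and OUTPUT_FILES subdirectories follow the layout mentioned later in this thread; treat this as a sketch, not the definitive SPECFEM2D workflow:

```shell
# Assumed settings from the Par_file: 2 simultaneous runs, 4 ranks each.
nruns=2
nproc=4

# One directory per run (run0001, run0002, ...), each with its own
# input (DATA) and output (OUTPUT_FILES) subdirectories.
for i in $(seq 1 "$nruns"); do
  d=$(printf 'run%04d' "$i")
  mkdir -p "$d/DATA" "$d/OUTPUT_FILES"
done

# Total MPI ranks is the product of the two settings.
total=$((nruns * nproc))
echo "mpiexec -n $total ./bin/xspecfem2D"
```

With nruns=2 and nproc=4 this prints the 8-rank launch command used above.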



AbolfazlKhanMo commented 11 months ago

@homnath ,

I followed what you have said so far. I am running two simulations (NUMBER_OF_SIMULTANEOUS_RUNS = 2 and NPROC = 4). I have also created run0001 and run0002 directories, both containing DATA and OUTPUT_FILES directories.

Then this is what I got when I ran mpirun -n 8 ./bin/xmeshfem2D:

NUMBER_OF_SIMULTANEOUS_RUNS not compatible yet with SAVE_MODEL. Look for SMNSR in the source code.

I looked at the specfem2d/src/specfem2D/save_model_files.f90 code (is this the correct file to look at?). On line 125 it says: ! SMNSR For compatibility with NUMBER_OF_SIMULTANEOUS_RUNS we have to change the lines trim(IN_DATA_FILES)//'proc'

There are a number of lines carrying that comment, but could you please give me some hints on how to proceed and potentially run my simulations simultaneously?

Thanks very much, Khan
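For anyone hitting the same SAVE_MODEL incompatibility, the affected spots can be enumerated with grep as the error message suggests. A self-contained sketch using a stand-in file; on a real checkout the search would run over the specfem2d/src tree instead:

```shell
# Stand-in source file reproducing one SMNSR marker (illustrative only).
mkdir -p demo_src
cat > demo_src/save_model_files.f90 <<'EOF'
! SMNSR For compatibility with NUMBER_OF_SIMULTANEOUS_RUNS we have to change
  write(name,'(a)') trim(IN_DATA_FILES)//'proc'
EOF

# List every SMNSR marker with file name and line number;
# on a real checkout: grep -rn 'SMNSR' src/
grep -rn 'SMNSR' demo_src
```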