Closed sunt05 closed 5 years ago
@zhenkunl once you get some results for Shanghai, we can close this issue.
Great!
@zhenkunl
I assumed you have used WPS before. So the first step is to get standard wrfinput files
for Shanghai using WPS. After you do that, you should modify the inputs for the coupled version. I will write you a complete tutorial on this later on. But let's start with having standard wrfinput files
first.
In addition, here are some tips for running WRF/WPS on Jasmin:
When configuring WPS and WRF on Jasmin, we need to use the Intel compilers. For this purpose, before starting to configure or compile WPS or WRF, put the following in your .bashrc
file and source it:
module load intel/15.1
module load intel/mpi/5.1.2.150
export NETCDF=/apps/libs/netCDF/intel15/fortran/4.4.1
export WRFIO_NCD_NO_LARGE_FILE_SUPPORT=1
export J='-j 6'
export NETCDF_classic=1
export WRF_EM_CORE=1
For runs, use jasmin-sci3.ceda.ac.uk; otherwise you may run into memory problems.
WRF4 has a new requirement on domain decomposition: the number of grid points assigned to each processor in the x or y direction must not be less than 10. You might run into this problem, but it is easy to fix.
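The decomposition rule can be checked with simple arithmetic before submitting a job. A minimal sketch, where the domain sizes and processor counts are hypothetical examples (e.g. a `#BSUB -n 49` request could decompose as 7x7), not values from this case:

```shell
#!/bin/bash
# WRF4 requires >= 10 grid points per processor in each horizontal direction.
# nproc_x * nproc_y must equal the total cores requested with "#BSUB -n".
e_we=100     # grid points in x, from namelist.input (hypothetical value)
e_sn=100     # grid points in y (hypothetical value)
nproc_x=7    # processors in x
nproc_y=7    # processors in y

if (( e_we / nproc_x < 10 || e_sn / nproc_y < 10 )); then
  echo "Decomposition too fine: use fewer processors"
else
  echo "Decomposition OK: $((e_we / nproc_x)) x $((e_sn / nproc_y)) points per processor"
fi
```

If the check fails, either reduce the core count or let WRF choose the decomposition automatically.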
Here is a simple bash script for running jobs on Jasmin:
#!/bin/bash
#BSUB -q par-multi
#BSUB -n 49
#BSUB -o %J.out
#BSUB -e %J.err
#BSUB -W 24:00
echo "Running WRF"
# (use ./real.exe first for generating wrfinputs)
mpirun ./wrf.exe
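For reference, a script like the one above is submitted by piping it into bsub, which reads the `#BSUB` directives from the file. A small sketch (the file name `run_wrf.sh` is just an example; the guard on `bsub` only makes the snippet safe to try off-cluster):

```shell
#!/bin/bash
# Submit the LSF job script and check the queue; logs land in
# <jobid>.out and <jobid>.err as named by the #BSUB -o/-e directives.
if command -v bsub >/dev/null 2>&1; then
  bsub < run_wrf.sh   # file name is an example; use your own script
  bjobs               # list your pending/running jobs
else
  echo "bsub not found: run this on a Jasmin node with LSF available"
fi
```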
Thanks for your detailed explanation @hamidrezaomidvar. I will try to get started with Jasmin first. I will ask for your help when I experience difficulties.
Hi @hamidrezaomidvar. What is the difference between the wrf.exe under hamid/xx-test-xx-2, xx-test-xx-3, and xx-test-xx-4? Which one is the newest?
These are some of the local tests I am doing right now. Try xx-test-xx-2 if you'd like to run a case. The others are tests that I have not merged to master! Also, please clone the master of WRF-SUEWS, since test-dev still has some problems that I am fixing now.
BTW, I'd like to comment on the "best practice" for organising our WRF runs, as I can see more regions will be tested and applied with our coupled system.
We'd better separate wrf.exe
and other related static data files (e.g., those profile-like data files generated by WRF itself for a specific version) from your cases with input and output files, so all binaries stay in one place. Then, ideally, we would have a structure like this:
├── WRF-exe
│   ├── wrf.exe.orig-4.0
│   ├── wrf.exe.orig-4.1
│   ├── wrf.exe.suews-4.0
│   └── wrf.exe.suews-4.1
├── cases
│   ├── London-GMD-paper
│   └── London-test-201504
├── wrf-data
│   ├── CAM_ABS_DATA
│   ├── CAM_AEROPT_DATA
│   ├── CAMtr_volume_mixing_ratio.A1B
│   ├── CAMtr_volume_mixing_ratio.A2
│   ├── CAMtr_volume_mixing_ratio.RCP4.5
│   ├── ...many other files...
│   ├── tr49t85
│   ├── tr67t85
│   └── wind-turbine-1.tbl
├── wrfbdy
│   ├── London
│   │   ├── 201504
│   │   └── 201507
│   └── Shanghai
│       └── 201509
└── wrfinput
    ├── London
    │   ├── MODIS
    │   ├── MODIS-SUEWS
    │   └── MODIS-updated
    └── Shanghai
        ├── MODIS
        └── MODIS-updated
By adopting such a structure, we can set up different runs under the cases
folder and link configurations and binaries from the other places; also, as we are linking files, we know what the original information is and how we can proceed from there.
In the above structure, the wrfinput
part might need to be changed according to different initial conditions for specific cases, but I put it separately for the geographic data, which usually needs quite an amount of work to set up but won't change across runs of a specific region. So instead of linking, under certain scenarios, we'd better copy these to the cases
folder.
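As a concrete illustration of the linking idea, here is a sketch; the case name Shanghai-test-201509 and the choice of binary are made-up examples, and the paths follow the proposed layout above:

```shell
#!/bin/bash
# Set up a new case folder by linking shared binaries/static data and
# copying the per-region wrfinput files (which may be edited per run).
root=${1:-$PWD}                              # root of the proposed layout
case_dir="$root/cases/Shanghai-test-201509"  # example case name

mkdir -p "$case_dir"

# Link the chosen binary; the symlink records which build the case used
ln -sf "$root/WRF-exe/wrf.exe.suews-4.1" "$case_dir/wrf.exe"

# Link all version-specific static data files (CAM_ABS_DATA, etc.)
for f in "$root"/wrf-data/*; do
  ln -sf "$f" "$case_dir/$(basename "$f")"
done

# Copy, not link, the wrfinput files since they may be modified per case
cp -r "$root"/wrfinput/Shanghai/MODIS-updated/. "$case_dir"/ 2>/dev/null || true
```

The symlinks make it obvious which binary and static data a given run used, while the copied wrfinput files can be edited freely without touching the originals.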
@zhenkunl @sunt05
Here is a brief instruction on the preprocessing scripts:
Once you have standard wrfinput
files, you should follow the steps below to modify them and use them for the runs. Under WRF-SUEWS/wrfinput-processor/
there are 4 main folders with different functionalities:
---> /input-checker
: this folder contains a script that checks whether the SUEWS parameters being inputted to WRF are in the correct range and are logical. It is still in progress and not completed, but you do not need this step to modify the inputs.
---> /param_extractor_SuPy
: this folder contains scripts that (first) run SUEWS offline using 2012 London or Swindon parameters to spin up the model and (second) extract all the parameters needed for SUEWS to be inputted into WRF. Finally, they put them in two files: SUEWS_param_new.json and namelist.suews.new. The first one (SUEWS_param_new.json) contains grid-level parameters that need to be put directly inside the wrfinputs (via the change_to_SUEWS folder that I will explain next). You need to copy this file under WRF-SUEWS/wrfinput-processor/ and make sure the script in change_to_SUEWS has the right name for it. The other file (namelist.suews.new) contains the run-level parameters of SUEWS, and you need to put this file in the WRF-SUEWS run folder (changing its name to namelist.suews). Note that you also need a namelist.suews under WRF-SUEWS/wrfinput-processor/ to run the script of this folder, because it uses its structure to generate the new namelist file.
---> /change_to_SUEWS
: the script in this folder modifies the original wrfinputs and adds the SUEWS-related parameters to them. As I mentioned, it uses SUEWS_param.json under WRF-SUEWS/wrfinput-processor/. After running the script, you should copy the new wrfinputs to the WRF-SUEWS run folder.
---> /London-Land-Cover-Modify
: the script in this folder is just for the London run; it uses high-resolution land-use fraction data to modify the third domain (the London-focused domain). If you are using the original land-use data generated by WPS for Shanghai, you can ignore this folder; otherwise you can use it to modify your inputs for Shanghai.
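Putting these steps together, the overall order is roughly as sketched below. This is a hedged outline only: the run-folder name is an example, the relative paths are assumptions, and the actual script names inside each folder should be checked in the repo:

```shell
#!/bin/bash
# Sketch of the preprocessing order described above (paths assumed).
preprocess() {
  local proc=WRF-SUEWS/wrfinput-processor   # processor folder (assumed relative path)
  local run=WRF-SUEWS-run                   # your run folder (example name)

  if [ ! -d "$proc" ]; then
    echo "wrfinput-processor not found: clone WRF-SUEWS first"
    return 0
  fi

  # 1. In param_extractor_SuPy, run the extraction scripts (see the repo
  #    for the actual script names); they produce SUEWS_param_new.json
  #    and namelist.suews.new.

  # 2. Grid-level parameters go next to the processor scripts (check that
  #    the change_to_SUEWS script expects this file name):
  cp "$proc/param_extractor_SuPy/SUEWS_param_new.json" "$proc/"

  # 3. Run-level parameters go to the run folder, renamed:
  cp "$proc/param_extractor_SuPy/namelist.suews.new" "$run/namelist.suews"

  # 4. In change_to_SUEWS, run the script that injects the grid-level
  #    parameters into the original wrfinputs, then copy the modified
  #    wrfinput files into the run folder:
  cp "$proc"/change_to_SUEWS/wrfinput_d0* "$run/" 2>/dev/null
}
preprocess
```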
Please let me know if you run into any difficulties with any of the scripts.
I felt puzzled about the relationship among them this afternoon. It's very thoughtful of you to inform me of these promptly.
Some errors occurred when I submitted real.exe to Jasmin using bsub < bsub_run_real
. The error log showed the following:
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(784).................:
MPID_Init(1326).......................: channel initialization failed
MPIDI_CH3_Init(141)...................:
dapl_rc_setup_all_connections_20(1396): generic failure with errno = 671107855
MPID_nem_dapl_get_from_bc(1309).......: Missing port or invalid host/port description in business card
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(784).................:
Any hints?
This is an MPI problem on Jasmin. It happens all the time for me. You should keep submitting until it doesn't give you this error. It happens at the very beginning, when the rsl files have not yet been generated. Reducing the number of processors might also help.
I used fewer cores last night, but it still failed. Maybe I should keep trying as you said.
Another problem when running wrf.exe:
INITIALIZE SUEWS NAMELIST -------------- FATAL CALLED --------------- FATAL CALLED FROM FILE:
LINE: 1270 ERROR reading sector coeff of namelist.suews
I used the WRF version under xx-test-xx-2, and namelist.suews was generated for Shanghai. I am wondering if the WRF version is too old to read namelist.suews correctly, or if namelist.suews has changed since WRF was compiled.
This is the same problem I have been getting since Thursday, which I suspect is also an MPI problem, since I used to run this without any issue. Now sometimes it works and sometimes it doesn't! Try chmod 444 namelist.suews, keep running it, and let me know what it says.
I tried many times. Sometimes jobs can be submitted successfully, but they exit soon. The error files always say "ERROR reading sector coeff of namelist.suews". I suppose there might be something wrong with the code itself, or wrf.exe (in xx-test-xx-2) is not consistent with the one in the repo.
Can you also change the number of processors (try different ones) and see if it works?
I did attempt to change the number of processors; unfortunately, the jobs cannot be submitted no matter what the number is. Not even .err or .out files are produced. Can you have a try to see if it is a problem with Jasmin now?
I am running 4 jobs right now. They are in the shared folder starting with Apr, Jul, Oct, and Jan. Try to copy any of them and run them to see if it works. It took me a while to have a successful run of them because of the recent problems.
I copied the Apr-London-Swindon folder to my own directory and changed the forcing data and namelist files (including namelist.input and namelist.suews). Everything else remained the same. Then I submitted the job; however, it looked like I hadn't done anything: no jobs can be found with the jobs command, and no logs are generated. It's really tricky!
We need to find a solution for this instability. Let's work on it together on Monday and try to solve it.
My wrf.exe ran for a while, wrote the wrfout_d01 file for the outermost domain, and then exited. I find that one of the rsl.error.* files ends with
-------------- FATAL CALLED --------------- FATAL CALLED FROM FILE:
LINE: 29365 fatal error in SUEWS:Problem with (z-zd) and/or z0. application called MPI_Abort(MPI_COMM_WORLD, 1) - process 32
and one ends with
-------------- FATAL CALLED --------------- FATAL CALLED FROM FILE:
LINE: 29365 fatal error in SUEWS:Inappropriate value calculated. application called MPI_Abort(MPI_COMM_WORLD, 1) - process 33
Is there anything I might have done wrong?
Might be something wrong with the building/canopy height set in your wrfinput.
Three questions: 1) how many time steps does it run? 2) what is the building height? 3) is the number of vertical grids 33? If so, the first grid point is at around 50 m, so if your building heights are more than this, it raises an error.
A: 1) Only one time step was outputted. 2) bldgH_SUEWS = 35.9 in SUEWS_param_new.json. 3) Yes, it is. So I need to change bldgH_SUEWS to a lower value and modify wrfinput_d0* again, right?
bldgH_SUEWS was reset to 25 and then 22, and the errors are still the same. Maybe try an even lower value?
Heights of trees also matter.
Check these variables: EveTreeH_SUEWS
and DecTreeH_SUEWS
.
Both of these variables are 13.1 in the London run, and they are 9.1 and 10.9 respectively in my case. Which direction should I change them?
Then try setting a higher debug value to see at what height the first/lowest atmospheric level sits.
Can you see anything in the log?
d01 2012-12-01_00:00:00 after SuMin, qn_SUEWS= 76.0966107299998
d01 2012-12-01_00:00:00 after SuMin, qf_SUEWS= 0.000000000000000E+000
d01 2012-12-01_00:00:00 after SuMin, qs_SUEWS= 6.56633480928586
d01 2012-12-01_00:00:00 after SuMin, qh_SUEWS= -129.063718135778
d01 2012-12-01_00:00:00 after SuMin, qe_SUEWS= 198.593994056492
d01 2012-12-01_00:00:00 qn_out = 76.0966107299998
d01 2012-12-01_00:00:00 qf_out = 0.000000000000000E+000
d01 2012-12-01_00:00:00 qs_out = 6.56633480928586
d01 2012-12-01_00:00:00 qh_out = -129.063718135778
d01 2012-12-01_00:00:00 qe_out = 198.593994056492
d01 2012-12-01_00:00:00 First vertical level is 25.4766330718994
d01 2012-12-01_00:00:00 in SuMin, before calculation, OHM_coef: 0.718999981880188 0.718999981880188 0.718999981880188 0.718999981880188 0.194000005722046 0.194000005722046 0.194000005722046 0.194000005722046 -36.5999984741211 -36.5999984741211 -36.5999984741211 -36.5999984741211
d01 2012-12-01_00:00:00 Problem: In stability subroutine, (z-zd) < z0.
d01 2012-12-01_00:00:00 ERROR! Program stopped: Problem with (z-zd) and/or z0.
d01 2012-12-01_00:00:00 Values: 0.4766 3.6000
d01 2012-12-01_00:00:00 17
d01 2012-12-01_00:00:00 ERROR! SUEWS run stopped.
-------------- FATAL CALLED --------------- FATAL CALLED FROM FILE:
LINE: 29365 fatal error in SUEWS:Problem with (z-zd) and/or z0. application called MPI_Abort(MPI_COMM_WORLD, 1) - process 32
First vertical level is 25.4766330718994
This is very close to the surface. Might need to manipulate the eta levels for a higher first level.
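One way to raise the first model level is to prescribe the eta levels explicitly in the &domains section of namelist.input. The fragment below is purely illustrative (not values from this case): the list must start at 1.000, decrease monotonically to 0.000, and contain exactly e_vert entries; a larger gap between the first two values lifts the lowest model level.

```
&domains
 e_vert     = 28,    28,    28,
 eta_levels = 1.000, 0.988, 0.975, 0.960, ...,
              0.010, 0.000,
/
```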
Or decrease the number of grid points in the vertical direction in namelist.input.
Eta levels have been decreased from 33 to 28. The error now becomes:
d01 2012-12-01_00:00:00 call cumulus_driver
d01 2012-12-01_00:00:00 in cu_tiedtke
d01 2012-12-01_00:00:00 returning from cumulus_driver
d01 2012-12-01_00:00:00 call shallow_cumulus_driver
d01 2012-12-01_00:00:00 calling inc/HALO_EM_FDDA_SFC_inline.inc
d01 2012-12-01_00:00:00 call fddagd_driver
d01 2012-12-01_00:00:00 call calculate_phy_tend
d01 2012-12-01_00:00:00 call compute_diff_metrics
d01 2012-12-01_00:00:00 calling inc/HALO_EM_TKE_C_inline.inc
Fatal error in PMPI_Wait: A process has failed, error stack:
PMPI_Wait(198)............: MPI_Wait(request=0x53a8d5c, status=0x7ffcbca84170) failed
MPIR_Wait_impl(79)........:
dequeue_and_set_error(933): Communication error with rank 29
Is this an MPI problem or not?
I have run the model twice and the errors are the same.
Looks like it. I don't think there is anything we can do about that.
But I am still seeing the z-zd<0
in rsl.out.024
. Check this: grep "First vertical level" rsl.out.00*
and look for the lowest value. Maybe try decreasing the eta level of the second grid point in your namelist.input
to lower than 0.90
, and see what happens.
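To pull out the lowest first-level height across all ranks in one go, something like this works (each MPI rank writes its own rsl.out.NNNN file):

```shell
#!/bin/bash
# Report the minimum "First vertical level" height across all rsl.out
# files; the height is the last field on each matching log line.
lowest_level() {
  grep -h "First vertical level" rsl.out.00* 2>/dev/null \
    | awk '{print $NF}' | sort -n | head -n 1
}
lowest_level
```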
It has been running for two or more time steps for all three domains and still continues. I can have a good sleep! Thank you all.
Good job! What was the final problem? Is it still running?
I modified the eta levels as you said then it succeeded. It is still running now.
@zhenkunl can we close this?
Sure.
@hamidrezaomidvar, I just got @zhenkunl safely landed on Jasmin. Can you give him some orientation on how to use the compiled WRF-SUEWS and WPS to run his Shanghai case? I know you have something for Shui, which might be useful for @zhenkunl as well.
Many thanks!