mee067 closed this issue 6 months ago
It's worth trying with Ubuntu EC2 instances. I've tested recently with Ubuntu and it seems to work just fine.
Thanks, that also came to my mind
Same segmentation fault on an Amazon cloud Ubuntu instance for r1860. Only this time I get some more information on the routine throwing the error:
` RUNCLASS36 is active. BASEFLOW component is ACTIVE.
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
./scripts/perform_capa_hindcast.sh: line 39: 22888 Segmentation fault (core dumped) $mesh_exe`
Also, the same compilation error occurs for r1745 on Ubuntu. I think it is related to the version of gfortran, which is almost the same on both Amazon Linux (11.4.1) and Ubuntu (11.4.0).
I looked at the compilation error for r1745. I am not sure which variable is involved, but I looked at ISAND and it is an integer all the way through.
I also found that the new makefile (1860) has some additional compiler options compared to 1745, which may have suppressed the conversion issue for 1860. I am not very conversant with compiler options, but I think it could be -Wconversion. However, that option is also in the makefile of 1745, so I am not sure what the issue is or which variable gets converted implicitly.
Any feedback?
I recompiled r1860 with symbols on and this is what I got:
`Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
at ./Driver/MESH_Driver/output_variables.f90:1389
at ./Driver/MESH_Driver/output_variables.f90:2078
at ./Driver/MESH_Driver/output_variables.f90:2530
at ./Driver/MESH_Driver/MESH_driver.f90:847
at ./Driver/MESH_Driver/MESH_driver.f90:97
./perform_capa_hindcast.sh: line 39: 32760 Segmentation fault (core dumped) $mesh_exe`
Line 97 in MESH_driver is "use mpi_module", which is not active in this compilation, and there is a stub for it, so I am not sure why it objects.
I traced the rest, and line 1389 in output_variables.f90 is the second line in this block:
if (associated(group%ican)) then
    where (group%tacan > 0.0)
        group%ican = 1.0
    elsewhere
        group%ican = 0.0
    end where
end if
tacan, qacan, and uvcan are new variables which I added as outputs. I copied their blocks from tcan. Maybe this needs review @dprincz. I think tacan and qacan were already there internally but not as outputs; uvcan wasn't. I used to compile with Intel and did not get that issue, and they did produce the required output when I tested them.
So, I managed to compile 1860 on AM Linux 2023 after commenting out some code blocks related to the tacan, qacan and uvcan outputs.
But compiling older code hits that issue related to type mismatch. Comparing CLASSW.f and RUNCLASS_module.f90 across versions does not indicate where the problem is. I need to run older code for some of the setups that I could not fully migrate to 1860.
This might be a known issue I've fixed.
Find and update this block in output_variables.f90 from this:
if (associated(group%ican)) then
    where (group%tacan > 0.0)
        group%ican = 1.0
    elsewhere
        group%ican = 0.0
    end where
end if
if (associated(group%ican)) then
    where (group%qacan > 0.0)
        group%ican = 1.0
    elsewhere
        group%ican = 0.0
    end where
end if
if (associated(group%ican)) then
    where (group%uvcan > 0.0)
        group%ican = 1.0
    elsewhere
        group%ican = 0.0
    end where
end if
To this:
if (associated(group%ican) .and. associated(group%tacan)) then
    where (group%tacan > 0.0)
        group%ican = 1.0
    elsewhere
        group%ican = 0.0
    end where
end if
if (associated(group%ican) .and. associated(group%qacan)) then
    where (group%qacan > 0.0)
        group%ican = 1.0
    elsewhere
        group%ican = 0.0
    end where
end if
if (associated(group%ican) .and. associated(group%uvcan)) then
    where (group%uvcan > 0.0)
        group%ican = 1.0
    elsewhere
        group%ican = 0.0
    end where
end if
Please close the thread if this resolves the issue.
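For anyone hitting the same crash, the guard in the updated block above can be tried in isolation. The following is only a minimal sketch, not the MESH code: the `output_group` type, the `update_counter` routine and the sample values are hypothetical stand-ins, and only the `tacan` case is shown. It assumes the crash came from evaluating the `where` mask on a pointer that was not associated.

```fortran
program associated_guard_demo
    implicit none

    ! Hypothetical stand-in for the output group type; only the two
    ! fields needed for the illustration are included.
    type output_group
        real, dimension(:), pointer :: tacan => null()
        real, dimension(:), pointer :: ican => null()
    end type

    type(output_group) :: group

    ! Only the counter is allocated, mimicking the case where
    ! 'tacan' is not associated.
    allocate(group%ican(3))
    group%ican = -1.0

    ! Guarded form: the body is skipped while 'tacan' is unassociated,
    ! so 'ican' keeps its initial value of -1.0.
    call update_counter(group)
    print *, 'tacan unassociated, ican = ', group%ican

    ! Once 'tacan' is associated, the counter is set to 1.0 where
    ! tacan > 0.0 and to 0.0 elsewhere (here: 1.0, 0.0, 1.0).
    allocate(group%tacan(3))
    group%tacan = (/ 271.5, 0.0, 284.2 /)
    call update_counter(group)
    print *, 'tacan associated,   ican = ', group%ican

    deallocate(group%tacan, group%ican)

contains

    subroutine update_counter(g)
        type(output_group), intent(inout) :: g

        ! Both pointers are checked before either one is referenced.
        if (associated(g%ican) .and. associated(g%tacan)) then
            where (g%tacan > 0.0)
                g%ican = 1.0
            elsewhere
                g%ican = 0.0
            end where
        end if
    end subroutine

end program
```

Dropping the second `associated` test from `update_counter` makes the first call reference an unassociated pointer, which is undefined behaviour. That could explain why one compiler ran without complaint while gfortran raised SIGSEGV.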
Well, that resolved the issue on AWS. I did a simple test after restoring the canopy level outputs.
Before we close this issue, please answer the following questions:
`ICAN` is the number of PFTs, which is 4 (i.e. fixed). Why are those canopy-level outputs conditioned on ICAN?
I also forgot the difference between `tcan` and `tacan`! Vegetation temperature vs air temperature within the canopy, as CLASS defines them. They do not sound very different, do they?
Note that all canopy level variables already existed in CLASS, I just exposed them to MESH to get output. Even the MESH variables TACAN and QACAN were there. Only UVCAN wasn't.
Different `ican`. If you look in output_variables, you'll see a few i-values which are used for calculating averages when only valid values should be considered (e.g., shortwave radiation, snow, etc.). These are averaging counters local to the routine.
This is by design, so that the equivalent of `NO_DATA` values are omitted when calculating a representative average, in cases where it's not appropriate to count "0.0" toward the average.
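As a rough illustration of the averaging counters described above, here is a minimal sketch. It is an assumption about how such a counter could be applied, not the actual logic in output_variables.f90; the array names, sizes and values are all hypothetical.

```fortran
program valid_average_demo
    implicit none

    integer, parameter :: n_cells = 3, n_steps = 4

    ! Hypothetical values: rows are cells, columns are time steps;
    ! 0.0 stands for "no valid value" at that step.
    real :: values(n_cells, n_steps)
    real :: flag(n_cells), total(n_cells), counter(n_cells), average(n_cells)
    integer :: t

    values(1, :) = (/ 280.0, 282.0, 284.0, 286.0 /)
    values(2, :) = (/ 0.0, 0.0, 270.0, 274.0 /)
    values(3, :) = (/ 0.0, 0.0, 0.0, 0.0 /)

    total = 0.0
    counter = 0.0
    do t = 1, n_steps
        ! Same pattern as the blocks in this thread: the flag is 1.0
        ! where the value is valid and 0.0 elsewhere.
        where (values(:, t) > 0.0)
            flag = 1.0
        elsewhere
            flag = 0.0
        end where
        total = total + values(:, t)*flag
        counter = counter + flag
    end do

    ! Divide by the number of valid steps rather than by n_steps;
    ! cells with no valid step are left at 0.0.
    where (counter > 0.0)
        average = total/counter
    elsewhere
        average = 0.0
    end where

    ! Cell 2 averages to 272.0 over its two valid steps, where a plain
    ! mean over all four steps would give 136.0.
    print *, average
end program
```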
I believe `tcan` is the temperature of the canopy while `tacan` is the ambient temperature within the canopy. I think they should be similar. From my understanding, this is why `tacan` as a prognostic state is set to `tcan` when passing between time-steps and resuming previous run-states.
So ican = the number of canopies within a tile, which has a maximum of 4. It can be zero if FCAN(1..4) = 0, so it is either rock (FCAN(5) = 0) or some impervious cover (FCAN(5) > 0), right? This protects against the case of having an impervious type only, like glaciers, water, or urban tiles.
btw, I know that if sum(FCAN(1..5)) < 1, it will assign the remainder to rock with hard-coded properties. What if the sum > 1, does it scale things down to sum to 1?
Hi Mohamed,
`ican` in this context doesn't have anything to do with CLASS's definition of `ican`. For questions specific to CLASS's instance of `ican` and `icp1`, I suggest creating a separate issue tagged for documentation.
Dan
OK, will move the question regarding CLASS `ican` and `fcan` to another thread. For output purposes, how is `ican` assigned? I searched the module and could not figure it out.
I compiled r1860_ME and r1860_ME_ZT on Amazon Cloud (AM Linux 2023 - using gfortran 11.4.1) and I got a segmentation fault when trying to run it within the Yukon Forecasting System.
I also got the same segmentation fault on an older instance (AM Linux 1 - AL AM 2018.03, which reached its end of life, forcing me to create a new instance with the recent Linux version). This instance has gfortran 6.4.1.
I tried to compile older code (r1745) on the new instance but it gives errors:
Any ideas?