CSIS validation - Githubissues

NICDDB commented 4 years ago

Dear all, we have done a first comparison between Plinivs simplified model and CSIS results. Plinivs model works on more detailed land use data we developed for the expert level analysis, and uses as aggregation for the final results the grid we have defined and use for risk assessment and impact simulation, here in Campania Region, also for seismic and volcanic risk, which is a square grid of cells 250x250 m in CRS EPSG: 23033 (UTM33N) that isn't directly overlap to the EU grid LAEA ETRS89 500m, so an intermediate step of harmonization is needed for comparison.

We observe for Tmrt results of the CSIS model lower values, in each of the simulation we compared, in the graph in the following figure you can see an average constant offset (the graph represent Tmrt values on the y-axis, and the cell number on the x-axis, order by increasing CSIS Tmrt).

The differences other than the detail scale of the dataset used, can be probably due to a different estimation of the surface temperature, or a different solar radiation used by two model, or a combination of the two.

So we have some question:

What are the solar radiation values used by EMIKAT, and how are they obtained?
How EMIKAT calculate the surface temperature parameter?

Calculation of soil temperature for CSIS must be done this way: *Ts = Tairincr.**

CSIS LAYER	incr.
agricoltural_areas	1,3239557
built_open_spaces	1,90332278
dense_urban_fabric	1,88683544
low_urban_fabric	1,88683544
medium_urban_fabric	1,88683544
public_military_industrial	1,88683544
railways	1,41835443
roads	1,93822785
sport	1,88683544
trees	1,4221519
vegetation	1,4221519
water	0,99683544

The results of comparison are:

SCENARIO A CSIS input: 2011-2040, rcp 8.5, frequent, Tair 34°. PLINIVS input: gg 16.07.2015, Tair 34°, H 14:00 Tmrt csis mean	Tmrt plinivs mean	Delta (CSIS – PLINIVS)
50,62	58,19	-7,57

SCENARIO B CSIS input: 2011-2040, rcp 4.5, occasional, Tair 39°. PLINIVS input: gg 16.07.20.., Tair 39°, H 14:00

Tmrt csis mean	Tmrt plinivs mean	Delta (CSIS – PLINIVS)
57,63	65,82	-8,20

The process used to do the comparison was:

downlod the CSIS results
join the CSIS data with grid clarity:laea_etrs_500m_grid
harmonize and joined the PLINIVS data with grid clarity: laea_etrs_500m_grid
compare the result.

For joining the PLINIVS data with clarity:laea_etrs_500m_grid we have be done a zonal statistic intermediate step with the following result:

SCENARIO A CSIS input: 2011-2040, rcp 8.5, frequent, Tair 34°. PLINIVS input: gg 16.07.2015, Tair 34°, H 14:00

PLINIVS simplified model results on plinivs_grid_250x250m_EPSG:23033

PLINIVS simplified model results on EU grid LAEA 500m_EPSG:23033

CSIS results on EU grid LAEA 500m_EPSG:23033

SCENARIO B CSIS input: 2011-2040, rcp 4.5, occasional, Tair 39°. PLINIVS input: gg 16.07.20.., Tair 39°, H 14:00

PLINIVS simplified model results on plinivs_grid_250x250m_EPSG:23033

PLINIVS simplified model results on EU grid LAEA 500m_EPSG:23033

CSIS results on EU grid LAEA 500m_EPSG:23033

NICDDB commented 4 years ago

@DenoBeno please, can you add to this issue also Robert Goler? I don't know his Github name!

RobAndGo commented 4 years ago

As the CSIS results are consistently too low, is the error as simple as a multiplication or addition factor missing in the formula that CSIS uses?

mattia-leone commented 4 years ago

as said, we think that the error is related to the different values currently used by emikat with respect to the parameter Ts/Ta, which we have calibrated only recently. another factor is the value used for the global radiation in emikat, which we don't know, and of course we should input the same value in the plinivs model to perform the validation. we are pretty confident that aligning these two values will result in much more similar results

humerh commented 4 years ago

I also think, that this are the main reason:

My simplified methode for calculation of radiation is:

theta = 180°C (June 21st, 12:00) eta = 90-latitude+23,45 G = 1120W sin(eta/180PI) D = 0,23 G I = 0.77 G

Surface temperature Ts = Ta * (1+ SURFACE_FACTOR)

The SURFACE_FACTOR = 0.269 (for Water, Trees, Vegetaion and Agriculture) all other layers: 0.7552

stefanon commented 4 years ago

Dear @humerh we can make a check for results with our model using the same simplified calculation for radiation, but about Ts you should use a table of values per landuse class like our estimated values obtained with other models comparison. If, as I understand, the calculation is done always for the same time of day, I can post a small table for Ts = f(Ta) to be feed to your procedure, and then can compare the results again. The use of a table of values is important also for adaptation measures evaluation.

DenoBeno commented 4 years ago

If I understand it correctly, Heinrich is calculating

T_s= T_a* 1.269 - for all the greenish areas.

And it should be T_s= T_a 1,3239557 - agricultural T_s= T_a 1,4221519 - trees

...and so on. The table of the values is above, it just needs to be entered.

DenoBeno commented 4 years ago

OK, Heinrich has entered all the factors as in the table above. This should result in 5 to 10% higher values depending on the surface type now. Please try.

Hmm... The difference is 14% now, so this will probably account for ca. 1/2 of the difference.

What I'm wondering now is if we are talking about the same areas? E.g., in EMIKAT the "X urban fabric", "public military" and "sports" layers only contain buildings, not the whole areas where such buildings are. Compared to urban atlas, this means that we have more green areas => the increment for the "buildings" needs to be higher than if we would be using the urban atlas data.

Is this the case in Naples model too?

The interesting thing would be also to compare the EMIKAT calculation with results in Stockholm and/or Linz.

DenoBeno commented 4 years ago

I am currently re-calculating the data for "AIT demo" in Vienna. I have exported the old csv data for "worst case" and "historic" 1y and 20y events before re-starting the calculation:

NICDDB commented 4 years ago

I apologize very much to all of you, but we have found some small errors in the previous temperature table, so it must be changed with this:

CSIS LAYER	(1 + ts_incr)*incr_calibr
agricoltural_areas	1,309031853
built_open_spaces	1,936701731
dense_urban_fabric	1,930016816
low_urban_fabric	1,930016816
medium_urban_fabric	1,930016816
public_military_industrial	1,930016816
railways	1,403089032
roads	1,988740556
sport	1,377928266
trees	1,067894406
vegetation	1,377928266
water	1,066529032

This is a table obtained for Naples, in summer, for the hours 14 UTC + 2 to be precise, since we compare it with CSIS which calculates at 12. (what time zone? if it is UTC, is ok)

DenoBeno commented 4 years ago

I don't know if this is with old or new table of values (probably old), but the results of the new calculations are IMO strange.

This is what I get for 20y reoccuring event in the past:

See the complete data for yourself...

AIT demo historic.20y comparison.xlsx

Cold areas got up to 20% warmer, warm ones got up to 15% warmer. Some areas are up to 20% colder.

Please note that all three temperature indices changed. Here is T_a:

humerh commented 4 years ago

I have updated the factors for calculation of Surface temperatur.

mattia-leone commented 4 years ago

Hi @humerh , I am really sorry for the confusion this morning. We will work this week to address what discussed today about how to calculate the effect of adaptation strategies. I suggest to talk again next Monday, hopefully we will have some easy solution for you to implement in Emikat. I would also like to discuss better with @RobAndGo how to display flood results in the CSIS by linking the runoff, elevation and stream/basin parameters with the rain input. As said we got a "qualitative proxy" of how urban areas are likely to be flooded in case of an extreme precipitation event, based on the 3 parameters above. When we consider also rain, will it be a simple "multiplication" (the more severe the event is the "redder" will be the cells)? We still advice to connect colors only to qualitative infos (e.g. green=very low, red=very high).. let's think about that

stefanon commented 4 years ago

I wish to add some details to the discussion about the flood calculations in this morning telco, in view of further discussion with @RobAndGo and all involved. As told by @mattia-leone we produced a qualitative proxy variable that gives information on the propensity to flooding in the study area, to be used at the screening level. In this way only a few data are needed to define the parameters on which the PF propensity variable calculation is based, while a quantitative approach needs to be implemented with a procedure for flow accumulation analysis on the DEM for basins that are huge in several cases, as I’ve found in the layers on Clarity server (areas in light green in the following figure):

In the land-use layers elaboration procedure the following attributes are evaluated and stored for each grid cell:

• streams – the number of streams that are found in a cell (streams are an input available for all Europe) • minimum – this cell attribute specify the difference in altitude between the cell and the lowest point of the basin of belonging • runoff_avg – the value of the runoff coefficients in the cell weighted on the area of the cell

These parameters have a range of values that are widely spread in different cities, so I wrote a procedure that for each city produces a classification from 0 to 5 of these so that the calculated results can be shown in a map with a scale unique for all the cities:

• cl_streams - this is the cell attribute 'streams' classified on values 0 - 5 • cl_delta_elev - this is the attribute 'minimum' classified on values 0 - 5 • cl_runoff - this is the attribute 'runoff_avg' classified on values 0 - 5

cl_delta_elev

In the previous image, you can see a sample of the result of the classification for the minimum attribute to the cl_delta_elev classes for Linz city, starting from values of elevation per cell that are in the range 235 - 1378m. Please note that for the elevation the classified values are in reverse order respect the other parameters, as we have higher class values (5) for cells that are near the lower point of the basin.

The indicative proxy variable for flooding propensity is calculated as:

PF_prop = _f(cl_streams, cl_delta_elev, clrunoff) = (cl_streams cl_delta_elev + cl_delta_elev cl_delta_elev + cl_runoff * cl_delta_elev)

The result is:

pf_prop

the values for PF_prop are in the range [0;75] units and are presented in the following way:

The code used for the elaboration is in SQL: I've done a local copy in a Spatialite DB of the Clarity land-use layer for speed of elaboration and testing, and I'll post it hereafter. It can be used as-is or modified to be included in Emikat or used for making a view on Geoserver on the _clarity:landusegrid layer already produced by @negroscuro scripts.

stefanon commented 4 years ago

Here is the query that evaluates the attributes classes and the PF_prop value for each cell at the same time. The downloaded land-use data were stored in a table named "landuse_grid_full_200425" and that needs to be adapted:

select qq.id, qq.city, qq.streams, qq.cl_streams, qq.minimum, qq.cl_delta_elev, qq.run_off_average, qq.cl_runoff,  
  (cl_streams*cl_delta_elev + cl_delta_elev*cl_delta_elev + cl_runoff*cl_delta_elev) pf_prop,
  geom
  from
  (
  select id, lgf.city, streams, 
        CASE 
            WHEN  "streams" <= 0 THEN 0
            WHEN  "streams" > 0 and "streams" <= 1 THEN 1
            WHEN  "streams" > 1 and "streams" <= 2 THEN 2
            WHEN  "streams" > 2 and "streams" <= 3 THEN 3
            WHEN  "streams" > 3 and "streams" <= 4 THEN 4
            WHEN  "streams" > 4 THEN 5
        END cl_streams,  
        minimum, 
        CASE 
            WHEN  "minimum" <= elev.c_vhigh THEN 5
            WHEN  "minimum" > elev.c_vhigh and "minimum" <= elev.c_high THEN 4
            WHEN  "minimum" > elev.c_high and "minimum" <= elev.c_med THEN 3
            WHEN  "minimum" > elev.c_med and "minimum" <= elev.c_low THEN 2
            WHEN  "minimum" > elev.c_low and "minimum" <= elev.c_vlow THEN 1
            WHEN  "minimum" > elev.c_vlow THEN 0
        END   cl_delta_elev, 
        run_off_average, 
        CASE 
        WHEN  "run_off_average" <= roff.c_vlow THEN 0
        WHEN  "run_off_average" > roff.c_vlow and "run_off_average" <= roff.c_low THEN 1
        WHEN  "run_off_average" > roff.c_low and "run_off_average" <= roff.c_med THEN 2
        WHEN  "run_off_average" > roff.c_med and "run_off_average" <= roff.c_high THEN 3
        WHEN  "run_off_average" > roff.c_high and "run_off_average" <= roff.c_vhigh THEN 4
        WHEN  "run_off_average" > roff.c_vhigh THEN 5
        END as cl_runoff, 
        geom    
   from landuse_grid_full_200425 lgf, 
        (
        select q.*, min_delta + class_width c_vhigh,  min_delta + 3*class_width c_high, min_delta + 8*class_width c_med, 
             min_delta + 16*class_width c_low, min_delta + 50*class_width c_vlow
        from 
            ( 
                select city, max(minimum) max_delta, min(minimum) min_delta,  max(minimum) - min(minimum)  diff_delta, 
                (max(minimum) - min(minimum)) / 50 class_width
                from landuse_grid_full_200425
                group by city
            )  q
        ) elev,
        (
            select q.*, min_roff + class_width c_vlow,  min_roff + 3*class_width c_low, min_roff + 8*class_width c_med, 
            min_roff + 16*class_width c_high, min_roff + 50*class_width c_vhigh
            from 
            ( 
                select city, max(run_off_average) max_roff, min(run_off_average) min_roff,  max(run_off_average) - min(run_off_average)  diff_roff, 
                (max(run_off_average) - min(run_off_average)) / 50 class_width
                from landuse_grid_full_200425
                group by city
            ) q
        ) roff
  where lgf.city is not null and lgf.minimum is not null and lgf.run_off_average is not null and lgf.city = elev.city and lgf.city = roff.city
  ) qq

DenoBeno commented 4 years ago

Heinrich has updated all the parameters for heat model. Can you please re-validate the data for Naples ASAP now? If all is OK, I want to ask Lena to validate in Stockholm too.

RobAndGo commented 4 years ago

Concerning the flooding, I have a question about the equation:

PF_prop = f(cl_streams, cl_delta_elev, cl_runoff) = (cl_streams cl_delta_elev + cl_delta_elev cl_delta_elev + cl_runoff * cl_delta_elev)

In the deliverable D3.3, there is the equation: Total = [ ( p1 × 0.25 ) + ( p2 × 0.25 ) + ( p3 × 0.25 ) + ( p4 × 0.25 ) ] where p1 = runoff coefficient (cl_runoff) p2 = relative elevation (cl_delta_elev) p3 = flow accumulation streams (cl_streams) p4 = emergency calls (not needed in above example)

Are these equations similar or do they represent completely different things?

stefanon commented 4 years ago

Dear @RobAndGo , they are similar as they want to indicate the same propensity for flooding, but the data available at expert level for Naples are different from what is available for all Europe, mainly the flow accumulation is missing, so a different formulation was elaborated with available data, D3.3 needs to be updated.

claudiahahn commented 4 years ago

In the deliverable the indices (cl_streams, cl_delta_elev, cf_runoff, emergency calls) are multiplied by 0,25 (and if there are only 3 indices, they should be multiplied by 0,33). But here you are multiplying by cl_delta_elev. Was that done to give the cl_delta_elev a higher weight?

stefanon commented 4 years ago

Hi @claudiahahn, sorry for the confusion:

But here you are multiplying by cl_delta_elev. Was that done to give the cl_delta_elev a higher weight?

Exactly: the reason for this was that the formula used for Naples city and described in the deliverable used a layer that is the flow accumulation per cell, then classified to show the streams in the city area, while at EU screening level we have only the number of streams per cell obtained from the stream's path.

In the deliverable the indices (cl_streams, cl_delta_elev, cf_runoff, emergency calls) are multiplied by 0,25 (and if there are only 3 indices, they should be multiplied by 0,33).

In the calculation procedure all the involved parameters are classified in a 0 to 5 range for the indices, that are in the end multiplied and summed giving a range of possible values from 0 to 75. These latter aren't scaled again in smaller values, to have more choice in defining the threshold for the qualitative scale "very low, low, medium, high, very high" of values.

claudiahahn commented 4 years ago

Thank you @stefanon

RobAndGo commented 4 years ago

I have generated the daily precipitation values for the three precipitation events that we (@claudiahahn @mattia-leone @stefanon @NICDDB) talked about last week, which are necessary for the PLINIVS flood local effect model.

As an example of what the data means, the following excel table shows an example for the Naples grid point, with each sheet (10 in total) representing either the baseline climate or the 9 future time periods/RCP combinations. The tables show the number of precipitation events (top table) in the 30-year period where the daily precipitation is greater than or equal to a threshold value (columns). The rows show the number of consecutive days where the threshold precipitation is reached. The lower table shows the probability of occurrence in 1 year.

precipitation-matrix_naples.xlsx

From this data, three precipitation events (similar to the heat wave events) have been defined:

frequent: occurs every year (probability of occurrence in 1 year = 1.0)
occasional: occurs once in 5 years (probability of occurrence in 1 year = 0.2)
rare: occurs once in 20 years (probability of occurrence in 1 year = 0.05)

The daily precipitation which coincides with these three events have been found for all grid points in Europe for the baseline climate and the 9 future time periods/RCP combinations. I have attached the csv-files with extension .txt below.

europe_pr_eobs_1971-2000.csv.txt europe_pr_rcp26_2011-2040.csv.txt europe_pr_rcp26_2041-2070.csv.txt europe_pr_rcp26_2071-2100.csv.txt europe_pr_rcp45_2011-2040.csv.txt europe_pr_rcp45_2041-2070.csv.txt europe_pr_rcp45_2071-2100.csv.txt europe_pr_rcp85_2011-2040.csv.txt europe_pr_rcp85_2041-2070.csv.txt europe_pr_rcp85_2071-2100.csv.txt

As for the heat wave files, the format here is:

latitude;longitude;urban area;country;RCP;Period;RX1day_mm Frequent(1.0);RX1day_mm Occasional(0.2);RX1day_mm Rare(0.05)
41.052959;14.266241;;;historical;19710101-20001231;40;65;85

The daily precipitation values in the last 3 columns are in units of mm and, in this example, show that:

frequent event = daily precipitation of 40mm
occasional event = daily precipitation of 65mm
rare event = daily precipitation of 85mm

The idea is that this data can be incorporated into EMIKAT by @humerh in a similar way as the heat wave event data, and subsequently accessed by the PLINIVS flood local effect model.

p-a-s-c-a-l commented 4 years ago

The flooding discussion is off-topic ;-) So I suggest to continue this in a separate issue.

Heinrich has updated all the parameters for heat model. Can you please re-validate the data for Naples ASAP now? If all is OK, I want to ask Lena to validate in Stockholm too.

Are we now confident with the results of the screening model?

NICDDB commented 4 years ago

Are we now confident with the results of the screening model?

@p-a-s-c-a-l yes, the result of the CSIS screening model is now similar to PLINIVS model.

The mean difference between CSIS results and PLINIVS results is -4 °C (CSIS – PLINIVS).

This can be justified by the different land-use basis of two models, the attribution of shadow (the CSIS model does not consider shadows while the plinivs model does it), and there is also a small difference in diffuse and direct fractions of incoming short wave Solar radiation (EMIKAT use 0.27G for Diffuse and 0.77G for Incoming, while PLINIVS use 0.30G for Diffuse and 0.70G for Incoming, G global radiation).

Anyway we test run the PLINIVS model using the same D and I fractions used by EMIKAT.

The mean difference between CSIS and PLINIVS of this test is -0.35 °C.

This result was compared on one scenario (2011 - 2040 rcp 4.5 occasional), to be sure I will compare other scenarios.

humerh commented 4 years ago

Dear @stefanon ,

concerning the parts of the formulas, like:

      select city, max(minimum) max_delta, min(minimum) min_delta,  max(minimum), min(minimum)  diff_delta, 
          (max(minimum) - min(minimum)) / 50 class_width
          from landuse_grid_full_200425
          group by city

Would it be acceptable to build this min/max aggregation functions for the "STUDY AREA", and not for the CITY? Or is this region for building this min/max defined in another way (like region of river BASIN)?

stefanon commented 4 years ago

Dear @humerh ,

as the flooding discussion is off topic here, I've answered you here: https://github.com/clarity-h2020/emikat/issues/36#issuecomment-642792781

p-a-s-c-a-l commented 4 years ago

resolved

clarity-h2020 / emikat

CSIS validation #35