bluemodel / BlueM.QGISInterface

QGIS plugin for creating input files for a BlueM.Sim model from GIS data
GNU General Public License v3.0

Wrong datatype for some columns in BOD file #8

Closed jamaa closed 2 years ago

jamaa commented 2 years ago

The data type of the columns "anzsch", "boa1", "boa2", "boa3", "boa4", "boa5" and "boa6" should be integer instead of float. See also https://wiki.bluemodel.org/index.php/BOD-File.

I am not sure where exactly in the plugin code this needs to be changed, is it this line? https://github.com/bluemodel/BlueM.QGISInterface/blob/3571d50d20a2b7f455b40f61ebe720ed2ea4925c/inputfiles_overview.csv#L785

jamaa commented 2 years ago

Also, the column "Flaeche" in the EFL file should be float instead of integer. This column is also a bit tricky in that rounding errors can cause issues in BlueM.Sim, because the sum of all values for one EZG has to equal 100% (tolerance: 0.001).

MartinGrosshaus commented 2 years ago

You are correct, it is line 785 for the changes in the BOD file data type definitions. For the "Flaeche" in the EFL it is in line 819.

I changed all of the mentioned definitions in the Excel file (where they are a lot easier to see) and uploaded it together with a revised CSV.


Regarding the rounding error: Do you think there should be a warning for the user that the values don't add up to 100% for a given EZG? Or should the plugin change the values on its own to get to 100% (-> problematic)?
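For illustration, the warning option could look roughly like the following sketch. It is not the plugin's code: the function name `find_unbalanced_ezgs` and the `(ezg, flaeche)` row layout are assumptions made up for this example, and the tolerance of 0.001 is the value mentioned above.

```python
from collections import defaultdict
import math


def find_unbalanced_ezgs(rows, tolerance=0.001):
    """Return {ezg: total} for every EZG whose "Flaeche" values do not sum to 100.

    `rows` is an iterable of (ezg, flaeche) pairs (hypothetical layout).
    """
    shares = defaultdict(list)
    for ezg, flaeche in rows:
        shares[ezg].append(flaeche)

    unbalanced = {}
    for ezg, values in shares.items():
        total = math.fsum(values)  # fsum limits floating-point summation error
        if abs(total - 100.0) > tolerance:
            unbalanced[ezg] = total
    return unbalanced


rows = [("A", 60.0), ("A", 40.0), ("B", 80.06), ("B", 19.95)]
for ezg, total in find_unbalanced_ezgs(rows).items():
    print(f"Warning: Flaeche for EZG {ezg} sums to {total:.3f} %, expected 100 %")
```

In this example only EZG "B" is reported, because its two values sum to about 100.01.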

jamaa commented 2 years ago

Thanks for the quick fix! What about the automatically generated geopackage, do datatypes need to be changed in there as well? Or does that use the definitions in the csv as well?

Regarding rounding, I agree that automatically changing values can be problematic. But if the original values in the feature class (before rounding) add up to 100%, the rounded values written to the EFL-file should be automatically adjusted, if necessary. Would that be possible without too much effort? Or am I just imagining that this could ever be a problem, given that there is a tolerance of 0.001?
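One way such an adjustment could work is the largest-remainder approach: round every value down, then hand the missing hundredths to the entries that lost the most. The sketch below only illustrates that idea. `round_preserving_sum` is a hypothetical helper, not something that exists in the plugin, and it assumes the unrounded values really do sum to the target.

```python
from decimal import Decimal, ROUND_FLOOR


def round_preserving_sum(values, places=2, target="100"):
    """Round `values` to `places` decimals so that they still sum to `target`."""
    quantum = Decimal(1).scaleb(-places)  # e.g. Decimal("0.01") for places=2
    exact = [Decimal(str(v)) for v in values]
    rounded = [v.quantize(quantum, rounding=ROUND_FLOOR) for v in exact]

    # Number of quanta still missing after rounding everything down
    missing = int((Decimal(target) - sum(rounded)) / quantum)
    if not 0 <= missing <= len(rounded):
        raise ValueError("input values do not sum to the target")

    # Give one extra quantum to the entries with the largest remainders
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - rounded[i],
                   reverse=True)
    for i in order[:missing]:
        rounded[i] += quantum
    return rounded


print(round_preserving_sum([80.055, 19.945]))
```

For the inputs 80.055 and 19.945 this yields 80.06 and 19.94, which again sum to exactly 100.00. The trade-off is that individual entries may differ from their naively rounded value by one unit in the last decimal place.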

MartinGrosshaus commented 2 years ago

You're welcome :-) The layer columns in the geopackage are created by the "append_layer_generic"-function (see line 2688), which also appends existing layers if ordered to. This function is completely based on the definitions in the csv.


If a tolerance of 0.001 means a tolerance of 0.1 %, i.e. a sum of 99.9% would be acceptable, there should not be an issue.

The "Flaeche" field of the EFL is encoded as a float with a maximum length of 6 characters. Subtracting the integer digits and the decimal point leaves room for at least 3 decimal places (4 for values smaller than 10; the exception is "100.00", but that doesn't matter here).
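As a rough sketch of what "as many decimal places as fit into 6 characters" means, the helper below tries the highest precision first and reduces it until the text fits. `format_fixed_width` is a hypothetical name for this illustration and is not taken from the plugin code.

```python
def format_fixed_width(value, width=6):
    """Format `value` with as many decimal places as fit into `width` characters."""
    # width - 2 leaves room for one integer digit and the decimal point
    for decimals in range(width - 2, -1, -1):
        text = f"{value:.{decimals}f}"
        if len(text) <= width:
            return text
    raise ValueError(f"{value} does not fit into {width} characters")


for value in (100.0, 80.055, 5.0005):
    print(format_fixed_width(value))
```

With a width of 6 this gives "100.00", "80.055" and "5.0005", i.e. 2, 3 and 4 decimal places depending on the size of the value.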

Therefore the deviation / rounding error can't be more than ±0.0005 % per element of a single EZG. That means that to exceed the tolerance, a single EZG would need 200 parts, all rounded in the same "direction" (+ or -) with their maximum deviation, which is highly unlikely.

In conclusion: if the input data adds up, the tolerances won't be exceeded by rounding the values to fit in the EFL format.
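The worst-case arithmetic behind this estimate can be written down as a small sketch. Both function names are made up for this illustration. The result depends entirely on how the tolerance is read, so the tolerance is passed in as a parameter.

```python
from fractions import Fraction
import math


def worst_case_drift(n_entries, places):
    """Largest possible |sum(rounded) - sum(original)| for n entries."""
    # Each entry can be off by at most half a unit in the last decimal place
    return n_entries * Fraction(1, 2) / 10**places


def entries_to_reach(tolerance, places):
    """Smallest number of entries whose worst-case drift reaches `tolerance`."""
    per_entry = Fraction(1, 2) / 10**places
    return math.ceil(Fraction(tolerance) / per_entry)


print(entries_to_reach("0.1", 3))    # tolerance read as 0.1
print(entries_to_reach("0.001", 3))  # tolerance read as 0.001
```

With 3 decimal places, a tolerance of 0.1 is only reached with 200 entries that all round the same way, whereas a tolerance of 0.001 is already reached with 2.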

But it's Friday evening, so my math could be wrong - please correct me if necessary :-)

jamaa commented 2 years ago

Your math sounds very convincing for a Friday evening! :-)

jamaa commented 2 years ago

Actually, the tolerance is only 0.001 %. We could change it though, I think.

jamaa commented 2 years ago

I forgot to say: great that the geopackage uses the same definitions!

Actually I don't think I quite understand your math. Here's my attempt:

Given that we have to round all entries to 2 decimal places in order to be able to also fit "100.00" in a space of 6 characters, the largest rounding error we can get from a single entry is 0.005. Say we have only two entries, and both incur this error, the total deviation can already be 0.01.

| | Original | Rounded | Error |
|---|---|---|---|
| | 80.055 | 80.06 | 0.005 |
| | 19.945 | 19.95 | 0.005 |
| SUM | 100.000 | 100.010 | |
| DEVIATION | 0.0000 | 0.0100 | |

If you add more precision by rounding numbers smaller than 100 to 3 decimal places, the possible error per entry is only 0.0005 as you said, but to reach the total allowed deviation of 0.001 you still only need two such entries, right? To get an error larger than that, I have found an example where 4 entries are sufficient:

| | Original | Rounded | Error |
|---|---|---|---|
| | 50.0555 | 50.056 | 0.0005 |
| | 30.1165 | 30.117 | 0.0005 |
| | 5.0005 | 5.001 | 0.0005 |
| | 14.8275 | 14.828 | 0.0005 |
| SUM | 100.0000 | 100.002 | |
| DEVIATION | 0.0000 | 0.002 | |
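Both examples can be reproduced with a few lines of Python. This sketch uses `decimal` with string inputs and explicit half-up rounding, because Python's built-in `round()` works on binary floats and rounds ties to even, so it would not necessarily round these values upwards. `rounding_deviation` is a name invented for this illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounding_deviation(values, places):
    """Return sum(rounded) - sum(original) for decimal strings `values`."""
    quantum = Decimal(1).scaleb(-places)  # e.g. Decimal("0.001") for places=3
    original = [Decimal(v) for v in values]
    rounded = [v.quantize(quantum, rounding=ROUND_HALF_UP) for v in original]
    return sum(rounded) - sum(original)


# First example: two entries, 2 decimal places
print(rounding_deviation(["80.055", "19.945"], 2))

# Second example: four entries, 3 decimal places
print(rounding_deviation(["50.0555", "30.1165", "5.0005", "14.8275"], 3))
```

The first call gives a deviation of 0.01 and the second one of 0.002, matching the tables above.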

Perhaps this discussion is more academic (and a little fun brain training for me) than practically relevant, though. :-)

jamaa commented 2 years ago

Anyway, we got sidetracked a little. I saw in the code that floats are rounded to maximum possible precision depending on the individual value. Adjusting values after rounding depending on the sum of a list of values would probably be quite complicated to implement. I have opened a new issue #9 which we can work on in case this is ever a real problem. The original issue has been fixed, I will create a new release with this fix.