climsoft / Climsoft

Climsoft Desktop for Windows - http://www.climsoft.org
GNU General Public License v3.0
14 stars 28 forks source link

Do the climsoft tables recognise invalid dates? #751

Open rdstern opened 2 weeks ago

rdstern commented 2 weeks ago

@mhabimana and @Patowhiz we were sent climatic data by DCCMS. This was daily data for 2 stations, rain, tmax and tmin for many years. One element gave odd errors in R-Instat, for tmin namely that there were 9988 values on February 29th in many non-leap years. One non-leap year (1985) had the value 23.8 on 29 February.

I assumed this could not have been exported from the current climsoft, but I have been assured it did. Am I therefore correct to deduce that the current Climsoft does not detect whether dates are valid?

If this is the case, then I assume it should be a simple correction?

Patowhiz commented 2 weeks ago

@rdstern I'm surprised by that. The database itself (leave alone the application) should not accept invalid dates.

@mhabimana @smachua any explanation as to why this may have been possible? Or should we follow up with DCCMS to have a look at the database directly?

rdstern commented 2 weeks ago

@Patowhiz just to clarify, I am not suggesting this is a problem with data entered into Version 4. If it is a problem, it is inherited from earleir versions of Climsoft. So I wonder whether it was a problem earlier. If so, does the importing into Version 4 detect and clean those problems out?

rdstern commented 2 weeks ago

@Patowhiz in addition, if this remains a legacy in these databases, then how do we clean them out now?

Patowhiz commented 2 weeks ago

@rdstern I understand. My previous comment referred to the two levels at which data validation typically occurs: Application and Database. If the application fails to detect an incorrect datetime, the database should be able to catch it. In Climsoft, the database stores datetimes in a MariaDB column of the DATETIME type. This means any invalid datetime should be rejected by the database server, and an error should be raised.

I think it would be useful to check with DCCMS on how such invalid datetimes are possible in Climsoft. Then we can think of how to clean and prevent such occurrences.

mhabimana commented 2 weeks ago

@rdstern, @Patowhiz : It is possible that such values may have sailed in the database through data files import into Climsoft. Since they are discovered, it is just a matter of follow these up in the database and clean them. Correcting wrong values in Climsoft database should not be a problem. It is straight forward.

Patowhiz commented 2 weeks ago

@mhabimana thanks for your response. Even if it passed through the Climsoft import feature, I still don't understand how that is possible. The database server should throw a "SQL Error (1292): Incorrect datetime value...". The MariaDB server shouldn't be able to store such a datetime because its execution engine would not permit it.

Have you encountered this kind of invalid date before? I've been unable to reproduce this kind of invalid date in my Climsoft database.

smachua commented 2 weeks ago

@rdstern we need to confirm the source of that data. Climsoft V4 stores the quality controlled data in MariaDB with the datetime structure of yyyy-mm-dd HH:MM:SS which does not accept invalid dates according to MariaDB rules leave alone Climsoft. Please send us a copy of that we try to deduce its source.

isedwards commented 2 weeks ago

Before data is moved to observationinitial it is stored in tables like form_hourly where the date is represented by three separate columns (see below). These individual integer fields will permit invalid dates if the Climsoft application does not catch the problem during key entry.

We've seen in the past some met services keep large volumes of data in the form tables and not move it through QC. It's possible they have exported the data from one of these tables.

CREATE TABLE IF NOT EXISTS `form_hourly` (
  `stationId` varchar(50) NOT NULL,
  `elementId` int(11) NOT NULL,
  `yyyy` int(11) NOT NULL,
  `mm` int(11) NOT NULL,
  `dd` int(11) NOT NULL,
  ...
smachua commented 2 weeks ago

@isedwards although key entry tables can store invalid dates I still doubt if the data described by @rdstern was exported from any of them for 2 reasons:

  1. Key entry forms are designed with functionalities to block saving of records with invalid or future dates.
  2. Data in key entry tables are without decimal points. So values like 23.8 may not have been from there.

I still think we need to confirm the source of the data sent to @rdstern so that the problem is well addressed.