SEED-platform / seed

Standard Energy Efficiency Data (SEED) Platform™ is a web-based application that helps organizations easily manage data on the energy performance of large groups of buildings.
Other
107 stars 55 forks source link

DQ Required vs Not Null #1615

Closed RDmitchell closed 4 years ago

RDmitchell commented 6 years ago

The issue is what is the difference between Required and Not Null. For a specific data file, if the DQ checking is done on import, the Required field makes sense, ie, check to make sure that the fields marked "Required" are in the imported file.

For the case where the DQ checking is done on the inventory list, the Required field may not actually make much sense, at least on the level of checking each record. Presumably the field is either there or not for all the records in the Inventory List (?).

Expected Behavior

Actual Behavior

Steps to Reproduce

Instance Information

instance: LBNL production SHA: 36cc59d

nllong commented 6 years ago

Required and Not Null are quite similar. From this chunk of code it looks like if the field is just missing and it is required, then it add the "missing field". If the field does exist and it is either None or empty string, then it will either be "value is missing and none/empty" or "value is null".

I don't know the best way to resolve this without a couple specific cases that we want to test and the result that we want for each of them. Maybe we just get rid of not null?

RDmitchell commented 6 years ago

I think users want to know if a field that they want to have data in doesn't have data in it, so in some ways, it might be better to get rid of the "Required" and just leave the "Not Null".

But before we do anything let me talk to the users who were having difficulty with this and see if I can get some specific test cases from them.

nllong commented 5 years ago

@RDmitchell -- can you verify that this is still up for discussion?

RDmitchell commented 5 years ago

Am getting feedback from users, will let you know

Ammon-R commented 5 years ago

@RDmitchell @nllong - Figured I'd chime in here rather than an email. I agree with Robin, removing the "required" column and leaving "not null" would probably be the best. From a benchmarking perspective, we are uploading the same data set over an over again and we know what fields are in the benchmarking data set that we are uploading. The "not null" field is more informative since it will tell us if there is an inherent issue with the data. (Example: benchmarks that don't have energy use data will have a null for the EUI but that EUI field will always be in the upload.) I haven't really used SEED for anything other than benchmarking so there might be a use case for the "required" field that I'm not aware of.

BenchmarkDC commented 5 years ago

Hi All,

Andrew from DC here. Chiming in to agree with Ammon for the exact same reason: we wouldn't upload a field for one record but exclude it for another, it's always the same spreadsheet. It might be useful to keep around in the case that a user has to upload a spreadsheet that is generated by the user (i.e. data verification forms created by a reporter)?

To add to this: Does "not null" capture values that are 0? Cause right now I can't set the minimum to 0 for number checks.

lainsworth8801 commented 4 years ago

@RDmitchell and all. My understanding on "required" v.s "not null" is that "required" items should always be not null. However, an item that is "not null" may not be "required" (more like "nice to have" depending on users per @BenchmarkDC). I propose that we keep both columns but when "required" is checked, "not null" is automatically checked (this should also capture/accept 0 value). For cases where a field value is marked as "not null" but NOT labeled as "required", or to filter out these fields, users can simply add labels to do the work.

RDmitchell commented 4 years ago

@lainsworth8801 -- and then users could also check "Not Null" but have the "Required field" not checked?

I would like to run this by Ammon and Andrew first -- I will send them the link to this issue and see if they have any comments.

BenchmarkDC commented 4 years ago

Hey chiming in here per @RDmitchell request. I think this makes a lot of sense, if it is the least effort move. We really don't see much use for the "Required field" currently but might see some use in the future in cases where we might have a public facing form that will not necessarily have a standard upload and we would want to check if required fields are missing.

Ammon-R commented 4 years ago

@RDmitchell I agree with Andrew, it could come in handy later down the line but we'd pretty much only use the "not null' for the current process.

RDmitchell commented 4 years ago

Required and Not Null are now 2 separate definitions in the DQ admin page, so 2 separate rules must be defined, one for each condition. Generally, just setting Not Null should be sufficient for files that are always the same (ESPM) and the fields are always there (so no need to make a rule that the field is required, although it is there so you can make that definition if you want to).

I am testing the new DQ functionality in general, but I think this issue can be closed because the two criteria are now separated.

Instance: dev1 SHA: 40ca210b (2.7.1)