ChuBL / OpenMindat

A Python package for the OpenMindat API.
https://pypi.org/project/openmindat/
Apache License 2.0

0 used as missing data placeholder #6

Open fros1y opened 8 months ago

fros1y commented 8 months ago

I've noticed that, for many fields, 0 is being used to represent missing data. For latitude and longitude this is particularly problematic, since 0N0E is a real place! (https://en.wikipedia.org/wiki/Null_Island). Testing for equality with zero is also troublesome, since exact equality comparisons are unreliable for floating-point values (such as latitude and longitude).

Can the API return None or some other marker for these missing values instead of 0, or does the underlying data have the same ambiguity?
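To make the ambiguity concrete, here is a minimal sketch. The field names `latitude` and `longitude` and the helper `has_coordinates` are hypothetical, chosen for illustration, and are not confirmed OpenMindat response fields or package functions.

```python
from typing import Optional


def has_coordinates(lat: Optional[float], lon: Optional[float]) -> bool:
    """Return True when both coordinates are present; None marks 'missing'."""
    return lat is not None and lon is not None


# With a 0 sentinel, a record with no coordinates and a point at 0N0E
# carry exactly the same values, so nothing can tell them apart.
missing_with_sentinel = {"latitude": 0.0, "longitude": 0.0}
point_at_origin = {"latitude": 0.0, "longitude": 0.0}
assert missing_with_sentinel == point_at_origin

# With None as the marker, the missing record is distinguishable.
missing_with_none = {"latitude": None, "longitude": None}
assert not has_coordinates(missing_with_none["latitude"], missing_with_none["longitude"])
assert has_coordinates(point_at_origin["latitude"], point_at_origin["longitude"])
```

A None marker also avoids any numeric comparison: the check is an identity test rather than a test against 0.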

ChuBL commented 8 months ago

This is a good point. The 0 values originate in the underlying databases, and I have passed this issue on to the development teams. Hopefully, we can eliminate these annoying 0s in future versions.

ChromiteExabyte commented 6 months ago

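For placeholders in the MySQL database, it is appropriate to replace 0 with the special value NULL or with an empty string ''. The choice depends on what is "meant" by the database:

  • If a value is not known and is certain not to exist, it is most accurate to have an empty string '' as the field value.
  • If a value is not known but is presumed to exist, it is most accurate to have NULL as the field value.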

Unknown lat/long values are most accurately NULL; while it is certain that localities have a spatial reference, it is not known for that record.

My understanding is that Mindat's aim is to be a repository of mineral properties, attributes, etc. first. Under "Open Geoscience Data", there are tools such as "GeoCODES" and "DataONE" for locality data.

Data cleansing is a key part of data science and data handling; there are many memes on the subject. For now, users can script out solutions: "if lat = 0, assign NULL to lat".
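A minimal sketch of such a client-side workaround in Python follows. The field names `latitude` and `longitude` and the function `clean_coordinates` are hypothetical and not part of the openmindat package. Unlike the one-line rule above, this variant clears the coordinates only when both are 0, on the assumption that a single 0 (a point on the equator or the prime meridian) is plausible real data. A genuine record at exactly 0N0E would still be cleared, so this is a heuristic, not a fix.

```python
from typing import Any, Dict


def clean_coordinates(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of record with a (0, 0) coordinate pair replaced by None."""
    cleaned = dict(record)  # shallow copy; the input record is left unchanged
    lat = cleaned.get("latitude")
    lon = cleaned.get("longitude")
    # 0 == 0.0 in Python, so integer and float zeros are both caught.
    if lat == 0 and lon == 0:
        cleaned["latitude"] = None
        cleaned["longitude"] = None
    return cleaned


# Made-up records for illustration only.
records = [
    {"id": 1, "latitude": 0, "longitude": 0},         # treated as missing
    {"id": 2, "latitude": 0.0, "longitude": 36.8},    # on the equator, kept
    {"id": 3, "latitude": 48.1, "longitude": 11.6},   # ordinary point, kept
]
cleaned_records = [clean_coordinates(r) for r in records]
```

Downstream code can then test `record["latitude"] is None` instead of comparing against 0.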

Source:

MySQL Reference Manual, Section B.3.4.3 Problems with NULL Values https://dev.mysql.com/doc/refman/8.3/en/problems-with-null.html

ChuBL commented 6 months ago

> For placeholders in the MySQL database, it is appropriate to replace 0 with the special value NULL or with an empty string ''. The choice depends on what is "meant" by the database:
>
>   • If a value is not known and is certain not to exist, it is most accurate to have an empty string '' as the field value.
>   • If a value is not known but is presumed to exist, it is most accurate to have NULL as the field value.
>
> Unknown lat/long values are most accurately NULL; while it is certain that localities have a spatial reference, it is not known for that record.
>
> My understanding is that Mindat's aim is to be a repository of mineral properties, attributes, etc. first. Under "Open Geoscience Data", there are tools such as "GeoCODES" and "DataONE" for locality data.
>
> Data cleansing is a key part of data science and data handling; there are many memes on the subject. For now, users can script out solutions: "if lat = 0, assign NULL to lat".
>
> Source:
>
> MySQL Reference Manual, Section B.3.4.3 Problems with NULL Values https://dev.mysql.com/doc/refman/8.3/en/problems-with-null.html

Noted, thank you for the advice and reference. I will forward your message to the Mindat database administrators.