gbif / pipelines

Pipelines for data processing (GBIF and LivingAtlases)
Apache License 2.0

Vocabs #1093

Closed: marcos-lg closed this 1 month ago

marcos-lg commented 1 month ago

I had mistakenly been using strings for the geological range. These values should be floats so that they can be used in Hive queries. I changed the type in all the Avro schemas for consistency, and I also changed it in occurrence: https://github.com/gbif/occurrence/commit/47d9397e44661f11d94d9fb76223f99af29e3e8a
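To illustrate why string-typed bounds are a problem for range predicates, here is a small sketch. It does not use the actual GBIF Avro schema. The class, the method names and the sample ages (in millions of years) are all hypothetical. The point is that strings compare lexicographically, so a predicate like "value > 10" matches the wrong rows, while the same predicate on floats behaves as expected.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical illustration: geological range bounds compared as strings vs. as floats. */
public class GeologicalRangeComparison {

    // "value > threshold" using String.compareTo, i.e. character-by-character (lexicographic) order.
    static List<String> greaterThanAsStrings(List<String> values, String threshold) {
        return values.stream()
                .filter(v -> v.compareTo(threshold) > 0)
                .collect(Collectors.toList());
    }

    // The same predicate using numeric comparison on floats.
    static List<Float> greaterThanAsFloats(List<Float> values, float threshold) {
        return values.stream()
                .filter(v -> v > threshold)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical ages in Ma, stored once as strings and once as floats.
        List<String> asStrings = Arrays.asList("2.58", "23.03", "145.0", "9.7");
        List<Float> asFloats = Arrays.asList(2.58f, 23.03f, 145.0f, 9.7f);

        // Lexicographic order: every value starts with a character >= '1', so all four "match".
        System.out.println(greaterThanAsStrings(asStrings, "10")); // [2.58, 23.03, 145.0, 9.7]
        // Numeric order: only the values actually greater than 10 match.
        System.out.println(greaterThanAsFloats(asFloats, 10f));    // [23.03, 145.0]
    }
}
```

Storing the bounds as numbers lets a query engine apply ordinary numeric comparisons instead of string ordering.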

In Hive I kept the type as double, because we don't use floats there and this way I can reuse the predicate-parsing logic in the query visitors and other classes. I think this should be fine, but I could change the Avro fields to doubles as well if you think the mismatch could cause problems: https://github.com/gbif/occurrence/blob/a0c133df1bbba894efbd409b4aac28db3acdfdb8/occurrence-hdfs-table/src/main/java/org/gbif/occurrence/download/hive/HiveDataTypes.java#L54
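One possible problem with writing floats in Avro and exposing them as doubles in Hive is that a widened float is not always equal to the double literal a user would type. The sketch below shows this in plain Java. It assumes the Avro float value ends up promoted to a 64-bit double, as a Java `(double)` cast does. The class name, method name and sample value are hypothetical and not taken from the GBIF code.

```java
/** Hypothetical illustration: a value stored as a 32-bit float and read back as a 64-bit double. */
public class FloatWideningCheck {

    // Simulates promoting a value written as an Avro float to a double column.
    static double widen(float stored) {
        return (double) stored;
    }

    public static void main(String[] args) {
        float storedInAvro = 0.1f;             // hypothetical value written as an Avro float
        double readAsDouble = widen(storedInAvro);

        System.out.println(readAsDouble);                  // 0.10000000149011612
        // An equality predicate against the double literal 0.1 would not match this value.
        System.out.println(readAsDouble == 0.1);           // false
        // It only matches a literal that was itself rounded to float precision first.
        System.out.println(readAsDouble == (double) 0.1f); // true
    }
}
```

Range predicates (`>`, `<`) are usually less affected than equality, but declaring the Avro fields as double would avoid the conversion altogether.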

Maybe we could first reinterpret this dataset so I can check that the ES searches still work with this change: https://registry.gbif-uat.org/dataset/1f2cfb6f-c91b-498e-80f3-8eeeec688292