gbif / tech-docs

This is an integrated technical documentation site for GBIF.org
https://techdocs.gbif.org
0 stars 3 forks source link

Mention the final version of the SQL query to be used in a `query.json` for species cube #97

Closed damianooldoni closed 2 weeks ago

damianooldoni commented 2 months ago

Feedback on Species occurrence cubes (Github source for this page)

Hi 👋

Based on my own tests and the Slack conversation between @MattBlissett and @beukueb on the B3 channel this summer, I think we need an extra example on how to prepare the SQL query for being placed in a query.json file.

Important things to mention:

{
  "sendNotification": true,
  "notificationAddresses": [
    "userEmail@example.org" 
  ],
  "format": "SQL_TSV_ZIP",
  "sql": "
  SELECT
    PRINTF('%04d-%02d', \"year\", \"month\") AS yearMonth,
    GBIF_EEARGCode(
      1000,
      decimalLatitude,
      decimalLongitude,
      COALESCE(coordinateUncertaintyInMeters, 1000)
    ) AS eeaCellCode,
    familyKey,
    family,
    speciesKey,
    species,
    COALESCE(sex, 'NOT_SUPPLIED') AS sex,
    COALESCE(occurrence.lifestage.concept, 'NOT_SUPPLIED') AS lifestage,
    COUNT(*) AS occurrences,
    MIN(COALESCE(coordinateUncertaintyInMeters, 1000)) AS minCoordinateUncertaintyInMeters,
    MIN(GBIF_TemporalUncertainty(eventDate)) AS minTemporalUncertainty,
    IF(ISNULL(familyKey), NULL, SUM(COUNT(*)) OVER (PARTITION BY familyKey)) AS familyCount
  FROM
    occurrence
  WHERE occurrenceStatus = 'PRESENT'
    AND countryCode = 'PL'
    AND \"year\" >= 2000
    AND kingdomKey = 1
    AND hasCoordinate = TRUE
    AND (coordinateUncertaintyInMeters <= 1000 OR coordinateUncertaintyInMeters IS NULL)
    AND speciesKey IS NOT NULL
    AND NOT ARRAY_CONTAINS(issue, 'ZERO_COORDINATE')
    AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_OUT_OF_RANGE')
    AND NOT ARRAY_CONTAINS(issue, 'COORDINATE_INVALID')
    AND NOT ARRAY_CONTAINS(issue, 'COUNTRY_COORDINATE_MISMATCH')
    AND \"month\" IS NOT NULL
  GROUP BY
    yearMonth,
    eeaCellCode,
    familyKey,
    family,
    speciesKey,
    species,
    sex,
    lifestage
  ORDER BY
    yearMonth DESC,
    eeaCellCode ASC,
    speciesKey ASC;
  "
}

I would also mention that the name query.json is not mandatory. The user can give a more meaningful name to the json file. This should be mentioned asap, so something for the API SQL Download.

If I have other ideas, I will make new issues or I will add a comment here. Thanks in advance already for the great job GBIF is doing! 💪

MattBlissett commented 2 weeks ago

Thanks for the suggestions Damiano, I've updated the documentation. It might be a few days before it shows on techdocs.gbif.org, as some other updates are also being made to other sections. (It's on techdocs.gbif-uat.org.)