NASA-IMPACT / veda-data


[WIP] Define best practices for defining collections when produced for specific stories #58

Open abarciauskas-bgse opened 2 years ago

abarciauskas-bgse commented 2 years ago

The question has arisen for datasets we are curating specifically to tell the EJ story of Hurricanes Ida and Maria: how do we want to organize collections that are produced for specific events (e.g. hurricanes)? The consensus so far is that we want to publish to generalizable collections as much as possible rather than to collections scoped to a specific event.

What this means is that for the EJ story we will be creating and using the following collections:

Interested in any additional thoughts or considerations from the team @anayeaye @slesaad @xhagrg @leothomas @danielfdsilva

sharkinsspatial commented 2 years ago

@abarciauskas-bgse From the STAC perspective I would recommend following the consensus decision you referenced of using only generalizable collections. This avoids conflating collection definition with a specific story and seems more in the spirit of the STAC collection specification.

This seems like a best practice but presents another issue: where do we store story-specific filters that access only the items in the general collection which are relevant to the story? I'd suggest that a filter key (containing a valid CQL query body) in the UI config described here might be one option. This has 3 advantages:

  1. The ranges and filter configuration for your story would be managed in a static file under version control so changes or new story releases could be driven by application CI.
  2. Using a filter means that as new data is added to your STAC API it is immediately reflected in the application without the need for loading data into a specific collection.
  3. Data in a generalized collection can participate in multiple stories and still seem semantically correct.
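
A minimal sketch of the filter idea above, assuming a STAC API that supports the Filter extension with CQL2 JSON. The collection id and the two Ida dates come from this thread; the `STORY_FILTERS` dict, the `hurricane-ida` key, and the `build_search_body` helper are hypothetical names for illustration, not the actual UI config format.

```python
import json

# Hypothetical story config that would live in a static file under version control.
STORY_FILTERS = {
    "hurricane-ida": {
        "collection": "nightlights-hd-monthly",
        # CQL2 JSON: keep items whose time intersects the story window (illustrative dates).
        "filter": {
            "op": "t_intersects",
            "args": [
                {"property": "datetime"},
                {"interval": ["2021-08-09T00:00:00Z", "2021-08-31T23:59:59Z"]},
            ],
        },
    }
}


def build_search_body(story_id: str) -> dict:
    """Build a POST /search request body from the stored story filter."""
    story = STORY_FILTERS[story_id]
    return {
        "collections": [story["collection"]],
        "filter-lang": "cql2-json",
        "filter": story["filter"],
    }


print(json.dumps(build_search_body("hurricane-ida"), indent=2))
```

Because the filter is evaluated at query time, items added to the general collection later would be picked up without any change to the collection itself.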

anayeaye commented 2 years ago

@abarciauskas-bgse @sharkinsspatial @xhagrg the temporal cadence of the hurricane event nightlights data is a unique case:

Nightlights events test metadata with start/end
Nightlights monthly hd with nominal datetime

abarciauskas-bgse commented 2 years ago

I sent Ranjay an email about the temporal nature of the BMHD monthly files. I think if he can verify that the start_ and end_datetime of the monthly data files can be used for the nightlights-hd-monthly collection, we should put the Hurricane files in that same collection. The downside of this is that the collection has to be described as having a dashboard:time_density of "multi-day" instead of "month", because the temporal extent of the Hurricane files is greater than a month rather than exactly a month. @danielfdsilva @anayeaye will that be problematic for the dashboard?

I believe for Ida there is just a day before the hurricane (2021-08-09) and a day after the hurricane (2021-08-31), so I think a single datetime does make sense for those items.

xhagrg commented 2 years ago

@abarciauskas-bgse @anayeaye after reading the response from Ranjay, it looks like we can just use the start and end date time of the corresponding month? Do we move ahead with ingestion of these files in the same collection?

abarciauskas-bgse commented 2 years ago

Yes, thanks @xhagrg for checking. I think we should consolidate in the nightlights-hd-monthly dataset and also add start_ and end_datetimes to the existing COGs for the month of each file. However, we can create a new issue to do that.
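
As a rough sketch of deriving the start and end datetimes for the month of each file, assuming UTC and second-level precision (both are assumptions, and `month_bounds` is a hypothetical helper, not part of any ingest pipeline mentioned here):

```python
import calendar
from datetime import datetime, timezone
from typing import Tuple


def month_bounds(year: int, month: int) -> Tuple[str, str]:
    """Return (start_datetime, end_datetime) covering one calendar month in UTC."""
    # monthrange returns (weekday of first day, number of days in month).
    last_day = calendar.monthrange(year, month)[1]
    start = datetime(year, month, 1, 0, 0, 0, tzinfo=timezone.utc)
    end = datetime(year, month, last_day, 23, 59, 59, tzinfo=timezone.utc)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


start_dt, end_dt = month_bounds(2021, 8)
print(start_dt, end_dt)  # 2021-08-01T00:00:00Z 2021-08-31T23:59:59Z
```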

abarciauskas-bgse commented 2 years ago

Also Ranjay shared these links as the product pages for the dataset: https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/VNP46A3/, https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/VNP46A4/

@anayeaye @sharkinsspatial is there a good place for these types of references in the STAC collection metadata? I'm checking with Ranjay, but my guess is that those are the product pages for the source HDF5 used to generate the COGs we have.

anayeaye commented 2 years ago

@abarciauskas-bgse @sharkinsspatial @xhagrg RE: datetimes, I don't have a specific reference to back this up, but I think it is probably a good idea to use either datetime OR start/end consistently for all items in the collection. It's a bit of a stretch, but if a user is paging through a collection of items for a search they should be able to expect datetime information on the same property for all items in the response. I don't think pgstac would have any trouble with the search on a collection with mixed datetime properties, but I'm also not sure how we would communicate this information to the end user. Mixed datetimes could also complicate using other stac-apis for these items in the future.
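
One way to spot mixed conventions before ingest could look like the sketch below. It assumes items are plain STAC item dicts; `datetime_style` and `summarize_styles` are hypothetical helpers, and the sample items are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def datetime_style(item: dict) -> str:
    """Classify an item as 'range', 'datetime', or 'missing' by its time properties."""
    props = item.get("properties", {})
    has_range = (
        props.get("start_datetime") is not None
        and props.get("end_datetime") is not None
    )
    if has_range:
        return "range"
    if props.get("datetime") is not None:
        return "datetime"
    return "missing"


def summarize_styles(items: Iterable[dict]) -> Counter:
    """Count how many items use each datetime convention."""
    return Counter(datetime_style(item) for item in items)


# Illustrative items only: one nominal datetime, one start/end range.
items = [
    {"properties": {"datetime": "2021-08-09T00:00:00Z"}},
    {
        "properties": {
            "datetime": None,
            "start_datetime": "2021-08-01T00:00:00Z",
            "end_datetime": "2021-08-31T23:59:59Z",
        }
    },
]
styles = summarize_styles(items)
print(dict(styles), "consistent" if len(styles) == 1 else "mixed")
```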

I also don't want to block ingest on the nightlights data--I think it will be fine either way because it is small enough to easily refactor or reingest if needed.

anayeaye commented 2 years ago

@abarciauskas-bgse Just refreshed and saw the Collection level metadata question above. I think these references would be good links to add to the document. This HLS delta collection has external links to metadata, maybe we could follow this pattern: https://dev-stac.delta-backend.xyz/collections/HLSS30.002

  "links": [
    <SNIP>
    {
      "rel": "external",
      "href": "https://cmr.earthdata.nasa.gov/search/concepts/C2021957295-LPCLOUD.html",
      "type": "text/html",
      "title": "NASA Common Metadata Repository Record for this Dataset"
    }
  ]
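
Following the same pattern, the product pages could be appended to a collection's links. This is a sketch only: `add_external_link` is a hypothetical helper, the collection dict is a stub, and the link title is an assumed placeholder rather than agreed wording.

```python
def add_external_link(collection: dict, href: str, title: str) -> dict:
    """Return a copy of the collection with an 'external' HTML link appended once."""
    link = {"rel": "external", "href": href, "type": "text/html", "title": title}
    links = list(collection.get("links", []))
    # Skip the append if a link with the same href is already present.
    if not any(existing.get("href") == href for existing in links):
        links.append(link)
    return {**collection, "links": links}


# Stub collection for illustration; a real collection has many more fields.
collection = {"id": "nightlights-hd-monthly", "links": []}
updated = add_external_link(
    collection,
    "https://ladsweb.modaps.eosdis.nasa.gov/missions-and-measurements/products/VNP46A3/",
    "Product page for VNP46A3",  # assumed title
)
print(updated["links"])
```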

abarciauskas-bgse commented 2 years ago

@xhagrg per @anayeaye's comment about start_ and end_datetime, I think we will want to include start_ and end_datetime for the Ida files. Sorry for the re-work. I see those files are already published to https://dev-stac.delta-backend.xyz/collections/BMHD/items

xhagrg commented 2 years ago

@abarciauskas-bgse I will be using the "nightlights-hd-monthly" collection, which already exists, and will be adding the start_ and end_datetime in the properties. Do we retain the datetime field, or set it to none as done previously?

abarciauskas-bgse commented 2 years ago

datetime is a required field (see https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#datetime), but setting it to null is acceptable if start_datetime and end_datetime are specified
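
A small sketch of that rule as a check on item properties. `check_item_datetimes` is a hypothetical helper and the example values are illustrative; it only covers the null-datetime case described here, not full STAC validation.

```python
def check_item_datetimes(properties: dict) -> None:
    """Raise ValueError if the datetime fields break the rule described above."""
    if "datetime" not in properties:
        raise ValueError("'datetime' must be present (it may be null)")
    if properties["datetime"] is None:
        # A null datetime is only acceptable when both range fields are set.
        missing = [
            key
            for key in ("start_datetime", "end_datetime")
            if properties.get(key) is None
        ]
        if missing:
            raise ValueError(f"datetime is null but missing: {', '.join(missing)}")


# Illustrative monthly item: null datetime with an explicit range.
check_item_datetimes(
    {
        "datetime": None,
        "start_datetime": "2021-08-01T00:00:00Z",
        "end_datetime": "2021-08-31T23:59:59Z",
    }
)
print("ok")
```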

gadomski commented 1 year ago

Moving this to veda-data as it's a good/useful conversation, and we're sunsetting this repo: https://github.com/NASA-IMPACT/veda-architecture/issues/322.