DARPA-ASKEM / .github-private


[PLAN] Data #5

Closed YohannParis closed 1 year ago

YohannParis commented 1 year ago

Explorer

Papers

Data Service

Stakeholders

brandomr commented 1 year ago

@YohannParis @dgauldie this looks good. Our Data Service work this sprint involves refining the storage format/schema and determining how best to represent geospatial information.

Currently we store CSVs in this format but want to assess whether we can rely on NetCDF instead. The primary reasons are that we want to maintain performance as we move towards larger datasets for space weather, and we know that the TA3 teams are going to be using NetCDF as well. The challenge is that many COVID datasets are not gridded; instead they identify locations by categorical place names (e.g. "Travis County, Texas, United States"). Additionally, we would like to be able to seamlessly combine features across datasets.
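To make the non-gridded case concrete, here is a minimal sketch (not the Data Service's actual schema) of how long-format rows keyed by a place name could be pivoted into dense per-feature arrays over a categorical "place" dimension, which is roughly the layout a NetCDF-style file would need in place of lat/lon. The row fields (`date`, `place`, `feature`, `value`), the function name, and the sample numbers are all hypothetical and chosen only for illustration.

```python
def to_place_indexed(rows):
    """Pivot long-format rows into per-feature grids indexed [date][place].

    Each row is a dict with hypothetical keys: date, place, feature, value.
    Cells with no observation are left as None.
    """
    dates = sorted({r["date"] for r in rows})
    places = sorted({r["place"] for r in rows})
    date_idx = {d: i for i, d in enumerate(dates)}
    place_idx = {p: i for i, p in enumerate(places)}

    features = {}
    for r in rows:
        # Create an empty date x place grid the first time a feature appears.
        grid = features.setdefault(
            r["feature"], [[None] * len(places) for _ in dates]
        )
        grid[date_idx[r["date"]]][place_idx[r["place"]]] = float(r["value"])
    return {"dates": dates, "places": places, "features": features}


# Illustrative rows from two made-up datasets; the values are not real data.
cases_rows = [
    {"date": "2021-01-01", "place": "Travis County, Texas, United States",
     "feature": "cases", "value": "120"},
    {"date": "2021-01-02", "place": "Travis County, Texas, United States",
     "feature": "cases", "value": "135"},
]
vaccine_rows = [
    {"date": "2021-01-02", "place": "Travis County, Texas, United States",
     "feature": "doses", "value": "900"},
    {"date": "2021-01-02", "place": "King County, Washington, United States",
     "feature": "doses", "value": "1500"},
]

# Combining features across datasets is just pivoting the concatenated rows,
# because both datasets share the same date and place keys.
combined = to_place_indexed(cases_rows + vaccine_rows)
```

The sketch only works if place names match exactly across datasets; in practice they would likely need normalizing to a shared identifier first.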

@mattprintz and @Sorrento110 are working on this analysis. Generally speaking, we can export data from the Data Service's internal format to whichever format the HMI prefers.

amostafa-uncharted commented 1 year ago

With regard to searching for Tables and Figures, an existing implementation that uses COSMOS is prototyped here: https://xdd.wisc.edu/set_visualizer/sets/askem-covid-demo/?query=vaccine&type=Figure. However, our current implementation fetches Figures and Tables as Extractions through the new xDD ASKEM API. This API is still evolving: it only supports retrieving extractions for a specific document DOI, and it very recently introduced a query param, "query_all", that seems to support extraction search. That param is experimental and probably not working properly yet.
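For illustration only, the two access patterns described above (lookup by document DOI versus the experimental "query_all" search) could be expressed as a small URL builder. This is a sketch under assumptions: the base URL is a placeholder, the `doi` parameter name is a guess, and only "query_all" is taken from the note above. It builds a URL and makes no network request.

```python
from urllib.parse import urlencode

# Placeholder only: the real xDD ASKEM API endpoint is not given in this thread.
BASE_URL = "https://example.invalid/extractions"


def extraction_url(doi=None, query_all=None, base_url=BASE_URL):
    """Build a request URL for either a DOI lookup or a free-text search.

    Exactly one of `doi` or `query_all` must be given. The `doi` parameter
    name is an assumption; `query_all` is the param named in the thread.
    """
    if (doi is None) == (query_all is None):
        raise ValueError("pass exactly one of doi or query_all")
    params = {"doi": doi} if doi is not None else {"query_all": query_all}
    return f"{base_url}?{urlencode(params)}"


# Hypothetical usage: search extractions for the term used in the demo link.
search_url = extraction_url(query_all="vaccine")
```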

In addition, the data explorer will need a large plumbing effort to get this feature added. I would confirm the priority of this feature and perhaps ask for a rough design first, to decide on things such as the content of the matrix view and facets, which won't be available if this feature is implemented! @pascaleproulx fyi