PASTAplus / dex

Explore and subset CSV tables using associated EML metadata
Apache License 2.0

Sub-setting should be based on a simplified query language #8

Closed servilla closed 3 years ago

servilla commented 3 years ago

The data table sub-setting functionality should be based on a simple query language (defined by a context-free grammar, e.g. in Backus–Naur form) that the underlying pandas Python package can easily execute, in lieu of multiple selection tables.

As it turns out, pandas already supports a query language that can be used for data table filtering. For example, the following query runs against the Data Carpentry Python ecology surveys.csv table:

df.query("(year == 1990 | year == 1991) & sex == 'M' & (species_id == 'BA' | species_id == 'RM' | species_id == 'DO')")
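To make the behaviour of that query concrete, here is a minimal sketch run against a tiny made-up stand-in for surveys.csv (the rows below are hypothetical; only the column names follow the real table). It also shows that the same filter can be written more compactly with the `in` operator that `DataFrame.query` accepts, and that both match an ordinary boolean mask:

```python
import pandas as pd

# Hypothetical miniature stand-in for surveys.csv: invented rows,
# but the same column names as the Data Carpentry table.
df = pd.DataFrame({
    "record_id": [1, 2, 3, 4, 5, 6],
    "year": [1990, 1991, 1990, 1989, 1991, 1990],
    "sex": ["M", "M", "F", "M", "M", "M"],
    "species_id": ["BA", "RM", "DO", "BA", "PF", "DO"],
})

# The query from the comment above, split across two string literals.
long_form = df.query(
    "(year == 1990 | year == 1991) & sex == 'M' & "
    "(species_id == 'BA' | species_id == 'RM' | species_id == 'DO')"
)

# Same filter using the `in` operator supported by the query syntax.
short_form = df.query(
    "year in [1990, 1991] and sex == 'M' and species_id in ['BA', 'RM', 'DO']"
)

# Equivalent boolean-mask version, for comparison.
mask = (
    df["year"].isin([1990, 1991])
    & (df["sex"] == "M")
    & df["species_id"].isin(["BA", "RM", "DO"])
)
boolean_form = df[mask]

print(long_form)
```

All three forms should select the same rows; the query-string forms have the advantage that the filter can be stored or passed around as plain text.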

servilla commented 3 years ago

The current implementation relies on pandas' query syntax. Although not "simple", it appears to be effective and moderately powerful.