brightway-lca / brightway2-data

Tools for the management of inventory databases and impact assessment methods. Part of the Brightway LCA framework.
https://docs.brightway.dev/
BSD 3-Clause "New" or "Revised" License

Improve biosphere search #82

Open BenPortner opened 3 years ago

BenPortner commented 3 years ago

Description

Currently there is no good way to filter biosphere flows by category. For example, when searching for 'Carbon dioxide, fossil', Database.search cannot limit the results to the category ('air',). Filtering for ('air',) yields no results, while filtering for 'air' returns flows from every air subcategory.

>>> bw.Database('biosphere3').search("carbon dioxide, fossil", filter={'categories':'air'})
Excluding 0 filtered results
['Carbon dioxide, fossil' (kilogram, None, ('air', 'low population density, long-term')), 
'Carbon dioxide, fossil' (kilogram, None, ('air', 'urban air close to ground')), 
'Carbon dioxide, fossil' (kilogram, None, ('air', 'lower stratosphere + upper troposphere')), 
'Carbon dioxide, fossil' (kilogram, None, ('air',)), 
'Carbon dioxide, fossil' (kilogram, None, ('air', 'non-urban air or from high stacks')), 
'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'low population density, long-term')), 
'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'non-urban air or from high stacks')), 
'Carbon dioxide, non-fossil' (kilogram, None, ('air',)), 
'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'urban air close to ground')), 
'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'lower stratosphere + upper troposphere')), 
'Carbon dioxide, non-fossil, from calcination' (kilogram, None, ['air'])]
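Until the index supports this, one possible workaround is to post-filter the search results in Python by comparing the categories exactly. The sketch below is a hypothetical helper (`filter_by_categories` is not part of brightway2-data); it assumes each result supports item access for 'categories', as the dict samples here do and as brightway Activity objects are assumed to. Note that the last result above stores its categories as a list (['air']) rather than a tuple, so the helper normalises with tuple() before comparing.

```python
from typing import Iterable, List, Sequence


def _categories(result) -> tuple:
    """Return a result's categories as a tuple, or () if missing/None."""
    try:
        # tuple() turns list-valued categories like ['air'] into ('air',)
        return tuple(result["categories"])
    except (KeyError, TypeError):
        return ()


def filter_by_categories(results: Iterable, categories: Sequence[str]) -> List:
    """Hypothetical helper: keep results whose categories match exactly."""
    target = tuple(categories)
    return [r for r in results if _categories(r) == target]


# Sample data shaped like the search output above (plain dicts for illustration)
sample_results = [
    {"name": "Carbon dioxide, fossil", "categories": ("air", "urban air close to ground")},
    {"name": "Carbon dioxide, fossil", "categories": ("air",)},
    {"name": "Carbon dioxide, non-fossil", "categories": ("air",)},
    {"name": "Carbon dioxide, non-fossil, from calcination", "categories": ["air"]},
]

# With brightway2 the input would instead be (assumed usage):
# hits = bw.Database('biosphere3').search("carbon dioxide, fossil")
# air_only = filter_by_categories(hits, ("air",))
air_only = filter_by_categories(sample_results, ("air",))
```

Because the text search itself is fuzzy, the non-fossil flows still pass through; only the category constraint is made exact here.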

Proposal

If the filter expression accepted brackets and commas, one could filter for ('air',) and get exactly the matching flows. This requires a custom whoosh analyzer alongside the default one. The custom analyzer is needed because the default analyzer strips punctuation from index entries; the default analyzer must be kept so that filtering for 'air' (without brackets or comma) still returns the expected results. IndexManager._format_dataset() would also need to write a stringified version of the categories tuple, brackets included, to the search index.
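The two-field idea can be illustrated in plain Python. This is not whoosh code and not the actual `_format_dataset()` implementation: `format_categories`, `matches`, and the field name `categories_exact` are hypothetical. A regex word tokenizer stands in for the default analyzer (it drops brackets, quotes and commas), and the exact field stores `str(tuple(categories))`. In whoosh, the exact field could plausibly be an `ID` field, which indexes the whole value as a single term, though that is an assumption about how the proposal would be implemented.

```python
import re

# Rough stand-in for the default analyzer: lowercase word tokens, punctuation dropped
_WORD = re.compile(r"\w+")


def format_categories(categories) -> dict:
    """Hypothetical index entry for a flow's categories, with two fields."""
    cats = tuple(categories)  # normalise lists such as ['air'] to tuples
    return {
        # tokenised field: brackets, quotes and commas are lost
        "categories": set(_WORD.findall(" ".join(cats).lower())),
        # exact field: the stringified tuple, brackets and comma included
        "categories_exact": str(cats),
    }


def matches(entry: dict, query: str) -> bool:
    """Queries starting with '(' hit the exact field; others match word tokens."""
    if query.startswith("("):
        return entry["categories_exact"] == query
    return set(_WORD.findall(query.lower())) <= entry["categories"]


air_only = format_categories(("air",))
air_urban = format_categories(("air", "urban air close to ground"))
```

With this layout, `matches(air_only, "('air',)")` is true while `matches(air_urban, "('air',)")` is false, and the plain query "air" still matches both entries.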

BenPortner commented 3 years ago

loosely related to #63

cmutel commented 3 years ago

This sounds fine, but I won't do it anytime soon. Labelling as enhancement and assigning it to you.