Currently there is no good way to filter biosphere flows by category. For example, the `Database.search` method cannot restrict results to the category `('air',)` when searching for `'Carbon dioxide, fossil'`. Filtering for `('air',)` yields no results, while filtering for `'air'` returns the flows from every air subcategory:
```python
>>> bw.Database('biosphere3').search("carbon dioxide, fossil", filter={'categories':'air'})
Excluding 0 filtered results
['Carbon dioxide, fossil' (kilogram, None, ('air', 'low population density, long-term')),
 'Carbon dioxide, fossil' (kilogram, None, ('air', 'urban air close to ground')),
 'Carbon dioxide, fossil' (kilogram, None, ('air', 'lower stratosphere + upper troposphere')),
 'Carbon dioxide, fossil' (kilogram, None, ('air',)),
 'Carbon dioxide, fossil' (kilogram, None, ('air', 'non-urban air or from high stacks')),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'low population density, long-term')),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'non-urban air or from high stacks')),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air',)),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'urban air close to ground')),
 'Carbon dioxide, non-fossil' (kilogram, None, ('air', 'lower stratosphere + upper troposphere')),
 'Carbon dioxide, non-fossil, from calcination' (kilogram, None, ['air'])]
```
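Until the search filter supports this, a possible workaround is to post-filter the results in Python by comparing categories exactly. The sketch below uses plain dicts copied from the output above as stand-ins for the returned flows; that is an assumption, since real results are bw2data activity objects (which, as far as I know, also allow `flow['categories']` lookups). The helper name `filter_by_exact_category` is hypothetical. Note that the last flow above stores its categories as a list, so the sketch normalizes with `tuple()` before comparing.

```python
# Hypothetical sample data copied from the search output above; in a real
# session these would be the objects returned by Database.search.
results = [
    {"name": "Carbon dioxide, fossil", "categories": ("air", "low population density, long-term")},
    {"name": "Carbon dioxide, fossil", "categories": ("air", "urban air close to ground")},
    {"name": "Carbon dioxide, fossil", "categories": ("air",)},
    {"name": "Carbon dioxide, non-fossil", "categories": ("air",)},
    {"name": "Carbon dioxide, non-fossil, from calcination", "categories": ["air"]},
]


def filter_by_exact_category(flows, categories):
    """Keep only flows whose categories equal `categories` exactly.

    tuple() normalizes list-valued categories such as ['air'] so that
    they compare equal to the tuple ('air',).
    """
    target = tuple(categories)
    return [flow for flow in flows if tuple(flow["categories"]) == target]


exact = filter_by_exact_category(results, ("air",))
for flow in exact:
    print(flow["name"], flow["categories"])
```

With this sample data, three flows survive: the two `('air',)` tuples and the calcination flow whose categories are `['air']`. This only works after the search has returned, so it cannot reduce the number of hits the index produces.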
Proposal
If the filter expression accepted brackets and commas, filtering for `('air',)` would return exactly the flows in that category. This requires a custom whoosh analyzer in addition to the default analyzer. The custom analyzer is needed because the default one strips punctuation from index entries; the default analyzer is still needed so that filtering for `'air'` (without brackets and comma) keeps returning the expected results. The solution also requires updating `IndexManager._format_dataset()` to write a stringified version of the categories tuple, brackets included, to the search index.
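The dual-field idea can be illustrated without whoosh. The following stdlib-only sketch mimics it under several assumptions: `default_tokens` roughly imitates a punctuation-dropping analyzer (it is not whoosh's actual code), the field names `categories` and `categories_exact` are hypothetical, and the rule "a filter starting with `(` is matched verbatim" is one possible way to choose between the two fields, not something the proposal specifies.

```python
import re


def default_tokens(text):
    # Rough imitation of a default text analyzer: split on non-word
    # characters and lowercase, so brackets and commas are discarded.
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def format_categories(categories):
    """Sketch of the two values a revised _format_dataset() might write.

    'categories' feeds the tokenizing (default) analyzer, while
    'categories_exact' keeps the stringified tuple verbatim.
    """
    as_tuple = tuple(categories)
    return {
        "categories": ", ".join(as_tuple),
        "categories_exact": str(as_tuple),
    }


def matches(doc, filter_value):
    # Hypothetical dispatch: bracketed filters are compared against the
    # verbatim field; anything else uses token matching.
    if filter_value.startswith("("):
        return doc["categories_exact"] == filter_value
    wanted = default_tokens(filter_value)
    have = default_tokens(doc["categories"])
    return all(tok in have for tok in wanted)


docs = [format_categories(c) for c in [
    ("air",),
    ("air", "urban air close to ground"),
    ["air"],
    ("water",),
]]

print([d["categories_exact"] for d in docs if matches(d, "('air',)")])
print(len([d for d in docs if matches(d, "air")]))
```

Here `('air',)` matches only the two top-level air entries (including the one stored as a list), while `air` still matches all three air entries. Stringifying with `str(tuple(...))` is what makes list- and tuple-valued categories index identically.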