In Agribalyse we have activity variants whose names differ by just a single character:
>>> pprint(bw2data.Database("Agribalyse 3.1.1").search("Sunflower grain organic"))
['Sunflower grain, organic, system number 4, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 1, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 2, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 5, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 3, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 1, at farm gate {FR} U' (kilogram, None, ('Materials/fuels',)),
'Sunflower grain, organic, system number 4, at farm gate {FR} U' (kilogram, None, ('Materials/fuels',)),
'Sunflower grain, organic, system number 3, at farm gate {FR} U' (kilogram, None, ('Materials/fuels',)),
'Sunflower grain, organic, system number 2, at farm gate {FR} U' (kilogram, None, ('Materials/fuels',)),
'Sunflower grain, organic, system number 5, at farm gate {FR} U' (kilogram, None, ('Materials/fuels',))]
For ecobalyse we need to select the right activity without depending on the activity identifier, which may differ depending on the software, the database, and even the database version. Doing so is also more future-proof, because we won't depend on a code that may change when the database is upgraded. The idea is to keep a search term as the reference to an activity, instead of the identifier/code. So from the above list, say we want to select "Sunflower grain, organic, system number 3, at farm gate".
With the default setup of Brightway, it's not possible to search for this exact activity, because the underlying search engine ignores single-character words by default, and putting 'system number 3' in quotes does not help:
>>> pprint(db.search('Sunflower grain organic system number 3'))
['Sunflower grain, organic, system number 5, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 4, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 1, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 3, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 2, at farm gate {FR} U' (kilogram, None, None)]
>>> len(db.search('"Sunflower grain organic system number 3"'))
5
>>> len(db.search("'Sunflower grain organic system number 3'"))
5
>>> len(db.search("name:'Sunflower grain organic system number 3'"))
5
There is a default StopFilter in whoosh that prevents searching for single-character words. For the name field of activities, I think it would make sense to completely remove the StopFilter and its minimum size, because every single character may be relevant in an activity name.
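To illustrate why the variants become indistinguishable, here is a minimal pure-Python sketch of the mechanism. It is a simplified stand-in, not whoosh's actual code: the `analyze` function, the regex and the stop-word subset are all assumptions made for illustration. It only mimics the behaviour described above (lowercasing, then dropping stop words and tokens shorter than a minimum size).

```python
import re

# Hypothetical, simplified stand-in for an analyzer chain (NOT whoosh's code):
# split into word tokens, lowercase them, then drop stop words and short tokens.
TOKEN_RE = re.compile(r"\w+")
STOP_WORDS = frozenset({"a", "an", "and", "at", "in", "of", "the", "to"})  # illustrative subset


def analyze(text, stoplist=STOP_WORDS, minsize=2):
    """Return the tokens that would end up in the index for `text`."""
    tokens = [t.lower() for t in TOKEN_RE.findall(text)]
    # Length filter: with minsize=2, a lone digit such as "3" is discarded.
    tokens = [t for t in tokens if len(t) >= minsize]
    # Stop-word filter, skipped entirely when stoplist is None.
    if stoplist is not None:
        tokens = [t for t in tokens if t not in stoplist]
    return tokens


name_3 = "Sunflower grain, organic, system number 3, at farm gate {FR} U"
name_5 = "Sunflower grain, organic, system number 5, at farm gate {FR} U"

# Default-like settings: both names reduce to the same token list.
same_by_default = analyze(name_3) == analyze(name_5)

# Proposed settings: the digit survives, so the names can be told apart.
tokens_3 = analyze(name_3, stoplist=None, minsize=1)
tokens_5 = analyze(name_5, stoplist=None, minsize=1)
```

With the default-like settings the system number never reaches the index, so no query syntax can recover it; with `stoplist=None, minsize=1` the "3" is kept as a token of its own.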
What do you think?
I've tried the following:
In the bw2schema, just replace the name field with: name=TEXT(stored=True, sortable=True, analyzer=StandardAnalyzer(stoplist=None, minsize=1))
Then it's possible to search the exact activity:
>>> db.search("Sunflower grain organic system 3")
['Sunflower grain, organic, system number 3, at farm gate {FR} U' (kilogram, None, None),
'Sunflower grain, organic, system number 3, at farm gate {FR} U' (kilogram, None, ('Materials/fuels',))]
(If needed, the two remaining results can be disambiguated by specifying the category in the search term or, as a last resort, the code.)
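As a rough sketch of how a search term could serve as a stable reference, the helper below resolves a term to exactly one activity and fails loudly otherwise. `resolve_unique` and its parameters are hypothetical names, not part of bw2data; it takes any search callable (for example `db.search`) and assumes each result behaves like a mapping with a "categories" key. The demo uses plain dicts in place of real activities.

```python
_ANY = object()  # sentinel: do not filter on categories


def resolve_unique(search, term, categories=_ANY):
    """Return the single result of `search(term)`, or raise ValueError.

    `search` is any callable returning an iterable of mapping-like results
    (hypothetical interface; pass e.g. `db.search` at your own risk).
    """
    results = list(search(term))
    if categories is not _ANY:
        # Keep only results whose categories match exactly (None is a valid value).
        results = [r for r in results if r.get("categories") == categories]
    if len(results) != 1:
        raise ValueError(f"{term!r} matched {len(results)} activities, expected exactly 1")
    return results[0]


# Demo with stand-in data shaped like the two results shown above.
NAME = "Sunflower grain, organic, system number 3, at farm gate {FR} U"
fake_results = [
    {"name": NAME, "categories": None},
    {"name": NAME, "categories": ("Materials/fuels",)},
]


def fake_search(term):
    # Stand-in for a real database search; ignores the term.
    return fake_results


fuel_variant = resolve_unique(fake_search, "Sunflower grain organic system 3",
                              categories=("Materials/fuels",))
```

Raising on zero or several matches means a database upgrade that changes names is noticed immediately, instead of silently selecting a different activity.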
@ccomb Fantastic issue report. Please submit a PR with the fix you have already done, and a test; I will port it to whatever branch you don't use and release a new bw2data version.