Open trnstlntk opened 2 years ago
This is because your expression returns an array, not a single value, and arrays are silently discarded when creating columns out of expressions: https://github.com/OpenRefine/OpenRefine/issues/1088
(see also https://github.com/OpenRefine/OpenRefine/issues/4823, which would be one of my preferred ways to improve this)
Concretely, what you can do on your side is use `extractFromTemplate(value, "Information", "Description")[0]`.
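A slightly more defensive variant of that expression, as a sketch only: it uses the `extractFromTemplate` name used elsewhere in this thread, and assumes (not verified here) that the function returns an empty array when the template or parameter is missing. `with`, `if` and `length` are standard GREL controls and functions; `r` is just a local variable name chosen for illustration.

```
with(extractFromTemplate(value, "Information", "Description"), r, if(length(r) > 0, r[0], null))
```

This keeps the cell blank when there is no match, and otherwise stores the first match as a plain string, which is something a cell can hold.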
Or we could decide that this `extractFromTemplate` function should not return an array, but only its first result. That makes it impossible to fetch other results in cases where there is more than one match, so from a programmer's perspective it is a bit disappointing, but perhaps you want to prioritize having a simpler expression.
You could do the same with `extractCategories`, and it would only return the first category of the page. That sounds even worse than for `extractFromTemplate`, since files routinely have multiple categories and there is no reason why the first one should be more interesting than the others. So intuitively it is worth explaining to users that arrays exist and how to deal with them, but that's my very biased programmer perspective :-P
I'm finally getting around to documenting this. I will go for the pragmatic approach, providing end users with easy-to-reuse recipes, as I'm noticing that onboarding / learning the whole OpenRefine workflow is already pretty challenging for average Wikimedians.
As an exercise, I tried to come up with a workaround myself, which will be helpful for others too, but I'm not sure yet if I found the smartest solution. Is something like `value.extractCategories()[0,10].toString()` a decent workaround, or would you recommend something even nicer? (The 0-10 slice is there to catch a lot of values, and the `toString` to circumvent the 'OpenRefine won't do arrays in cells' issue.)
I would rather recommend something like `value.extractCategories().join('#')`, which should join categories with a `#` symbol between them, such as `Category:Art#Category:Spain#Category:Blue`. Then, users can easily split those values into multiple cells / columns using the corresponding functions in OpenRefine.
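As a rough sketch of that recipe, the new column can be created with the joined expression and then post-processed. The `#` separator is only a convention; any character that does not occur in category names would do.

```
value.extractCategories().join('#')
```

On the resulting column, a follow-up transform along these lines should strip the `Category:` prefix from every entry while keeping the separator (`c` is an arbitrary loop variable name):

```
forEach(value.split('#'), c, c.replace('Category:', '')).join('#')
```

For the example above this would give `Art#Spain#Blue`. From there, the "Edit cells > Split multi-valued cells" or "Edit column > Split into several columns" menu entries, with `#` as the separator, spread the values over rows or columns.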
I have been trying the `extractFromTemplate` and `value.extractCategories` GREL functions in various projects. Both work well in the GREL preview dialog window. But then after clicking OK, in the project itself, both produce an empty column. I haven't been able to get it to work in any project for now, but just for testing purposes, here's a project in which it went wrong: Barbalissos.openrefine.tar.gz