Status: Open. samthiriot opened this issue 6 years ago.
Hey, I'm not quite sure I understand the problem. If the attribute is encoded as "1", "2", ..., you can use an int attribute or, if it is not an integer value per se, an ordered value attribute. If this is just another way to encode a range attribute, then use a mapped attribute with a record mapper, where you can define a mapping like: {1 : less than 10; 2 : 11 to 16 ...}. This option forces you to define two attributes: the referent range attribute and the mapped "int to range" (or "ordered to range") attribute. Hope this helps you overcome your issue.
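To make the mapping idea concrete, here is a minimal, language-neutral sketch in Python. It is not the Genstar API: the names `RANGE_VALUES`, `CODE_TO_RANGE` and `decode` are hypothetical, and only the two range labels quoted above are used.

```python
# Hypothetical sketch of an "int to range" mapping; not the Genstar API.

# Referent range attribute: the textual values quoted in the comment above.
RANGE_VALUES = ["less than 10", "11 to 16"]

# Mapped attribute: each short code points at one value of the referent attribute.
CODE_TO_RANGE = {"1": RANGE_VALUES[0], "2": RANGE_VALUES[1]}


def decode(code: str) -> str:
    """Return the range label bound to a short code, or fail loudly."""
    try:
        return CODE_TO_RANGE[code]
    except KeyError:
        raise ValueError(f"unknown code {code!r}") from None
```

Reading a sample file would then amount to calling `decode` on each raw cell before looking the value up in the referent attribute.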
thanks ! I think you understood my question ^^ The cases are, as you say:
<code of the variable>;<label of the variable>;<code of the modality (value)><label of the value>
Thinking about it: typically, when writing a value into a generated sample, one would like to be able to write the encoded value, not (always) the long version. In that case we need to be able to retrieve the short (encoded) version of a value; I'm not sure how to do that with a mapped attribute.
(I'll think about it, no worry & thanks)
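One way to picture the "retrieve the short version" requirement is a two-way codebook. The sketch below is an assumption about how this could look, written in Python for brevity; `CodeBook`, `encode` and `decode` are hypothetical names and do not exist in Genstar.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CodeBook:
    """Hypothetical two-way binding between short codes and long labels."""

    code_to_label: Dict[str, str]
    label_to_code: Dict[str, str] = field(init=False)

    def __post_init__(self) -> None:
        # Build the reverse index once, so that writing a sample is a plain lookup.
        self.label_to_code = {
            label: code for code, label in self.code_to_label.items()
        }
        # Two codes sharing one label would make the reverse lookup ambiguous.
        if len(self.label_to_code) != len(self.code_to_label):
            raise ValueError("labels must be unique to allow reverse lookup")

    def decode(self, code: str) -> str:
        """Short code -> long label (reading a sample file)."""
        return self.code_to_label[code]

    def encode(self, label: str) -> str:
        """Long label -> short code (writing a generated sample)."""
        return self.label_to_code[label]
```

For example, `CodeBook({"1": "less than 10", "2": "11 to 16"}).encode("11 to 16")` gives back the code `"2"`, which is the value one would write to the generated sample.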
I thought about it and saw one good reason not to encode various codes for one attribute. In many cases, "simple" encodings like {"1", "2", ...} are used for several different attributes: e.g. booleans are 1 and 2; ranges are 1, 2 ... x; and so on. Hence there can be confusion during translation: which "1" code relates to which "complex" encoding? The only way to solve the problem is to bind modalities and codes. In that case, if you use the mapped version of the attribute, you can choose between the simple code and the complex one (using DemographicAttribute#findMappedAttributeValues(IValue)).
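The ambiguity described here can be illustrated with a small Python sketch in which codes are always resolved per attribute. Everything in it is hypothetical: the attribute names `age_range` and `has_car`, and the choice of which boolean value is coded "1", are assumptions made for the example only.

```python
# Hypothetical per-attribute codebooks; names and boolean coding are assumptions.
CODEBOOKS = {
    "age_range": {"1": "less than 10", "2": "11 to 16"},
    "has_car": {"1": "true", "2": "false"},  # assumed: 1 means true, 2 means false
}


def decode(attribute: str, code: str) -> str:
    """Resolve a code in the scope of one attribute, never globally."""
    return CODEBOOKS[attribute][code]


def attributes_using(code: str) -> list:
    """List every attribute in which the bare code appears."""
    return sorted(name for name, book in CODEBOOKS.items() if code in book)
```

With these bindings, `decode("age_range", "1")` and `decode("has_car", "1")` return different values, while `attributes_using("1")` shows that the bare code "1" alone cannot be translated.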
When we were creating Range attributes before the huge refactoring, it was possible to give both a list of codes (like "1","2"...) and the corresponding textual counterparts ("less than 10m","11 to 16"...). Now we only construct these ranges from the textual version. This works well for reading aggregate stats from CSV files, where we expect all the columns to explicitly contain "less than 10m"; but in sample files, the values are often encoded as "1","2"...
Is it still possible to deal with that?
Tks !