ANRGenstar / genstar

Generation of Synthetic Populations Library

can we still encode Range values in samples? #49

Open · samthiriot opened this issue 6 years ago

samthiriot commented 6 years ago

When we were creating Range attributes before the huge refactoring, it was possible to give both a list of codes (like "1", "2"...) and their textual counterparts ("less than 10m", "11 to 16"...). Now we only construct these ranges from the textual version. This works well for reading aggregate stats from CSV files, where we expect every column to explicitly contain "less than 10m"; but in sample files, the values are often encoded as "1", "2"...

Is it still possible to deal with that?

Tks !

chapuisk commented 6 years ago

Hey, I am not quite sure I understand the problem. If the attribute is encoded as "1", "2" ..., you can use an integer attribute or, if the values are not integers per se, an ordered value attribute. If this is just another way to encode a range attribute, then use a mapped attribute with a record mapper, where you can define a mapping like {1 : less than 10; 2 : 11 to 16 ...}. This option forces you to define two attributes: the referent range attribute and the mapped "int to range" (or "ordered to range") attribute. Hope it helps you overcome your issue.
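The two-attribute approach can be sketched in plain Java as below. This is a hypothetical illustration, not genstar's API: the class `CodeToRangeMapping`, its fields and its `decode` method are invented for the example, and only the two mappings quoted above are used. The referent range attribute holds the textual modalities; the record mapper of the mapped attribute translates each sample code into one of them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the genstar API: a "mapped" attribute whose record
// mapper translates the short codes found in sample files ("1", "2", ...)
// into the textual modalities of a referent range attribute.
public class CodeToRangeMapping {

    // Referent range attribute: the textual modalities used in aggregate CSV files.
    static final List<String> REFERENT_RANGES = List.of("less than 10", "11 to 16");

    // Record mapper of the mapped "int to range" attribute: code -> referent modality.
    static final Map<String, String> CODE_TO_RANGE = new LinkedHashMap<>();
    static {
        CODE_TO_RANGE.put("1", "less than 10");
        CODE_TO_RANGE.put("2", "11 to 16");
    }

    // Translate a raw sample value into a referent range modality.
    // Fails if the code is unknown or maps to a label the referent does not define.
    static String decode(String code) {
        String range = CODE_TO_RANGE.get(code);
        if (range == null || !REFERENT_RANGES.contains(range)) {
            throw new IllegalArgumentException("Unknown code: " + code);
        }
        return range;
    }

    public static void main(String[] args) {
        for (String raw : List.of("2", "1")) {
            System.out.println(raw + " -> " + decode(raw));
        }
    }
}
```

Keeping the referent attribute separate means aggregate CSV files can still be read with the long labels, while sample files only need the extra mapping.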

samthiriot commented 6 years ago

Thanks! I think you understood my question ^^ The cases are, as you say:

samthiriot commented 6 years ago

Thinking about it: typically, when writing the content of a value in a generated sample, one would like to write the encoded value too, not (always) the long version. In that case we need to be able to retrieve the short (encoded) version of a value; I'm not sure how to do it using a mapped attribute.
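One way to get the short code back is to invert the code-to-range mapping. A minimal sketch under that assumption, with hypothetical names (`RangeEncoder`, `codesFor`, `encode`) rather than genstar classes: the reverse lookup returns a list because nothing guarantees the mapping is one-to-one, and `encode` refuses to pick a code when the choice is ambiguous or missing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (plain Java, not genstar classes): invert a
// code -> range mapping so a generated individual can be written back
// with its short code instead of the long textual label.
public class RangeEncoder {

    static final Map<String, String> CODE_TO_RANGE = new LinkedHashMap<>();
    static {
        CODE_TO_RANGE.put("1", "less than 10");
        CODE_TO_RANGE.put("2", "11 to 16");
    }

    // Reverse lookup: all codes bound to a given range label.
    // A list is returned because several codes could map to the same label.
    static List<String> codesFor(String rangeLabel) {
        List<String> codes = new ArrayList<>();
        for (Map.Entry<String, String> e : CODE_TO_RANGE.entrySet()) {
            if (e.getValue().equals(rangeLabel)) {
                codes.add(e.getKey());
            }
        }
        return codes;
    }

    // Pick the single code to write; fail loudly if none or several exist.
    static String encode(String rangeLabel) {
        List<String> codes = codesFor(rangeLabel);
        if (codes.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one code for '" + rangeLabel + "', found " + codes);
        }
        return codes.get(0);
    }

    public static void main(String[] args) {
        System.out.println(encode("11 to 16")); // prints 2
    }
}
```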

samthiriot commented 6 years ago

(I'll think about it, no worries & thanks)

chapuisk commented 6 years ago

Thinking about it, I see one good reason not to encode several codes for one attribute. In many cases, "simple" data encodings like {"1", "2", ...} are used for several different attributes: e.g. booleans are 1 and 2, ranges are 1, 2 ... x, and so on. Hence there can be confusion in translation: which "1" code should be related to which "complex" encoding? The only way to solve the problem is to bind modalities to codes. In that case, if you use the mapped version of the attribute, you can choose between the simple code and the complex one (using DemographicAttribute#findMappedAttributeValues(IValue)).
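The ambiguity can be illustrated by keeping one code table per attribute instead of a global one. This is a hypothetical sketch, not how genstar stores its mappings: the class `PerAttributeCodes`, the attribute names `hasCar` and `commuteRange`, and the choice of "1" for true are placeholders. The same raw code decodes differently depending on which attribute it is read for.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: the same raw code "1" means different things for
// different attributes, so codes are bound per attribute rather than globally.
public class PerAttributeCodes {

    // attribute name -> (code -> modality); names and values are illustrative only.
    static final Map<String, Map<String, String>> BINDINGS = new HashMap<>();
    static {
        BINDINGS.put("hasCar", Map.of("1", "true", "2", "false"));
        BINDINGS.put("commuteRange", Map.of("1", "less than 10", "2", "11 to 16"));
    }

    // Decode a raw code in the context of one attribute only.
    static String decode(String attribute, String code) {
        Map<String, String> table = BINDINGS.get(attribute);
        if (table == null) {
            throw new IllegalArgumentException("No code table for attribute " + attribute);
        }
        String modality = table.get(code);
        if (modality == null) {
            throw new IllegalArgumentException("Code " + code + " not bound for " + attribute);
        }
        return modality;
    }

    public static void main(String[] args) {
        // Same raw code, two different meanings depending on the attribute.
        System.out.println(decode("hasCar", "1"));       // prints true
        System.out.println(decode("commuteRange", "1")); // prints less than 10
    }
}
```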