anfranken closed this issue 8 years ago.
Perhaps it would be useful to add a suffix indicating that a variable is not available through any access way. This could be especially useful for variables that form the basis of generated variables: users can only understand how a generated variable was constructed (its metadata) if they have some information about the original (basis) variable.

Perhaps, for variables that are not accessible, only the labels and codes should be shown in the variable report. Otherwise there could be a risk of de-anonymisation. For example, in the 19th Social Survey the reasons for exemption from paying tuition fees are combined: "downstream tuition fees" and "other reasons" form a new group, "other reasons (including downstream tuition fees)". If the basis variable shows the size of each group, for example "other": n = 10 and "downstream tuition fees": n = 90, there is a high (90 %) probability that a person in the new group (n = 100) belongs to the group "downstream tuition fees" (and only students in Hamburg belong to this group!). Therefore, to reduce the risk of de-anonymisation, it might be better to show only the labels and codes of the original (basis) variable in the variable report, and not the numbers of cases.

Note: the numbers of cases given here are not the real ones; they are only an example.
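The risk described above comes down to simple arithmetic: for each generated (merged) category, the share of its largest basis category is the probability that a member of the merged group belongs to that basis category. A minimal sketch, assuming the basis frequencies and the merge mapping are available as plain dictionaries; the function name `dominant_share_per_group` is hypothetical and not part of any existing tool, and the counts are the illustrative ones from the comment, not real survey data.

```python
from collections import Counter


def dominant_share_per_group(basis_counts, merge_map):
    """For each generated category, return (largest basis category, its share).

    basis_counts: basis category -> number of cases (hypothetical input format)
    merge_map:    basis category -> generated category it was merged into
    The share is the probability that a randomly chosen member of the
    generated group belongs to that basis category.
    """
    # Size of each generated group = sum of the basis categories merged into it.
    totals = Counter()
    for basis, n in basis_counts.items():
        totals[merge_map[basis]] += n

    # Keep, per generated group, the basis category with the largest share.
    result = {}
    for basis, n in basis_counts.items():
        group = merge_map[basis]
        share = n / totals[group]
        if group not in result or share > result[group][1]:
            result[group] = (basis, share)
    return result


merged = "other reasons (including downstream tuition fees)"
risk = dominant_share_per_group(
    {"other": 10, "downstream tuition fees": 90},
    {"other": merged, "downstream tuition fees": merged},
)
print(risk)
```

With the example counts, the merged group is dominated by "downstream tuition fees" with a share of 0.9, which is exactly the 90 % figure that makes publishing the basis frequencies risky.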
Thank you very much for the comment, @Sarcletti. Actually, this is already on our agenda. While designing our data management approach in terms of OAIS (in which the AIP represents our main data product), we considered marking the variables that exist only in the AIP (Only-AIP-Variables) with an _A suffix. I think this is quite similar to your suggestion. See also issue #409.
We need to support the following access ways: "download-cuf", "download-suf", "remote-desktop-suf", "onsite-suf" and "not-accessible".
We have to change the controlled vocabulary of variable.accessWays to (for example):
But this can only be done once it has been determined which access ways are possible.
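To illustrate what a controlled vocabulary for `variable.accessWays` could look like, here is a minimal, language-neutral sketch in Python using the five values listed in this thread. It is not the project's actual implementation: the names `AccessWay` and `validate_access_ways` are hypothetical, and the rule that "not-accessible" cannot be combined with any other access way is an assumption made for illustration.

```python
from enum import Enum


class AccessWay(str, Enum):
    """Hypothetical controlled vocabulary for variable.accessWays."""

    DOWNLOAD_CUF = "download-cuf"
    DOWNLOAD_SUF = "download-suf"
    REMOTE_DESKTOP_SUF = "remote-desktop-suf"
    ONSITE_SUF = "onsite-suf"
    NOT_ACCESSIBLE = "not-accessible"


def validate_access_ways(values):
    """Map raw strings onto the vocabulary and check their combination.

    Unknown terms raise ValueError (raised by the Enum lookup itself).
    Assumption: "not-accessible" is exclusive and may not be combined
    with any other access way.
    """
    ways = [AccessWay(v) for v in values]
    if AccessWay.NOT_ACCESSIBLE in ways and len(ways) > 1:
        raise ValueError('"not-accessible" cannot be combined with other access ways')
    return ways


print(validate_access_ways(["download-suf", "onsite-suf"]))
```

Keeping the allowed terms in a single enumeration means that, once the possible access ways have been decided, adding or removing a term is a one-line change, and invalid metadata is rejected at the point where it is read.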