andrie / sss

R package to import files in the triple-s (Standard Survey Structure) format.
http://andrie.github.io/sss/

Data label issue with <text> modules in xml code #9

Closed: ejuliaschulz closed this issue 4 years ago

ejuliaschulz commented 5 years ago

Hello, thank you for this great package. I found a problem with parsing the metadata of my triple-s file. It only affects the data labels of my variables. My XML code for the variable declaration looks like this:

    ...
    <variable ident="5" type="single" format="numeric">
      <name>typtel</name>
      <label>
        <text xml:lang="fr-FR" mode="analysis">TYPTEL-Telephone Type</text>
      </label>
      <position start="16" />
      <values>
        <value code="1">Fixed<text xml:lang="fr-FR" mode="analysis">Fixed</text></value>
        <value code="2">Mobile<text xml:lang="fr-FR" mode="analysis">Mobile</text></value>
      </values>
    </variable>
    ...

The XML parses without errors, but in R the resulting codes data frame contains invalid entries:

    ident  code  codevalues
    5      1     Fixed
    5      2     <text xml:lang="fr-FR" mode="analysis">Fixed</text>
    5      1     Mobile
    5      2     <text xml:lang="fr-FR" mode="analysis">Mobile</text>
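For comparison, here is what a correct codes table would look like: one row per `<value>` element, with the `code` attribute kept and a single label. The sketch below is written in Python with the standard-library `xml.etree.ElementTree`, not R, and it is not the fix that went into sss. It assumes, as the example suggests, that a nested `<text>` child carries the same label as the bare text and should win when both are present. The helper names `value_label` and `code_table` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the <variable> element from the issue.
SSS_VARIABLE = """
<variable ident="5" type="single" format="numeric">
  <name>typtel</name>
  <label>
    <text xml:lang="fr-FR" mode="analysis">TYPTEL-Telephone Type</text>
  </label>
  <position start="16" />
  <values>
    <value code="1">Fixed<text xml:lang="fr-FR" mode="analysis">Fixed</text></value>
    <value code="2">Mobile<text xml:lang="fr-FR" mode="analysis">Mobile</text></value>
  </values>
</variable>
"""


def value_label(value: ET.Element) -> str:
    """Return one label per <value>: a nested <text> if present, else the bare text."""
    text_child = value.find("text")  # direct child only
    if text_child is not None and text_child.text:
        return text_child.text.strip()
    return (value.text or "").strip()


def code_table(variable: ET.Element) -> list[tuple[str, str, str]]:
    """Build (ident, code, label) rows, exactly one row per <value> element."""
    ident = variable.get("ident")
    return [
        (ident, value.get("code"), value_label(value))
        for value in variable.iter("value")
    ]


variable = ET.fromstring(SSS_VARIABLE.strip())
for row in code_table(variable):
    print(*row)
# prints:
# 5 1 Fixed
# 5 2 Mobile
```

The key point is that each `<value>` is treated as one unit and its children are read as labels for that code, rather than being split into extra rows with renumbered codes.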

Is there a way to solve this with the package, without having to delete all of the <text> entries (which gives the right results)?

Thank you for your help,
Julia
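Until a fixed release is available, a narrower version of the workaround above can be scripted instead of editing the file by hand. The Python sketch below (again using `xml.etree.ElementTree`, not sss) removes `<text>` children only from `<value>` elements that already have a bare label. `<text>` elsewhere, such as the variable label inside `<label>`, which exists only as `<text>`, is left in place. The function name `drop_redundant_value_text` is hypothetical, and this assumes the bare text and the `<text>` content are the same label, as in the example.

```python
import xml.etree.ElementTree as ET


def drop_redundant_value_text(root: ET.Element) -> int:
    """Remove <text> children from <value> elements that already carry a bare label.

    Returns the number of <text> elements removed. Any tail text that follows a
    removed <text> element is discarded along with it (none in the issue's example).
    """
    removed = 0
    # Materialise the list first so the tree is not modified while iterating it.
    for value in list(root.iter("value")):
        if not (value.text or "").strip():
            continue  # no bare label: the <text> child is the only label, keep it
        for text_child in value.findall("text"):
            value.remove(text_child)
            removed += 1
    return removed


root = ET.fromstring(
    '<values><value code="1">Fixed'
    '<text xml:lang="fr-FR" mode="analysis">Fixed</text></value></values>'
)
print(drop_redundant_value_text(root))  # 1
print(ET.tostring(root, encoding="unicode"))
# <values><value code="1">Fixed</value></values>
```

For a whole file, the same function can be applied to `ET.parse(path).getroot()`, after which the tree is written back with `tree.write(path, encoding="utf-8", xml_declaration=True)`. Note that the default ElementTree parser drops XML comments, so it is safer to write to a new file than to overwrite the original.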

andrie commented 5 years ago

Please provide a minimally reproducible example, ideally with small example files that exhibit this behaviour.

ejuliaschulz commented 5 years ago

Sure, please find an example attached.

testsss.zip

andrie commented 4 years ago

I'm working on a fix for this; it is currently in the dev branch.

andrie commented 4 years ago

This fix is now in the master branch on github. Please can you test on as many survey files as possible and report any problems? I'll submit a new version to CRAN early next week if I don't get any error reports.