Roche / pyreadstat

Python package to read sas, spss and stata files into pandas data frames. It is a wrapper for the C library readstat.

Support for SPSS multiple response categories #25

Open Berndvanderwielen opened 5 years ago

Berndvanderwielen commented 5 years ago

Support to retrieve (meta) data on MRVs / multiple answer question groupings would be great.

ofajardo commented 5 years ago

Sorry, no idea what MRVs are. Can you please provide an example SPSS file and explain what it is and what information you are trying to retrieve?

Berndvanderwielen commented 5 years ago

The official name is "Multiple Response sets". URL: https://www.ibm.com/support/knowledgecenter/en/SSLVMB_23.0.0/spss/base/multiple_response_intro.html

If the SPSS documentation is not enough, I can provide an example SPSS file.

ofajardo commented 5 years ago

Yes, a sample file will be needed, along with a plain-text description of its contents, because the job is to work out where in the binary file the content you are looking for is stored.

This will require changes to the ReadStat C library. I can file an issue over there, or you can do it yourself if you prefer. There is no guarantee that they will do it, and no timeline either.

It would help a lot if there were a description somewhere of how these fields are represented in the binary file. If you could find one, that would be great, because otherwise it will be very difficult to implement. An example of such a specification is here or here, but these don't seem to explain the feature you are requesting (can you see them?). Other libraries, in Python or other languages, that can do the job could also be useful to look at.

ofajardo commented 5 years ago

Closed due to lack of example files.

SamMousa commented 3 years ago

@ofajardo, sorry for the ping, but I assume you're not receiving comments on closed issues.

Could this be reopened?

The specs for this record in the SPSS file are actually part of the spec you linked: https://www.gnu.org/software/pspp/pspp-dev/html_node/Multiple-Response-Sets-Records.html

Essentially, the record specifies which variables (questions) should be interpreted together as a single question with multiple values.

Attachment: example.sav.zip

Attached is an example file that contains two sets: one multiple-category set and one multiple-dichotomy set. For details, you could check the docs here (https://www.gnu.org/software/pspp/manual/html_node/MRSETS.html#MRSETS), but they are not relevant for the implementation.
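To make the record layout more concrete, here is a rough sketch of how the text payload of a multiple response sets record could be parsed once it has been extracted from the file. It follows my reading of the PSPP developer page linked above (one set per line: `$name=`, a type letter `C`, `D` or `E`, byte-counted strings for the counted value and the label, then the short variable names). The names `MRSet`, `parse_mrset_line`, `parse_mrsets_record` and `SAMPLE_RECORD` are hypothetical, the sample lines are illustrative rather than taken from the attached file, and UTF-8 is an assumed encoding. This is not how ReadStat or pyreadstat implement it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MRSet:
    name: str                     # set name, including the leading "$"
    kind: str                     # "C" = multiple category, "D"/"E" = multiple dichotomy
    counted_value: Optional[str]  # only present for dichotomy sets
    label: str                    # empty string when the set has no label
    variables: List[str]          # short variable names in the set


def _read_counted(buf: bytes, pos: int, encoding: str) -> Tuple[str, int]:
    """Read '<n> <n bytes>' starting at pos; return (text, position after it)."""
    end = buf.index(b" ", pos)
    size = int(buf[pos:end])      # the count is in bytes, not characters
    start = end + 1
    return buf[start:start + size].decode(encoding), start + size


def parse_mrset_line(line: bytes, encoding: str = "utf-8") -> MRSet:
    """Parse one line of the record (assumed layout, see the note above)."""
    name, rest = line.split(b"=", 1)
    kind = chr(rest[0])
    if kind not in ("C", "D", "E"):
        raise ValueError(f"unknown multiple response set type: {kind!r}")
    pos = 1
    counted_value = None
    if kind == "E":
        # " <flag> ": a label-source flag (1 or 11), skipped in this sketch.
        pos = rest.index(b" ", pos + 1) + 1
    if kind in ("D", "E"):
        counted_value, pos = _read_counted(rest, pos, encoding)
    pos += 1                      # single space before the label
    label, pos = _read_counted(rest, pos, encoding)
    variables = [v.decode(encoding) for v in rest[pos:].split()]
    return MRSet(name.decode(encoding), kind, counted_value, label, variables)


def parse_mrsets_record(data: bytes, encoding: str = "utf-8") -> List[MRSet]:
    """Parse the whole payload: one set per line-feed-terminated line."""
    return [parse_mrset_line(ln, encoding) for ln in data.split(b"\n") if ln.strip()]


# Illustrative payload (not read from a real file).
SAMPLE_RECORD = (
    b"$a=C 10 my mcgroup a b c\n"
    b"$b=D2 55 0  g e f d\n"
    b"$c=D3 Yes 10 mdgroup #2 h i j\n"
    b"$d=E 1 2 34 13 third mdgroup k l m\n"
)

if __name__ == "__main__":
    for mr_set in parse_mrsets_record(SAMPLE_RECORD):
        print(mr_set)
```

The strings are length-prefixed rather than quoted, so the parser has to walk the line by byte offsets instead of splitting on spaces; a label such as `my mcgroup` would otherwise be mistaken for two variable names.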

Let me know if I can be of further assistance. I'm not familiar with Python at all, but have spent many hours hating the binary file format that is SPSS SAV...

ofajardo commented 3 years ago

Hi there,

I haven't looked into it in detail yet, but this will require that ReadStat (the C library behind pyreadstat) implements it.

Could you therefore open an issue there? (I am sure they will appreciate your insights into the binary file format.) Once it is implemented in ReadStat, I will be able to bring it into pyreadstat.

slobodan-ilic commented 6 months ago

I've opened a new PR, #259, to address this, in coordination with our team at Crunch.io and Evan Miller. We'll also open a PR on ReadStat, so this won't be available immediately. The idea is to rebase the ☝️ PR once the ReadStat changes are shipped.

arsoni20 commented 6 months ago

This feature will be hugely appreciated.

slobodan-ilic commented 1 month ago

This has now been implemented, as documented in this PR: https://github.com/Roche/pyreadstat/pull/271