Open edbennett opened 2 years ago

It would be useful to be able to strip out particular questions from the survey (including both data and metadata about them), and write out a new CSV compatible with the original but with those columns removed.
Yes, that is definitely something we need. The question is: do we treat this as pure CSV manipulation, i.e. do we just want `parser.remove_columns(content: str, columns: Iterable[str]) -> str`, whose result is written back to a CSV file, or do we need to be able to write a consistent CSV from a parsed `pd.DataFrame`, i.e. `parser.write_csv(parser.parse_questions(content))` or the like?
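To make the trade-off concrete, here is a minimal sketch of the two candidate interfaces. The names `remove_columns` and `write_csv` come from the comment above, but the bodies are hypothetical, assuming a pandas-based parser and a semicolon-separated export:

```python
from io import StringIO
from typing import Iterable

import pandas as pd


def remove_columns(content: str, columns: Iterable[str], sep: str = ";") -> str:
    """Option 1: pure CSV manipulation, taking and returning raw CSV text."""
    df = pd.read_csv(StringIO(content), sep=sep)
    return df.drop(columns=list(columns)).to_csv(sep=sep, index=False)


def write_csv(df: pd.DataFrame, sep: str = ";") -> str:
    """Option 2: serialise an already-parsed DataFrame back to consistent CSV."""
    return df.to_csv(sep=sep, index=False)
```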
The latter would be nice but is not essential; if we're not pursuing development long-term then I'd go with the former.
The former probably doesn't even need a wrapper, or at least not as part of the parser. Wouldn't it just be something like
```python
import pandas as pd

parsing_options = {'sep': ';'}   # plus any further options shared by read and write
excluded = ['name1', 'name2']    # column names to drop
(lambda df: df.loc[:, [col not in excluded for col in df.columns]])(
    pd.read_csv(filename, **parsing_options)
).to_csv(output_filename, index=False, **parsing_options)  # output_filename: path for the stripped copy; pandas uses to_csv, not write_csv
```
Okay. You might want to give it a name. But it should probably just be a free function.
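As a named free function, that might look roughly like the sketch below. `strip_columns` and its signature are placeholders rather than an existing API, and `parsing_options` is assumed to contain only options valid for both `read_csv` and `to_csv` (e.g. `sep`):

```python
import pandas as pd


def strip_columns(in_path, out_path, excluded, **parsing_options):
    """Read a CSV, drop the excluded columns, and write a compatible copy.

    Sketch only: the name and signature are placeholders, and parsing_options
    is assumed to be shared between read_csv and to_csv (e.g. sep).
    """
    df = pd.read_csv(in_path, **parsing_options)
    kept = df.drop(columns=[col for col in excluded if col in df.columns])
    kept.to_csv(out_path, index=False, **parsing_options)
```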
Yes, there's no reason for it not to be a free function. I guess I was hoping for something that didn't re-read the CSV, but that's not essential.
Worth noting that each question gets multiple columns (including, e.g., the time spent on the question as well as the response itself); it would be nice to pass a list of questions rather than having to identify all of the columns associated with each question.
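A sketch of how that might look, assuming (and this is only an assumption about the export format, not something the source confirms) that every column belonging to a question starts with that question's identifier, e.g. `Q01` covering both `Q01[answer]` and `Q01[time]`:

```python
from typing import Iterable

import pandas as pd


def drop_questions(df: pd.DataFrame, questions: Iterable[str]) -> pd.DataFrame:
    """Drop every column associated with the given questions.

    Assumes column names begin with the question identifier followed by a
    bracketed suffix; if the export uses a different convention, the matching
    rule below needs adjusting.
    """
    qs = tuple(questions)
    to_drop = [
        col for col in df.columns
        if any(col == q or col.startswith(f"{q}[") for q in qs)
    ]
    return df.drop(columns=to_drop)
```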
Yeah, okay. Then you would probably want a bit of infrastructure around that. Concerning the CSV reads: do you really think we could ever get enough answers to notice the time difference between 100 and 1000 reads?
> Do you really think we could ever get enough answers to notice the time difference between 100 and 1000 reads?
No, it was entirely a matter of elegance.