Open edbennett opened 2 years ago

It would be useful to be able to strip out particular questions from the survey (including both data and metadata about them), and write out a new CSV compatible with the original but with those columns removed.
Yes, that is definitely something we need. The question is: do we treat this as pure CSV manipulation, i.e. do we just want `parser.remove_columns(content: str, columns: Iterable[str]) -> str`, whose result is written back to a CSV file, or do we need to be able to write a consistent CSV from a parsed `pd.DataFrame`, i.e. `parser.write_csv(parser.parse_questions(content))` or the like?
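To make the trade-off concrete, here is a minimal sketch of the two candidate interfaces. The names `remove_columns` and `write_csv` come from the comment above, but the bodies are hypothetical, assuming a pandas-based parser and a semicolon-separated export:

```python
from io import StringIO
from typing import Iterable

import pandas as pd


def remove_columns(content: str, columns: Iterable[str], sep: str = ";") -> str:
    """Option 1: pure CSV manipulation, taking and returning raw CSV text."""
    df = pd.read_csv(StringIO(content), sep=sep)
    return df.drop(columns=list(columns)).to_csv(sep=sep, index=False)


def write_csv(df: pd.DataFrame, sep: str = ";") -> str:
    """Option 2: serialise an already-parsed DataFrame back to consistent CSV."""
    return df.to_csv(sep=sep, index=False)
```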
The latter would be nice but is not essential; if we're not pursuing development long-term then I'd go with the former.
The former probably doesn't even need a wrapper, or at least not as part of the parser. Wouldn't it just be something like
```python
import pandas as pd

parsing_options = {'sep': ';'}   # plus any further options shared by read and write
excluded = ['name1', 'name2']    # column names to drop
(lambda df: df.loc[:, [col not in excluded for col in df.columns]])(
    pd.read_csv(filename, **parsing_options)
).to_csv(output_filename, index=False, **parsing_options)  # output_filename: path for the stripped copy; pandas uses to_csv, not write_csv
```
Okay. You might want to give it a name. But it should probably just be a free function.
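As a named free function, that might look roughly like the sketch below. `strip_columns` and its signature are placeholders rather than an existing API, and `parsing_options` is assumed to contain only options valid for both `read_csv` and `to_csv` (e.g. `sep`):

```python
import pandas as pd


def strip_columns(in_path, out_path, excluded, **parsing_options):
    """Read a CSV, drop the excluded columns, and write a compatible copy.

    Sketch only: the name and signature are placeholders, and parsing_options
    is assumed to be shared between read_csv and to_csv (e.g. sep).
    """
    df = pd.read_csv(in_path, **parsing_options)
    kept = df.drop(columns=[col for col in excluded if col in df.columns])
    kept.to_csv(out_path, index=False, **parsing_options)
```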
Yes, there's no reason for it not to be a free function. I guess I was hoping for something that didn't re-read the CSV, but that's not essential.
Worth noting that each question gets multiple columns (including, e.g., the time spent on the question as well as the response itself); it would be nice to pass a list of questions rather than having to identify all of the columns associated with each question.
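A sketch of how that might look, assuming (and this is only an assumption about the export format, not something the source confirms) that every column belonging to a question starts with that question's identifier, e.g. `Q01` covering both `Q01[answer]` and `Q01[time]`:

```python
from typing import Iterable

import pandas as pd


def drop_questions(df: pd.DataFrame, questions: Iterable[str]) -> pd.DataFrame:
    """Drop every column associated with the given questions.

    Assumes column names begin with the question identifier followed by a
    bracketed suffix; if the export uses a different convention, the matching
    rule below needs adjusting.
    """
    qs = tuple(questions)
    to_drop = [
        col for col in df.columns
        if any(col == q or col.startswith(f"{q}[") for q in qs)
    ]
    return df.drop(columns=to_drop)
```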
Yeah, okay. Then you would probably want a bit of infrastructure around that. Concerning the CSV reads: do you really think we could ever get enough answers to notice the time difference between 100 and 1000 reads?
> Do you really think we could ever get enough answers to notice the time difference between 100 and 1000 reads?
No, it was entirely a matter of elegance.