Open rbyh opened 1 month ago
In this example I have responses to a QuestionCheckBox question, each of which is a list of strings. When I convert the results to a dataframe, the lists of selected options are converted into strings that look like lists.
@rbyh Can you investigate best practices with pandas here? Pandas is meant to be a 'flat' format, so I don't know what we should be doing.
Pandas should preserve the format. For example, here a column of lists of strings keeps its list type:
import pandas as pd
# Example DataFrame with a list in a column
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Interests': [['reading', 'cycling'], ['painting'], ['writing', 'cooking']]}
df = pd.DataFrame(data)
type(df['Interests'][0])
Will return:
<class 'list'>
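By contrast, a CSV round trip does not keep the list type. The following is a minimal sketch of that effect using an in-memory buffer; it only illustrates general pandas behaviour and is not the actual code path inside to_pandas(), which is not shown in this thread.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Interests": [["reading", "cycling"], ["painting"]],
})

# Write the frame to an in-memory CSV and read it back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
round_tripped = pd.read_csv(buffer)

# The original cell is a list; after the round trip it is a string
# that merely looks like a list.
original_type = type(df["Interests"].iloc[0])
round_trip_type = type(round_tripped["Interests"].iloc[0])
round_trip_value = round_tripped["Interests"].iloc[0]
print(original_type, round_trip_type)
```

CSV has no notion of nested values, so each list is written out using its string representation and read back as plain text.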
I think the issue is the intermediary CSV conversion steps in to_pandas(). I think we can skip them with this fix:
import pandas as pd
import io
def to_pandas(self, remove_prefix: bool = False) -> pd.DataFrame:
    """Convert the results to a pandas DataFrame, ensuring that lists remain as lists.
    :param remove_prefix: Whether to remove the prefix from the column names.
    """
    df = pd.DataFrame(self.data)
    if remove_prefix:
        # Optionally remove prefixes from column names
        df.columns = [col.split('.')[-1] for col in df.columns]
    df_sorted = df.sort_index(axis=1)  # Sort columns alphabetically
    return df_sorted
It's a good fix but it broke some other tests in a complicated way, so I'm not quite ready to implement.
Bumping this. As I'm working on examples for extracting themes and turning them into checkbox question options, I frequently need to add a step that transforms the list-as-string into a true list.
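Until a fix lands, the extra transformation step can be sketched roughly as follows. This is only one possible workaround, using ast.literal_eval from the standard library; the helper name parse_list_cell and the column name "Interests" are hypothetical and chosen for illustration.

```python
import ast

import pandas as pd

# A column whose cells are strings that look like lists.
df = pd.DataFrame({"Interests": ["['reading', 'cycling']", "['painting']"]})


def parse_list_cell(value):
    """Turn a list-like string back into a list; leave other values unchanged."""
    if isinstance(value, str) and value.strip().startswith("["):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Not a valid Python literal, so keep the original string.
            return value
        return parsed if isinstance(parsed, list) else value
    return value


df["Interests"] = df["Interests"].apply(parse_list_cell)
print(type(df["Interests"].iloc[0]))
```

ast.literal_eval only evaluates Python literals, so it is a safer choice than eval for strings that come from survey responses.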