Parse both non-tidy and tidy input files

LucjanJanowski / translator-to-suJSON

Read subjective experiment data to suJSON format.

MIT License


Parse both non-tidy and tidy input files #9

Open Qub3k opened 5 years ago

Qub3k commented 5 years ago

Currently, we assume that the input XLS file is non-tidy. In other words, we expect it to have one column with scores for each tester. Since we want to have both the input and output of suJSON tools operating on tidy data, this must change.
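To make the distinction concrete, here is a minimal sketch of the same scores in both layouts. The data and the column names ("pvs", "s1", "s2", "subject", "score") are assumptions made up for illustration; they are not taken from the suJSON code base.

```python
# Hypothetical example data; all column names here are assumptions.

# Non-tidy: one row per PVS, one score column per tester.
wide_rows = [
    {"pvs": "video_a", "s1": 4, "s2": 5},
    {"pvs": "video_b", "s1": 2, "s2": 3},
]

# Tidy: one row per single observation (subject, PVS, score).
tidy_rows = [
    {"subject": "s1", "pvs": "video_a", "score": 4},
    {"subject": "s2", "pvs": "video_a", "score": 5},
    {"subject": "s1", "pvs": "video_b", "score": 2},
    {"subject": "s2", "pvs": "video_b", "score": 3},
]


def triples_from_wide(rows, id_column="pvs"):
    """Collect (subject, pvs, score) triples from non-tidy rows."""
    return {
        (column, row[id_column], value)
        for row in rows
        for column, value in row.items()
        if column != id_column
    }


def triples_from_tidy(rows):
    """Collect (subject, pvs, score) triples from tidy rows."""
    return {(row["subject"], row["pvs"], row["score"]) for row in rows}


# Both layouts carry exactly the same observations.
print(triples_from_wide(wide_rows) == triples_from_tidy(tidy_rows))
```

The non-tidy table grows a new column for every tester, whereas the tidy table only grows new rows, which is why tidy data is easier to process uniformly.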

For this to work we need some flag (or internal mechanism) indicating whether we are dealing with tidy or non-tidy data.

EDIT: Following @slhck's remarks, we will first focus on using heuristics to detect whether the input is tidy or non-tidy. See his comment below for details.

slhck commented 5 years ago

Can we apply some heuristics? If the input file has a "score" or "rating" column, it will most likely be tidy (i.e., subject–pvs–score). If not, and if the column names contain sequential numbers (e.g., "s1", "s2", …), it will be a "wide" dataframe.
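A rough sketch of how these two checks could look in Python follows. The function name `detect_format`, the returned labels, and the regular expression for per-subject column names are all assumptions for illustration, not existing suJSON code.

```python
import re

# Assumed naming pattern for per-subject columns: "s1", "S2", "subject_3", "user 4".
SUBJECT_COLUMN = re.compile(r"^(subject|user|s)[ _-]?\d+$", re.IGNORECASE)


def detect_format(columns):
    """Guess the layout of a table from its column names.

    Returns "tidy", "non-tidy", or None when neither heuristic matches.
    """
    normalized = [str(name).strip().lower() for name in columns]
    # Heuristic 1: a dedicated score/rating column suggests tidy data.
    if any(name in ("score", "rating") for name in normalized):
        return "tidy"
    # Heuristic 2: several numbered per-subject columns suggest non-tidy data.
    subject_like = [name for name in normalized if SUBJECT_COLUMN.match(name)]
    if len(subject_like) >= 2:
        return "non-tidy"
    return None


print(detect_format(["subject", "pvs", "score"]))
print(detect_format(["pvs", "s1", "s2", "s3"]))
print(detect_format(["foo", "bar"]))
```

Returning None for the undecided case leaves room for a warning or a manual override instead of silently guessing.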

(Sorry for not being able to provide any conversion code until now, should be easier after QoMEX.)

Qub3k commented 5 years ago

Sounds good to me. I prefer to have some heuristics rather than another configuration parameter. What is more, I agree it should be relatively straightforward to detect a "wide" dataframe.

Let me then update the description to underline that heuristics are our primary target.

slhck commented 5 years ago

Yep. The way I'd handle it is:

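# csv_columns is assumed to hold the column names read from the input file header
if any(x in csv_columns for x in ["rating", "score"]):
    csv_format = "long"
else:
    print("Warning: it appears you are using a wide data format, as no 'rating' or 'score' column was found")
    csv_format = "wide"

if csv_format == "wide":
    # apply some magic to find the columns with subjects, e.g. starting with "subject" or "s" or "user"
    pass  # placeholder so the snippet runs; subject-column detection still to be written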

My guess is: most people will have a wide format lying around somewhere.
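One possible shape of that "magic" is sketched below using only the standard library. The regular expression, the helper names `find_subject_columns` and `wide_to_long`, and the sample CSV are hypothetical; a bare "s" prefix would also match columns such as "score" or "src", so this sketch additionally requires a trailing number.

```python
import csv
import io
import re

# Assumption: per-subject columns look like "s1", "subject_2" or "user3".
SUBJECT_COLUMN = re.compile(r"^(subject|user|s)[ _-]?\d+$", re.IGNORECASE)


def find_subject_columns(columns):
    """Return the column names that look like per-subject score columns."""
    return [name for name in columns if SUBJECT_COLUMN.match(name.strip())]


def wide_to_long(rows, subject_columns):
    """Turn every per-subject score cell into its own record."""
    long_rows = []
    for row in rows:
        # Columns that are not per-subject scores identify the stimulus.
        shared = {key: value for key, value in row.items() if key not in subject_columns}
        for subject in subject_columns:
            long_rows.append({**shared, "subject": subject, "score": row[subject]})
    return long_rows


# Made-up wide input: one row per PVS, one column per subject.
wide_csv = "pvs,s1,s2\nvideo_a,4,5\nvideo_b,2,3\n"
reader = csv.DictReader(io.StringIO(wide_csv))
rows = list(reader)
subject_columns = find_subject_columns(reader.fieldnames)

for record in wide_to_long(rows, subject_columns):
    print(record)
```

Note that `csv.DictReader` yields scores as strings, so a real converter would still need to cast them to numbers.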

Qub3k commented 4 years ago

@matix7290 please take a look at this new TODO. https://github.com/LucjanJanowski/translator-to-suJSON/blob/a559e947d3ed4c79671fc8e03ca7de4c2788a36b/sujson/_sujson.py#L243-L244

Qub3k commented 4 years ago

@matix7290 Thanks for completing previous TODOs! 🎉

Here is a list of new TODOs. 😅
https://github.com/LucjanJanowski/translator-to-suJSON/blob/9d9b27419a11a05101de42480332bc3d58fce4b4/sujson/_sujson.py#L233-L243
https://github.com/LucjanJanowski/translator-to-suJSON/blob/9d9b27419a11a05101de42480332bc3d58fce4b4/sujson/_sujson.py#L257-L258
https://github.com/LucjanJanowski/translator-to-suJSON/blob/9d9b27419a11a05101de42480332bc3d58fce4b4/sujson/_sujson.py#L270-L280
https://github.com/LucjanJanowski/translator-to-suJSON/blob/9d9b27419a11a05101de42480332bc3d58fce4b4/sujson/_sujson.py#L288-L296

Qub3k commented 3 years ago

Just to keep this thread up to date: matix7290 has addressed the TODOs from the previous message. Now the ball is in my court: I will review his code and merge it into the master branch.