LucjanJanowski / translator-to-suJSON

Read subjective experiment data to suJSON format.
MIT License

General Suggestions #1

Closed slhck closed 5 years ago

slhck commented 5 years ago

I have a few minor suggestions and random ideas for the repo / scripts. Thanks for your overall efforts! 🎉


It would be great if the scripts were runnable standalone and read the config file path from a command-line option. An example call could be:

SQL_to_suJSON.py -c config.json

Better yet, you could allow the user to specify the config options directly on the command line. During development and subjective testing, I frequently find myself changing file names, so perhaps having input and output configurable would be good.
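A minimal sketch of how such a command-line interface could look, using Python's standard `argparse` module. Only the `-c` option comes from the example call above; the `-i`/`--input` and `-o`/`--output` options and the `input`/`output` config keys are hypothetical names chosen for illustration, not the repository's actual interface.

```python
import argparse
import json


def build_parser():
    """Build the argument parser (option names other than -c are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Translate subjective test data to suJSON (sketch)."
    )
    parser.add_argument("-c", "--config", help="path to a JSON config file")
    parser.add_argument("-i", "--input", help="input file; overrides the config value")
    parser.add_argument("-o", "--output", help="output file; overrides the config value")
    return parser


def resolve_options(argv=None):
    """Merge the config file (if given) with command-line overrides."""
    args = build_parser().parse_args(argv)
    options = {}
    if args.config:
        with open(args.config, encoding="utf-8") as handle:
            options.update(json.load(handle))
    # Values given on the command line take precedence over the config file.
    for key in ("input", "output"):
        value = getattr(args, key)
        if value is not None:
            options[key] = value
    return options
```

With this in place, a call such as `resolve_options(["-c", "config.json", "-o", "renamed.json"])` would keep the input file from the config while swapping only the output name, which covers the frequent-renaming case without editing the config file.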


Possibly split up the demo files and the actual scripts into separate subdirectories. For example:

scripts/xls_to_suJSON.py
scripts/SQL_to_suJSON.py
examples/VQEG_HDTV_Final_Report_Data.xls
examples/subjective_scores_for_suJSON.sql
…

We could move this repo, if you like, to the github.com/VQEG organization, once it is public. Do you plan to have a repo for the translators separately from the translator scripts? I guess one could simply move everything into one repo.


Overall I think the idea to standardize a format is really great, and that we should push to have more people use it. However, there really might be some "resistance" against it, if it's too complex to understand. People like to work with Excel or CSVs, so having an export script, even if it cannot cover all cases, would help.


We can think about adding suJSON output to https://github.com/Telecommunication-Telemedia-Assessment/avrateNG once it has a somewhat stable specification.

Qub3k commented 5 years ago

Thank you for the comments! I agree with all of them so they will be gradually incorporated into the code.

As for moving this repo to github.com/VQEG, we were actually thinking about using https://github.com/vqeg-sam.


> However, there really might be some "resistance" against it, if it's too complex to understand.

I agree. This is why we decided to take a step back and first publish a simplified version of suJSON (along with the tools). This simplified version would be a starting point for further additions. Many of them are ready to be added, but we do not want to overwhelm anyone with the complexity of the format.


> We can think about adding suJSON output to https://github.com/Telecommunication-Telemedia-Assessment/avrateNG once it has a somewhat stable specification.

This would be perfect! Chances are that this format will also be incorporated into Netflix's tools.

slhck commented 5 years ago

> we were thinking actually about using https://github.com/vqeg-sam.

Sure, I forgot these existed.

About the data formats, when going back to CSV, the following concept may be interesting: http://vita.had.co.nz/papers/tidy-data.html — it's the basis for many analyses people do in R, and it's a good design philosophy for non-nested data.

Qub3k commented 5 years ago

> About the data formats, when going back to CSV, the following concept may be interesting: http://vita.had.co.nz/papers/tidy-data.html

Thank you! I had no idea about this publication and it summarises all good practices in one place. Shame it cannot be directly applied to suJSON as it is quite nested.

LucjanJanowski commented 5 years ago

But if we write a suJSON-to-CSV converter, we have to implement it!

Lucjan

Lucjan JANOWSKI, PhD AGH University of Science and Technology Krakow, Poland phone: +48 12 617 48 06


slhck commented 5 years ago

I would be happy to help write a converter!

Of course, conversion would mean that some data has to be ignored. But I believe that having this kind of output is crucial to adoption of the suJSON format as an intermediary/storage format. Just thinking about how I would analyze subjective data, or how we typically deal with it in an ITU context, having it in a tidy/long CSV format is important. That is, we'd really need a flat and simple format that will work for 99% of analysis cases:

subject,pvs,src,hrc,rating
1,XXX,A,B,4

This way, it can be read back into Python scripts using pandas, or into R scripts using the tidyverse packages.
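A rough sketch of what such a lossy conversion could look like, using only the Python standard library. The nested `nested` dictionary below is a hypothetical stand-in and NOT the actual suJSON schema; its field names simply mirror the CSV columns above, the second row is made-up illustrative data, and the `extra` field stands for any nested information that the flat format has to drop.

```python
import csv
import io

# Hypothetical nested input, NOT the real suJSON layout (assumption for illustration).
nested = {
    "scores": [
        {"subject": 1, "pvs": "XXX", "src": "A", "hrc": "B", "rating": 4,
         "extra": {"session": 1}},
        {"subject": 2, "pvs": "XXX", "src": "A", "hrc": "B", "rating": 5,
         "extra": {"session": 1}},
    ]
}

COLUMNS = ["subject", "pvs", "src", "hrc", "rating"]


def to_tidy_csv(data):
    """Write one row per rating; fields outside COLUMNS are silently ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for score in data["scores"]:
        writer.writerow(score)
    return buffer.getvalue()


def read_tidy_csv(text):
    """Read the tidy CSV back as a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))
```

The text returned by `to_tidy_csv(nested)` could equally be saved to a file and loaded with `pandas.read_csv`. Keeping the column list explicit makes it visible which information survives the conversion.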

slhck commented 5 years ago

Maybe we have to think about the assumptions that are to be made when simplifying to CSV, e.g. should it include repetitions or multiple visits etc. And this will likely only work reliably for simple tests.

Qub3k commented 5 years ago

Repetitions should not be extremely difficult to add. A column with a timestamp would solve many order-related issues.
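One way the timestamp idea could work, sketched under assumptions: each flat row carries a `timestamp` column (a hypothetical name) holding ISO 8601 strings in a single consistent format and time zone, so that sorting them as text matches chronological order. A repetition index per subject and PVS can then be derived rather than stored.

```python
from collections import defaultdict


def add_repetition_index(rows):
    """Return rows sorted by timestamp, each with a 1-based 'repetition' count
    per (subject, pvs) pair. The 'timestamp' and 'repetition' names are assumptions."""
    counters = defaultdict(int)
    result = []
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        key = (row["subject"], row["pvs"])
        counters[key] += 1
        result.append({**row, "repetition": counters[key]})
    return result


# Made-up illustrative rows: subject 1 rates the same PVS twice.
example_rows = [
    {"subject": 1, "pvs": "XXX", "rating": 3, "timestamp": "2019-03-10T10:05:00"},
    {"subject": 1, "pvs": "XXX", "rating": 4, "timestamp": "2019-03-10T10:01:00"},
    {"subject": 2, "pvs": "XXX", "rating": 5, "timestamp": "2019-03-10T10:03:00"},
]
```

Calling `add_repetition_index(example_rows)` orders the rows by time and marks the later rating by subject 1 as the second repetition, so presentation order can be recovered from the flat file.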

slhck commented 5 years ago

Closing this for now, as this is more of a general discussion.