Rethink how we manage task statement translations

lw commented 11 years ago

We used to support translations of the task statements by storing, for each user, a list of languages he/she "understands". We would then highlight (i.e. show on large buttons) the translations into these languages, if they existed. This approach had two main issues:

We were using the PostgreSQL ARRAY type, making CMS PostgreSQL-specific. We don't want this.
The language choice was global among all tasks (i.e. it was impossible to choose en_US in one task and en_UK in another).
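To make the second issue concrete, here is a minimal Python sketch of the language-based approach. The `Statement`, `User` and `highlighted` names are hypothetical simplifications, not the actual CMS models; the point is only that one per-user language list is applied to every task.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified models; not the actual CMS classes.
@dataclass
class Statement:
    task_name: str
    language: str  # e.g. "en_US", "en_UK", "it"


@dataclass
class User:
    # A single list of "understood" languages, shared by every task.
    languages: list[str] = field(default_factory=list)


def highlighted(user: User, statements: list[Statement]) -> list[Statement]:
    """Return the statements of one task that get a large button."""
    return [s for s in statements if s.language in user.languages]


task_a = [Statement("a", "en_US"), Statement("a", "en_UK")]
task_b = [Statement("b", "en_US"), Statement("b", "en_UK")]

# Wanting en_US for task "a" but en_UK for task "b" cannot be expressed:
# listing both languages highlights both translations in every task.
user = User(languages=["en_US", "en_UK"])
print([s.language for s in highlighted(user, task_a)])  # ['en_US', 'en_UK']
print([s.language for s in highlighted(user, task_b)])  # ['en_US', 'en_UK']
```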

At IOI we wanted to avoid the second issue (to give team leaders more freedom in selecting translations), so we changed approach: for each user we stored the IDs of the Statement objects to be highlighted, as a list of integers (again with an ARRAY type).
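A minimal sketch of the IOI variant, under the same kind of hypothetical, simplified models (a plain Python list stands in for the ARRAY column). It shows that the choice becomes per-task, but also that nothing ties the stored integers to existing Statement rows, which is one plausible reason why storing raw IDs is unattractive.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified models; not the actual CMS classes.
@dataclass
class Statement:
    id: int
    task_name: str
    language: str


@dataclass
class User:
    # IDs of the statements to highlight, across all tasks
    # (stored as an integer ARRAY column in the IOI approach).
    statement_ids: list[int] = field(default_factory=list)


def highlighted(user: User, statements: list[Statement]) -> list[Statement]:
    """Return the statements of one task that the user selected by ID."""
    return [s for s in statements if s.id in user.statement_ids]


statements = {
    1: Statement(1, "a", "en_US"),
    2: Statement(2, "a", "en_UK"),
    3: Statement(3, "b", "en_US"),
    4: Statement(4, "b", "en_UK"),
}

# en_US for task "a" and en_UK for task "b": now expressible.
user = User(statement_ids=[1, 4])
task_a = [statements[1], statements[2]]
print([s.language for s in highlighted(user, task_a)])  # ['en_US']

# Nothing links the integers to actual rows: after statement 4 is
# deleted, the user still holds its ID and no constraint complains.
del statements[4]
dangling = [i for i in user.statement_ids if i not in statements]
print(dangling)  # [4]
```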

(Note that neither solution provided a way to select the translations in AWS.)

For the sake of generality I think the latter approach should be preferred. Yet storing the Statement IDs is a very ugly solution, and I think we should use a many-to-many relationship between User and Statement objects instead.

What do you think?

Also, again for generality, we may want to allow more than one "official" statement...

giomasce commented 11 years ago

Hi.

On 04/10/2012 20:13, Luca Wehrstedt wrote:

We were using the PostgreSQL ARRAY type, making CMS PostgreSQL-specific. We don't want this.

Definitely. Ideally, SQLAlchemy should make CMS usable with any SQL dialect.

For the sake of generality I think the latter approach should be preferred. Yet storing the Statement IDs is a very ugly solution, and I think we should use a many-to-many relationship between User and Statement objects instead.

What do you think?

On the one hand, proliferating tables to represent many-to-many relations isn't a good thing. On the other hand, it may be better than any other solution, since it is the only way I see to keep taking advantage of PostgreSQL's integrity checks (e.g., ensuring the consistency of foreign keys). Personally, I prefer having a database with many tables to giving up consistency checks (but remember that many tables also add overhead and code complexity in cms.db, for example in the code needed to serialize and deserialize data).

BTW, configuring a many-to-many relation with SQLAlchemy is possible, although not completely trivial. Here is the reference:

http://docs.sqlalchemy.org/en/rel_0_7/orm/relationships.html#many-to-many
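For illustration, here is a sketch of the association table such a relationship would need, written against SQLite through Python's standard `sqlite3` module so that it runs without a PostgreSQL server (table and column names are hypothetical). It shows the integrity checks mentioned above: a dangling statement ID is rejected, and deleting a statement cascades to the users' selections. In SQLAlchemy, a table like `user_statements` is what one passes as the `secondary` argument of `relationship()`.

```python
import sqlite3

# In-memory database; SQLite enforces foreign keys only when this
# pragma is enabled on the connection (PostgreSQL always enforces them).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE statements (
    id INTEGER PRIMARY KEY,
    task_name TEXT NOT NULL,
    language TEXT NOT NULL
);
-- Association table: one row per (user, highlighted statement) pair.
CREATE TABLE user_statements (
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    statement_id INTEGER NOT NULL REFERENCES statements(id) ON DELETE CASCADE,
    PRIMARY KEY (user_id, statement_id)
);
""")

conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany("INSERT INTO statements VALUES (?, ?, ?)",
                 [(1, "a", "en_US"), (2, "a", "en_UK")])
conn.execute("INSERT INTO user_statements VALUES (1, 2)")

# A selection pointing at a nonexistent statement is rejected.
try:
    conn.execute("INSERT INTO user_statements VALUES (1, 99)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True

# Deleting a statement also deletes the selections that reference it.
conn.execute("DELETE FROM statements WHERE id = 2")
remaining = conn.execute("SELECT COUNT(*) FROM user_statements").fetchone()[0]
print(fk_enforced, remaining)  # True 0
```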

Also, again for generality, we may want to allow more than one "official" statement...

I don't see any real use case for this. Can someone provide one?

Giovanni.

Giovanni Mascellani mascellani@poisson.phc.unipi.it Pisa, Italy

Web: http://poisson.phc.unipi.it/~mascellani Jabber: g.mascellani@jabber.org / giovanni@elabor.homelinux.org

stefano-maggiolo commented 11 years ago

I agree with Gio. Can we add a facility for serializing many-to-many relationships that mimics what we would do if we had the PostgreSQL arrays?

I don't see any real use case for this. Can someone provide one?

I think that APIO has many official statements, in a certain sense. But I guess that having many official statements is the same as having none, since in that case there are no unofficial statements and each contestant wants to access only the one in his/her language.

giomasce commented 11 years ago

On 05/10/2012 09:28, Stefano Maggiolo wrote:

I don't see any real use case for this. Can someone provide one?

I think that APIO has many official statements, in a certain sense. But I guess that having many official statements is the same as having none, since in that case there are no unofficial statements and each contestant wants to access only the one in his/her language.

Yes, having multiple official statements seems meaningful to me only when there are two different levels of "officiality", each with strictly more than one statement. I'd find such a requirement rather odd. Thus, it seems to me that support for more than one official statement isn't really required.

On the other hand, this discussion probably shows that "official" isn't really the right word, since in some cases no translation is less official than the others. I'd prefer to call what we now call "official" "primary", or some other more neutral word. Actually, that was my initial proposal, although I didn't really insist on it.

Giovanni.

lw commented 11 years ago

I agree with Gio. Can we add a facility for serializing many-to-many relationships that mimics what we would do if we had the PostgreSQL arrays?

I don't know of a general method for exporting many-to-many relationships, but this one can easily be exported as a list of (task name, language code) pairs for each user. I'm not sure, though, whether it is equally easy to reimport...
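A minimal sketch of such an export and of the corresponding reimport, using plain dicts as hypothetical stand-ins for the database. It assumes that a (task name, language code) pair identifies at most one statement; under that assumption, reimporting only requires looking each pair up in the target database, whose IDs may differ from the source.

```python
# Hypothetical in-memory stand-ins for database rows:
# statement ID -> (task name, language code), user -> selected IDs.
statements = {1: ("a", "en_US"), 2: ("a", "en_UK"), 3: ("b", "it")}
selections = {"alice": {1, 3}, "bob": {2}}


def export(selections, statements):
    """Per user, the selected statements as sorted (task, language) pairs."""
    return {user: sorted(statements[sid] for sid in ids)
            for user, ids in selections.items()}


def reimport(exported, statements):
    """Map (task, language) pairs back to statement IDs in a target database."""
    by_key = {key: sid for sid, key in statements.items()}
    result = {}
    for user, pairs in exported.items():
        # tuple() also accepts pairs that went through JSON as lists;
        # pairs with no matching statement in the target are skipped.
        result[user] = {by_key[p] for p in map(tuple, pairs) if p in by_key}
    return result


dump = export(selections, statements)
# The target database may have assigned different IDs to the same statements.
new_statements = {10: ("a", "en_US"), 11: ("a", "en_UK"), 12: ("b", "it")}
restored = reimport(dump, new_statements)
print(restored)  # {'alice': {10, 12}, 'bob': {11}}
```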

About multiple official statements: I'm not aware of a real use case... perhaps we could find one in multilingual countries (Switzerland, Canada, South Africa, etc.). In any case, this change is trivial: add an "official" boolean field to the Statement objects and remove the "official_language" string field from the Task object. No many-to-many relationship is needed. I think this is a cleaner solution anyway and, in this scenario, it's actually more complicated to enforce a single official statement than to allow many.
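A sketch of this proposal with hypothetical dataclasses (not the actual CMS models). Allowing several official statements is a simple filter, whereas insisting on exactly one requires an extra check across all of a task's statements (in PostgreSQL this could be, e.g., a partial unique index on the statements table).

```python
from dataclasses import dataclass, field


# Hypothetical models sketching the proposal; not the actual CMS classes.
@dataclass
class Statement:
    language: str
    official: bool = False  # replaces the Task.official_language string


@dataclass
class Task:
    name: str
    statements: list[Statement] = field(default_factory=list)

    def official_statements(self) -> list[Statement]:
        # Any number of statements may be flagged: zero, one or several.
        return [s for s in self.statements if s.official]

    def check_single_official(self) -> None:
        # Restricting to exactly one official statement needs an extra
        # check that looks at all the task's statements together.
        if len(self.official_statements()) != 1:
            raise ValueError(f"task {self.name!r} needs exactly one official statement")


swiss = Task("t", [Statement("de", True), Statement("fr", True), Statement("it", True)])
print([s.language for s in swiss.official_statements()])  # ['de', 'fr', 'it']
```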

I'd prefer to call what we now call "official" as "primary" or some more neutral word.

I proposed the name "official" because I meant "the statement that is taken as the reference in case of appeals, etc.", while all the other statements are there just for the convenience of the contestants and have no "legal" validity. It's a very IOI-style idea. Yet "primary" doesn't seem a bad choice either... we can change it, if you want.

lw commented 11 years ago

I tried to fix this issue in 6f08b6f7707c077519c1692e3c8183000b120572.

I chose to go in the opposite direction from the one discussed here: I stored the preferred task statement translations (now called "primary") as a JSON-encoded dict of lists of language codes. The reason is that a many-to-many relationship (with the additional table and the issues with exporting and importing) seemed a bit of overkill for data that is used only by CWS and that, after the rewrite, will just be sent to the client as JSON. This approach also resembles the one we would like to introduce in TWS: being able to select a translation before it's actually available.
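A minimal sketch of how such a JSON-encoded value could be read. The layout (task name mapped to a list of language codes) and the `primary_for` helper are assumptions for illustration; the actual encoding in the commit may differ. Filtering against the statements that currently exist is what lets a user select a translation that has not been uploaded yet.

```python
import json

# Hypothetical stored value for one user: task name -> primary language
# codes, JSON-encoded into a single text column.
primary_statements = json.dumps({"a": ["en_US"], "b": ["en_UK", "it"]})


def primary_for(stored: str, task_name: str, available: list[str]) -> list[str]:
    """Languages to highlight for one task, among those actually available."""
    wanted = json.loads(stored).get(task_name, [])
    return [lang for lang in wanted if lang in available]


# "it" is selected for task "b" before the Italian statement exists;
# it starts being highlighted as soon as it becomes available.
print(primary_for(primary_statements, "b", ["en_US", "en_UK"]))        # ['en_UK']
print(primary_for(primary_statements, "b", ["en_US", "en_UK", "it"]))  # ['en_UK', 'it']
```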

Let me know what you think of it...

lw commented 11 years ago

Fixed in adea29ae6ea9cf0413c940b82844642cc2094ab3.

cms-dev / cms

Rethink how we manage task statement translations #30
