Ordering of numbered texts in Audio section

GeoffreyKhan commented 3 years ago

In the list of uploaded audio files at https://nena.ames.cam.ac.uk/audio/, A10 is ordered before A9 in the Urmi, Christian dialect, thus:

A10 A9 A8 etc

Can you make A10, A11 etc. come after A9?

jamespstrachan commented 3 years ago

Thanks for the clear report and link!

This is a quirk of how numbers sort when they sit inside longer alphanumeric strings. Technically these titles are presented in dictionary order, compared character by character ("A1..." comes before "A4...", even when it's actually "A10 (Kha..."). There are hacky things we can do to make it sort more logically, but nothing comprehensive, as each would rely on a narrow understanding of how this example is structured and so might bork on future, different title formats.
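
To illustrate, here is a minimal Python sketch (not the site's actual code) of plain dictionary ordering next to one of the "hacky" fixes, a natural-sort key. The helper name `natural_key` is made up for this example, and it assumes IDs are simple runs of letters and digits:

```python
import re

def natural_key(s):
    """Hypothetical helper: split a string into text and number chunks.

    re.split with a capturing group always alternates text, digits, text...,
    so odd positions are digit runs and can safely be compared as integers.
    """
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

ids = ["A8", "A9", "A10", "A11"]

# Plain string sort compares character by character, so "A1..." < "A8".
dictionary_order = sorted(ids)

# Natural sort compares the numeric chunk as a number, so 8 < 9 < 10 < 11.
natural_order = sorted(ids, key=natural_key)

print(dictionary_order)
print(natural_order)
```

This works for the IDs above, but it bakes in an assumption about the title format, which is exactly the fragility described.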

Perhaps the best solution is to take this metadata out of the title itself. You already have most of this set on the text itself (albeit not yet visible until #65 is approved). Could we remove this from the title of the text and instead include it in this table as separate columns, eg:

name	author	text id	dialect	transcription	translation
Is there a man with no worries?	Khan 2016	A4	Urmi, Christian	✔	✔
Women do things best	Khan 2016	A5	Urmi, Christian	✔	✔
... etc ...
A Visit from Harun ar-Rashid	Khan 2016	A10	Urmi, Christian	✔	✔

This would make it much easier to sort the page by dialect, then logically by text id then alphabetically by title. It would also make it possible to allow users to filter and sort the table for themselves in future when it gets bigger.
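
As a rough sketch of that three-level sort in Python (the `Row` type, the `sort_key` helper and the placeholder row are assumptions for illustration, not existing code):

```python
import re
from typing import NamedTuple

class Row(NamedTuple):
    # Hypothetical row shape mirroring the proposed table columns.
    name: str
    text_id: str
    dialect: str

def sort_key(row):
    """Sort by dialect, then text id (numeric part as a number), then name."""
    parts = re.split(r"(\d+)", row.text_id)
    id_key = [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]
    return (row.dialect, id_key, row.name.lower())

rows = [
    Row("A Visit from Harun ar-Rashid", "A10", "Urmi, Christian"),
    Row("Women do things best", "A5", "Urmi, Christian"),
    Row("Is there a man with no worries?", "A4", "Urmi, Christian"),
    # Placeholder row (not real data) so that dialect grouping is visible.
    Row("Placeholder title", "A2", "Placeholder dialect"),
]

ordered = sorted(rows, key=sort_key)
for row in ordered:
    print(row.dialect, row.text_id, row.name, sep=" | ")
```

Because each column is its own field, the same key could later be swapped out if users are allowed to choose their own sort column.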

Let me know if this is an acceptable solution, and if you think recording_date would be a useful extra metadata field to have.

GeoffreyKhan commented 3 years ago

This looks fine, but please note:

I'd suggest replacing the column title Author with Source. Also this column may be empty for some texts, since they have not been published yet.
The text id column would need to be ordered in the correct sequence. In some dialects, as here, the id would be letter+numeral. In some dialects only a numeral.

GeoffreyKhan commented 3 years ago

Please could you put this on staging

jamespstrachan commented 3 years ago

I haven't yet written the code for this part. Once a changeset is pushed I will apply the "on staging" label and then you can test it. I'm not going to move any more work into staging until the existing lot is all marked ready for production, otherwise we'll never actually get anything deployed!

jamespstrachan commented 3 years ago

The sort ordering on the Text ID field is not trivial. I may have to write a specific sorter for this field, and before I do, I want to check I have the spec down (reading ">" as "sorts before"):

A > A1 > A2 > A10 > A999 > AA > B > [not set]

Are there any cases where the text ID is not of the form [someletters][somenumbers] or just [someletters]?
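
For reference, a minimal Python sketch of a sort key that would satisfy the ordering above. The function name `text_id_key` is hypothetical, and the handling of IDs that match neither form (sorted after well-formed IDs, before unset ones) is an assumption, not something agreed in this thread:

```python
import re

def text_id_key(text_id):
    """Hypothetical sort key for IDs like 'A', 'A10', 'AA' or '12'.

    Returns (group, letters, number) so tuples always compare cleanly:
      group 0 = well-formed ID, 1 = unrecognised format, 2 = not set.
    """
    if not text_id or not text_id.strip():
        return (2, "", -1)  # unset IDs sort last
    match = re.fullmatch(r"([A-Za-z]*)(\d*)", text_id.strip())
    if match is None:
        # Assumed fallback: keep odd formats together, ordered as plain text.
        return (1, text_id.strip().upper(), -1)
    letters, digits = match.groups()
    # A bare letter prefix ("A") gets -1 so it sorts before "A1".
    return (0, letters.upper(), int(digits) if digits else -1)

ids = ["B", None, "A10", "AA", "A", "A999", "A2", "A1"]
ordered_ids = sorted(ids, key=text_id_key)
print(ordered_ids)
```

Numeral-only IDs fall into the well-formed group with an empty letter prefix, so under this sketch they would sort ahead of lettered IDs.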

GeoffreyKhan commented 3 years ago

Most texts now simply have a title in words, such as 'Bread and Cheese'. The titles that begin A1, A2 etc, are the numbering that appears when these texts are published. There will be B1, B2 as well. The ones without the A1, A2 etc are, in principle, unpublished. If it would help, we could add the same kind of numeration before the title of all texts. What do you prefer?

jamespstrachan commented 3 years ago

(Revisiting this as part of current milestone)

I think that the "A2" or "B1" code should not be part of the title string, and instead saved in the new text_id field we previously created for this purpose. If we want to present them joined up we can output one field then the other: "{text_id} {title}".

Similarly, I think that strings like "(Khan 2016)" should not appear in the title as there's a source field in which this reference is more commonly specified.

Finally, and just in case there's an easy win here, would it be acceptable to use text_ids in the form "A01", "A02", ..., "B23"? This is commonly how systems avoid the ambiguous sort order of numbers with inconsistent lengths. (If more than 99 "A" texts are likely to eventually exist, we should pad to three digits, e.g. "A001".)
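
A small Python sketch of why padding helps (the helper `pad_text_id` is hypothetical and assumes IDs of the form letters followed by digits; anything else is returned unchanged):

```python
import re

def pad_text_id(text_id, width=2):
    """Hypothetical helper: zero-pad the numeric part, e.g. 'A2' -> 'A02'."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", text_id)
    if match is None:
        return text_id  # leave unrecognised formats alone
    letters, digits = match.groups()
    return f"{letters}{int(digits):0{width}d}"

padded = [pad_text_id(t) for t in ["A10", "A9", "A2", "B23"]]

# With equal-length numbers, a plain string sort is already the logical order.
print(sorted(padded))
```

Once every number has the same width, no custom sorter is needed: ordinary string ordering in the database or in code gives the expected sequence.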

GeoffreyKhan commented 3 years ago

Very good proposals. I've implemented these changes in the Urmi, Christian texts and they look fine and are ordering correctly.

jamespstrachan commented 3 years ago

Great, that looks better.

As an extra tweak, how about we drop the source column from the table and instead show the source as a hover element when one is set? The source text varies in length from nothing to the huge "Khan, Geoffrey. The Neo-Aramaic Dialect of the Assyrian Christians of Urmi. 4 vols. Studies in Semitic Languages and Linguistics 86. Leiden-Boston: Brill, 2016, vol. 4", so it kind of distorts the table. For variable-length/optional fields like this, particularly if you don't expect users to be scanning the page to find them, hiding the value in the hover text of an ⓘ symbol beside the text title might be neater.
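
One way this could look, sketched in Python with the standard library only. The function name `title_cell` and the `source-info` class are invented for the example, and it relies on the browser showing an element's `title` attribute as a tooltip:

```python
from html import escape

def title_cell(title, source=None):
    """Hypothetical renderer: text title plus an info symbol when a source exists."""
    cell = escape(title)
    if source:
        # escape(..., quote=True) also escapes quotes, so long citations
        # are safe inside the attribute value.
        tooltip = escape(source, quote=True)
        cell += f' <span class="source-info" title="{tooltip}">ⓘ</span>'
    return cell

with_source = title_cell("Women do things best", "Khan 2016")
without_source = title_cell("Women do things best")
print(with_source)
print(without_source)
```

Texts with no source simply render the title alone, so the table width no longer depends on the citation length.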

GeoffreyKhan commented 3 years ago

OK, let's try that tweak.

CambridgeSemiticsLab / nena

Ordering of numbered texts in Audio section #72