Closed dirkroorda closed 3 years ago
This is clearly a good idea. I don't quite know when I'll have time to implement it - though it's pretty trivial - but I'm sure the URL scheme will look like https://nena.ames.cam.ac.uk/audio/30#5 for chunk 5 of text 30, following standard URL sectioning notation.
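To make the proposed scheme concrete, here is a minimal sketch of how a client such as a search interface might build these links. The function name chunk_url and the constant BASE_URL are illustrative assumptions, not part of the site's code; only the URL shape comes from the example above.

```python
# Base path taken from the example URL in this thread.
BASE_URL = "https://nena.ames.cam.ac.uk/audio"


def chunk_url(text_id: int, chunk: int) -> str:
    """Return a link to one chunk of one text, using the #fragment form."""
    # Assumption: text ids and chunk numbers are positive integers.
    if text_id < 1 or chunk < 1:
        raise ValueError("text_id and chunk are assumed to be positive integers")
    return f"{BASE_URL}/{text_id}#{chunk}"


print(chunk_url(30, 5))
```

Because the chunk number sits in the fragment, the server only ever sees the request for /audio/30; picking out chunk 5 is left to the browser.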
A quick point of clarification: we also allow transcribers to set "section numbers" throughout the text. These will always line up with chunks but section markers (1) and (2) might have several chunks between them. Put another way, a section marker is always on a chunk, but a chunk doesn't always have a section marker. This is to allow consistency with existing published works where the numbered pieces of text can be many lines long, and where we want to transcribe and translate them in finer detail.
On Fri, 16 Apr 2021 at 11:19, Dirk Roorda @.***> wrote:
Hi James,
can we use a URL schema to point directly to individual lines in the texts? I would link to the nena corpus site straight from my search interface.
OK, I need some clarification, maybe @codykingham can explain: the node type line in the nena_tf repo, does it correspond 1-1 to chunks? And is the line with number n the same as the chunk with number n? If so, we had better rename line in nena_tf to chunk.
If line and chunk are different concepts, I need a URL that corresponds to a line. An option could then be https://nena.ames.cam.ac.uk/audio/30?line=5 and a nice effect of this would be: you get the whole page for text 30, but in there line 5 is marked as focused (blue background or whatever) and the line is scrolled into view.
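The two candidate forms discussed so far differ in where the number lives: in ?line=5 it is part of the query string, which is sent to the server, while in #5 it is the fragment, which stays in the browser. As a rough sketch of that difference, the hypothetical helper target_line below reads the number from either form using Python's standard urllib.parse; the helper name and the parameter name line are assumptions taken from the example URLs, not an existing API.

```python
from urllib.parse import parse_qs, urlsplit


def target_line(url: str):
    """Return the line/chunk number a URL points at, or None if there is none."""
    parts = urlsplit(url)
    # Fragment form: .../audio/30#5
    if parts.fragment.isdecimal():
        return int(parts.fragment)
    # Query form: .../audio/30?line=5
    values = parse_qs(parts.query).get("line", [])
    if values and values[0].isdecimal():
        return int(values[0])
    return None


print(target_line("https://nena.ames.cam.ac.uk/audio/30#5"))
print(target_line("https://nena.ames.cam.ac.uk/audio/30?line=5"))
```

Either form can carry the same information; the fragment form has the advantage that browsers already scroll to the matching element without any server-side support.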
Sorry @dirkroorda, I think in trying to avoid confusion I have added more!
I use the term "chunks" intending it to be meaningless in your existing linguistic contexts. When I say a "chunk" I'm referring to one row of the transcription interface: the smallest string of words that the website can label up (e.g. with a timestamp, speaker, translation, etc). I know the term "line" has a more formal meaning, at least in the context of TextFabric, and while a "chunk" may be used as a "line", that's a matter of user convention, not my rule.
Given that the HTML standards already provide for section referencing with the hash URL suffix - and that use case is exactly what we're proposing - I'd recommend we go for https://nena.ames.cam.ac.uk/audio/30#5. This will have the effect you describe of showing the whole text with the chosen chunk highlighted in blue; see the image below for an example.
When I tried to disambiguate "section numbers", that was to avoid a confusion I predicted could arise from the following example. Here, Geoffrey has entered a subsection from the middle of a published text. You can see the first chunk on the page is labelled with a grey circle "180". This is a reference to a separate system of section/line/chunk numbering in someone else's publication. We allow these numbers to be applied arbitrarily to some/any/all chunks without restriction, but we don't enforce their use, as we'll commonly want to split the text up in a more granular fashion than is published elsewhere. My suggestion is that the TF work ignores these for now, and perhaps considers adding them in as additional decoration at a later point.
Does that all make sense?
Thanks James, very much so.
I'll stick to the lines as I know them, and I put #nn at the end of the URL.
But now I discover that I do not have the information to arrive at the text identifier.
I do have a feature textid in the TF data, but it gives me values like this:
A4 = A Tale of a Prince and a Princess
A7 = A Man Called Čuxo
A45 = The Fox and the Stork
(all of these are in the Barwar dialect). Do these identifiers mean something to you? I think we need to add the identifiers used on the website as well (or instead).
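The missing piece described here is a lookup from the TF textid value to the text number used in the website URL. A minimal sketch of that lookup follows, assuming such a mapping can be obtained from the website. The function line_url, the parameter website_ids, and the value 999 are all hypothetical; in particular 999 is a placeholder, not the real website id of any text, and the sketch assumes textid alone is enough to identify a text.

```python
from typing import Mapping, Optional

# Base path taken from the example URLs in this thread.
BASE_URL = "https://nena.ames.cam.ac.uk/audio"


def line_url(textid: str, line: int, website_ids: Mapping[str, int]) -> Optional[str]:
    """Build a website link for a TF line, or return None if the text is unmapped."""
    website_id = website_ids.get(textid)
    if website_id is None:
        # No website identifier known for this TF textid, so no link can be made.
        return None
    return f"{BASE_URL}/{website_id}#{line}"


# Placeholder mapping for illustration only: 999 is NOT a real website id.
placeholder_ids = {"A4": 999}

print(line_url("A4", 25, placeholder_ids))
print(line_url("A7", 25, placeholder_ids))
```

Returning None for unmapped texts lets a search interface simply omit the link rather than point at the wrong page.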
If that works out, it would be nice if each page could show a link to the search interface (one fixed link), because users can then rapidly go back and forth between searching and listening to the audio pages.
@dirkroorda I have opened a new issue #68 based on your comment to keep things simple.
Based on
I'll stick to the lines as I know them, and I put #nn at the end of the URL.
I know what I have to do on this ticket and will push a commit once I have it.
@dirkroorda this is now available for testing on the staging instance, e.g. here: https://nena-staging.ames.cam.ac.uk/audio/30/#25
Of course nothing is as easy as it should be - the sticky audio controls mean that the browser's default approach to scrolling target hashes into view sometimes leaves the highlighted chunk hidden behind the audio waveform! I have added an offset in JavaScript to account for this, so it should not be an issue when going to the page directly. However, it may confuse your testing: if you change the # in the URL you have to hard-refresh the page, otherwise the browser just scrolls to the new target without triggering a page reload (and thus without triggering the JS shim).
I would have liked to make this also align the audio player and highlight the relevant section, but because the user has not yet interacted with the page Chrome stops the AudioContext from forming which is messing with my ability to set up that waveform highlight. If the user clicks on the highlighted row (and if it has an audio timestamp) then it will start playing.
I am already happy with users landing on the correct page and in the neighbourhood of the correct line. All the rest is food for finer adjustments if time permits. Thanks for providing an insight into the trickiness of achieving this.
One thing though: the numbers in the grey circles are not the same as the line numbers, as you already explained above.
So if somebody comes in from a search, expecting to jump to line 25 of a certain text, and then ends up at number 52 (in the grey circle), that is confusing. It would help if there was some indication that this is indeed line 25 (could be done by means of a title attribute).
But then I wonder: can't we do better? @codykingham We could adapt the TF data to the numbering we see on the website. Why not put in a feature cnumber with the numbers we see on the website? And the same for the texts? Otherwise things become much too confusing!
There is hardly any limitation in TF on working with multiple number/id features; the only cost is converting them in.
Hey @dirkroorda, in accordance with the new UAT process could you please confirm that the changes made in this area to the staging instance are non-breaking and fit to go to production? If so please apply the ready for production label. Once all on staging tickets in this batch are also marked ready for production by their requesters I will deploy them all. This ticket can stay open for further discussion, spec and coding work on the same topic, or you can close it off when no further dev work or deployment is needed. Let me know if this doesn't make sense.
I leave that to @codykingham, James, because I do not have the right overview of the pipelines yet. I'll wait until I know how to reference texts and lines on the website from the information I have in text-fabric features. We are not there yet.
@dirkroorda I'm going to mark the current changeset as production-ready so it doesn't block other staged work. This ticket can stay open while you consider the more complete integration you're aiming for.
This has now rolled out to production. Is there any work left to do on this ticket? If you're happy with the URL schema and highlighting on our end should the rest of the discussion of how to point from one system to the other be had at CambridgeSemiticsLab/nena_tf? Please close if so.
The URL schema is OK and the way it works too.
So we can close this issue, but there are two things left before I can link to lines in text: