UUDigitalHumanitieslab / readit-interface

Public interface for READ-IT

String cutting in SnippetView #157

Open alexhebing opened 4 years ago

alexhebing commented 4 years ago

As explained in #133, the SnippetView iterates over almost the entire string to cut off a piece that fits in the available width, and it does this twice (once for the prefix and once for the suffix). Can this be done more efficiently?
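One possible direction, as a rough sketch rather than the current implementation: if the rendered width never decreases when characters are added, the cut-off point can be found with a binary search, so the number of width measurements grows with the logarithm of the string length rather than with the length itself. The `measure` callback and the `fitTail` and `fitHead` names are hypothetical; how the SnippetView actually measures text is not shown here.

```typescript
// Hypothetical callback: returns the rendered width of `text` in pixels.
type Measure = (text: string) => number;

// Longest tail of `text` that fits in `maxWidth` (for the prefix, which
// should keep the characters closest to the selection).
// Assumes `measure` is monotone and that the empty string always fits.
function fitTail(text: string, maxWidth: number, measure: Measure): string {
    let low = 0;             // number of characters known to fit
    let high = text.length;  // upper bound on the number of characters
    while (low < high) {
        const mid = Math.ceil((low + high) / 2);
        if (measure(text.slice(text.length - mid)) <= maxWidth) {
            low = mid;       // `mid` characters fit, try more
        } else {
            high = mid - 1;  // too wide, try fewer
        }
    }
    return text.slice(text.length - low);
}

// Longest head of `text` that fits in `maxWidth` (for the suffix).
function fitHead(text: string, maxWidth: number, measure: Measure): string {
    let low = 0;
    let high = text.length;
    while (low < high) {
        const mid = Math.ceil((low + high) / 2);
        if (measure(text.slice(0, mid)) <= maxWidth) {
            low = mid;
        } else {
            high = mid - 1;
        }
    }
    return text.slice(0, low);
}
```

Each individual measurement may still cost time proportional to the candidate's length, so this only helps when the candidate strings are bounded in some other way as well.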

BeritJanssen commented 4 years ago

Wouldn't you get this behaviour for free if you use e.g. tiles in Bulma? Maybe I'm overly optimistic here... or maybe I don't know enough about the exact type of interface element and strings you want to implement it for.

alexhebing commented 4 years ago

The use case here is thus:

Source:

We have a very long text from a source in which a user has made a selection, which is subsequently annotated with a category and perhaps linked to a linked data item from our triple store.

The snippet view then needs to show the selection plus a prefix and a suffix, for example the 10 characters before and the 10 characters after the selection:

(...) has made a selection, which is (...)

Or something like that. The inefficiency is in finding the 10-character (or however long) strings, i.e. the prefix and the suffix, in the source text, which could obviously be very long.
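To make the target concrete, here is a minimal sketch of the cut itself, assuming the selection is available as start and end character offsets into the source text (that is an assumption; how the selection position is stored is not stated above). The `Snippet` interface and the `cutSnippet` function are hypothetical names. With offsets, the work depends only on the context length, not on the length of the source.

```typescript
interface Snippet {
    prefix: string;
    selection: string;
    suffix: string;
}

// `start` and `end` are assumed to be character offsets of the selection
// within `source` (end exclusive); `context` is the number of characters
// to show on either side.
function cutSnippet(
    source: string,
    start: number,
    end: number,
    context: number = 10,
): Snippet {
    // Clamp so that a selection near either edge of the text still works.
    const prefixStart = Math.max(0, start - context);
    const suffixEnd = Math.min(source.length, end + context);
    return {
        prefix: source.slice(prefixStart, start),
        selection: source.slice(start, end),
        suffix: source.slice(end, suffixEnd),
    };
}
```

A fixed character count does not guarantee that the result fits the available width, so this would still have to be combined with some form of width fitting.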

I do not believe tiles will help us much here, but if you still do, please let me know!

BeritJanssen commented 4 years ago

Hmmm... Will the data be stored in Elasticsearch by any chance? Then you could take advantage of the highlighting mechanism.

BeritJanssen commented 4 years ago

Highlighting in Elasticsearch works nicely. However, it is not capable of returning one specific snippet. Say that in the text, "she" was annotated as Reader: Elasticsearch would return snippets for every occurrence of "she", irrespective of whether that occurrence is the actual annotation.

This could be resolved by setting term_vector to with_positions_offsets on the text field. It seems that in that case, we can work with offsets and specify where to start the highlighting search. Since we already have a working implementation to find snippets, going the Elasticsearch route may create more development overhead than the performance gain is worth.
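For reference, a sketch of what the mapping and a highlight request could look like, written as plain TypeScript objects that would be sent as JSON bodies. The field name `text`, the typeless mapping layout (as in Elasticsearch 7) and the fragment settings are assumptions. This only shows the term_vector setting together with the fast vector highlighter (`fvh`), which requires it; it does not by itself restrict highlighting to the annotated occurrence.

```typescript
// Index mapping: store term vectors with positions and offsets for the
// (assumed) `text` field.
const snippetIndexMapping = {
    mappings: {
        properties: {
            text: {
                type: "text",
                term_vector: "with_positions_offsets",
            },
        },
    },
};

// Search body: match on the annotated word and request highlights from the
// fast vector highlighter, which relies on the stored term vectors.
const snippetSearchBody = {
    query: { match: { text: "she" } },
    highlight: {
        fields: {
            text: {
                type: "fvh",
                fragment_size: 50,       // approximate fragment length in characters
                number_of_fragments: 3,  // maximum number of fragments returned
            },
        },
    },
};
```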

jgonggrijp commented 4 years ago

For something completely different, maybe it is possible to implement this efficiently with CSS, using a combination of alignment, overflow: hidden, ::before and ::after.