buda-base / public-digital-library

http://library.bdrc.io

Missing etexts #918

Open berger-n opened 1 month ago

berger-n commented 1 month ago

After searching for byang chub sems dpa'i spyod pa la 'jug pa with the best-quality etext filter, the resulting page does not have any etext.

berger-n commented 1 month ago

indeed I had already spotted this, but it's not clear how to fix it... @eroux wdyt? can we provide a direct link to the corresponding text or volume?

(note that there seems to be an additional issue with this particular etext, which won't open anyway: see seqNum/sliceStartChar/sliceEndChar in https://library-dev.bdrc.io/show/bdr:IE23703)


but it's the same for a working etext; see the first result here: https://library-dev.bdrc.io/osearch/search?etext_quality[0]=1.99-2.01&q=byang%20chub%20sems%20dpa%27i%20spyod%20pa%20la%20%27jug%20pa%27i%20tshig%20%27grel%20%27jam
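As an illustration of the kind of check that would flag bad seqNum/sliceStartChar/sliceEndChar values like the ones on bdr:IE23703, here is a minimal Python sketch. The `EtextRef` class and `slice_problem` function are hypothetical, and the sketch assumes seqNum is 1-based and sliceEndChar is an exclusive offset into the volume text; neither convention is confirmed in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EtextRef:
    """Hypothetical, simplified etext reference (field names follow the thread)."""
    seq_num: int           # assumed 1-based
    slice_start_char: int  # assumed offset of the first character in the volume text
    slice_end_char: int    # assumed exclusive end offset


def slice_problem(ref: EtextRef, volume_length: int) -> Optional[str]:
    """Return a description of the first problem found, or None if the slice looks valid."""
    if ref.seq_num < 1:
        return f"seqNum {ref.seq_num} should be >= 1"
    if ref.slice_start_char < 0:
        return f"sliceStartChar {ref.slice_start_char} is negative"
    if ref.slice_end_char <= ref.slice_start_char:
        return "sliceEndChar must be greater than sliceStartChar"
    if ref.slice_end_char > volume_length:
        return f"sliceEndChar {ref.slice_end_char} exceeds the volume length ({volume_length})"
    return None


# Example with made-up values:
print(slice_problem(EtextRef(1, 0, 500), volume_length=1200))    # None
print(slice_problem(EtextRef(1, 900, 800), volume_length=1200))  # sliceEndChar must be greater than sliceStartChar
```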

eroux commented 1 month ago

ok, I'll fix the first issue with the wrong sliceEndChar, seqNum, etc.

For the UX issue, I think the etext in an outline node (like https://library-dev.bdrc.io/show/bdr:MW1PD100944_7E64A5) could have a preview just like the root instance does (like https://library-dev.bdrc.io/show/bdr:MW1PD100944). It would make a lot of sense to me... does it seem feasible, @berger-n?

roopeux commented 1 month ago

A big +1 for the preview

berger-n commented 1 month ago

ok thanks! not trivial but feasible I guess

berger-n commented 1 month ago

done: link


note, though, that the results above include the case of https://library-dev.bdrc.io/show/bdr:MW1KG11947 and its parts: they are already cataloged, but their etexts are not imported yet, so for now the root etext keeps loading forever and its parts can't be located (from https://library-dev.bdrc.io/show/bdr:MW1KG11947_466C5B, for example)

roopeux commented 1 month ago

The original issue is gone, but this is probably related (it does not prevent testing, though): https://library-dev.bdrc.io/osearch/search?q=byang%20chub%20sems%20dpa%E2%80%99i%20spyod%20pa%20la%20%E2%80%99jug%20pa

eroux commented 1 month ago

oh thanks! This is an issue in the data, I'll have a look

eroux commented 1 month ago

fixed

roopeux commented 1 month ago

A similar case

  1. Open https://library-dev.bdrc.io/show/bdr:MW4CZ5369?s=%2Fosearch%2Fsearch%3Fq%3Dbka%2527%2520%2527gyur. Note that the text has both scans and an etext.
  2. Search the outline for pha rol tu phyin pa stong phrag brgya pa.
  3. Try to open the first result.
  4. BUG: the text looks like it has no scans and no etext, and an error is shown.

eroux commented 1 month ago

I'll have a look at this too, thanks for catching it!

eroux commented 1 month ago

well, never mind: looking at the queries in the network tab, the data is returned correctly... @berger-n do you know why there's no etext preview?

berger-n commented 1 month ago

interesting, thanks! let me have a look at this

berger-n commented 1 month ago

regarding the etext, it's the bug in https://ldspdi-dev.bdrc.io/query/graph/etextrefs?R_RES=bdr%3AIE4CZ5369 (spotted Friday night: etextinVolume / seqNum / sliceEndChar / sliceStartChar are returned as arrays)
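A minimal sketch of how such rows could be flagged on the client side, assuming each etextrefs result row has been decoded from JSON into a plain dict keyed by the field names above (the real response layout may differ). `fields_returned_as_arrays` is a hypothetical helper, not part of any BDRC code:

```python
from typing import Any, List, Mapping

# Fields that should hold a single value per etext reference
# (names copied from the comment above; the JSON row layout is assumed).
SCALAR_FIELDS = ("etextinVolume", "seqNum", "sliceStartChar", "sliceEndChar")


def fields_returned_as_arrays(row: Mapping[str, Any]) -> List[str]:
    """Return the expected-scalar fields that came back as lists."""
    return [name for name in SCALAR_FIELDS if isinstance(row.get(name), list)]


# Made-up row reproducing the symptom:
row = {
    "etextinVolume": ["volume-a", "volume-b"],
    "seqNum": [1, 2],
    "sliceStartChar": 0,
    "sliceEndChar": 1024,
}
print(fields_returned_as_arrays(row))  # ['etextinVolume', 'seqNum']
```

Flagging the row, rather than silently taking the first array element, keeps the data bug visible instead of hiding it behind an arbitrary choice.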

regarding the scans, this returns a 500 error for the first result of the outline search: https://iiifpres.bdrc.io/wvo:bdr:MW4CZ5369_0008::bdr:I1KG9140/manifest

{"status":500,"code":5001,"message":"filename for begin image null in image list not in bvm: I1KG91400004.jpg","link":null,"developerMessage":null}

eroux commented 1 month ago

Ah yes, thanks, I forgot about the array thing; I'll look now

eroux commented 1 month ago

both things should be repaired

eroux commented 1 month ago

Unfortunately, for this particular etext the images are out of sync with the etext. That's a data problem; I'll try to fix it soon

eroux commented 1 month ago

etext and images are back in sync!

berger-n commented 1 month ago

well done, thanks! on my side I've been able to fix the etext preview, but it's only a quick fix for now: it can't sort the etexts, so it arbitrarily takes the first one in the list: https://library-dev.bdrc.io/show/bdr:MW4CZ5369_0008

what about adding the volume numbers to the query? here: https://ldspdi-dev.bdrc.io/query/graph/ResInfo-SameAs?R_RES=bdr%3AMW4CZ5369_0008&format=json

eroux commented 1 month ago

I've added some tmp:eTextInVolumeNumber triples in https://ldspdi-dev.bdrc.io/query/graph/ResInfo-SameAs?R_RES=bdr%3AMW4CZ5369_0008&format=json
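With the volume numbers available, the preview no longer has to take the first etext arbitrarily. A minimal sketch, assuming the preview should show the beginning of the text and that each etext has been decoded into a dict with hypothetical `volumeNumber` (read from the tmp:eTextInVolumeNumber triples) and `seqNum` keys:

```python
from typing import Any, List, Mapping, Optional


def pick_preview_etext(etexts: List[Mapping[str, Any]]) -> Optional[Mapping[str, Any]]:
    """Pick the etext to preview: lowest volume number first, then lowest seqNum.

    The "volumeNumber" and "seqNum" keys are hypothetical; etexts without a
    volume number are ranked after all numbered ones.
    """
    if not etexts:
        return None

    def sort_key(etext: Mapping[str, Any]):
        volume = etext.get("volumeNumber")
        # (False, n, ...) sorts before (True, 0, ...), so missing volumes go last.
        return (volume is None, volume if volume is not None else 0, etext.get("seqNum", 0))

    return min(etexts, key=sort_key)


candidates = [
    {"id": "etext-b", "volumeNumber": 3, "seqNum": 1},
    {"id": "etext-a", "volumeNumber": 1, "seqNum": 2},
    {"id": "etext-c", "volumeNumber": None, "seqNum": 1},
]
print(pick_preview_etext(candidates)["id"])  # etext-a
```

Using `min` with a tuple key avoids sorting the whole list when only one etext is needed, and ties keep the original order.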

berger-n commented 1 month ago

it's perfect, thanks!

are the OpenPecha etexts supposed to work? I just found these empty etext outlines: https://ldspdi-dev.bdrc.io/query/graph/etextrefs?R_RES=bdr%3AIE0OPI647D7CBC https://ldspdi-dev.bdrc.io/query/graph/etextrefs?R_RES=bdr%3AIE0OPI0673ACC7 https://ldspdi-dev.bdrc.io/query/graph/etextrefs?R_RES=bdr%3AIE0OPIA83EE6D2

see the results from @roopeux's query above, shortened because otherwise there's still this server error when the keyword is too long, I guess?

eroux commented 1 month ago

hmmm right, some of these are broken, but they're broken upstream, so they're difficult for us to fix :( I'll detect them and exclude them from the import
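A rough sketch of such an exclusion filter, assuming each candidate etext is a dict with an `id` and a list of per-volume texts under a hypothetical `volumes` key, and treating "broken" as "no volume contains any non-whitespace text" (the actual criterion used for the import isn't stated here):

```python
from typing import Any, Iterable, List, Mapping, Tuple


def split_importable(
    etexts: Iterable[Mapping[str, Any]],
) -> Tuple[List[Mapping[str, Any]], List[str]]:
    """Split etexts into (importable, excluded ids).

    Hypothetical criterion: an etext is kept only if at least one of its
    volumes (strings under the assumed "volumes" key) has non-whitespace text.
    """
    importable: List[Mapping[str, Any]] = []
    excluded_ids: List[str] = []
    for etext in etexts:
        volumes = etext.get("volumes") or []
        if any(text.strip() for text in volumes):
            importable.append(etext)
        else:
            excluded_ids.append(etext["id"])
    return importable, excluded_ids


# Made-up records, not real OpenPecha data:
records = [
    {"id": "IE_EMPTY", "volumes": []},
    {"id": "IE_OK", "volumes": ["", "some text"]},
    {"id": "IE_BLANK", "volumes": ["  \n"]},
]
kept, excluded = split_importable(records)
print([e["id"] for e in kept])  # ['IE_OK']
print(excluded)                 # ['IE_EMPTY', 'IE_BLANK']
```

Returning the excluded ids, rather than dropping them silently, makes it easier to report the broken items upstream.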