BetaMasaheft / Dillmann

Dillmann Lexicon

Wrong PDF file associated with the entry ሠፈጠ #425

Open MagdaKrzyz opened 2 years ago

MagdaKrzyz commented 2 years ago

The following entry https://betamasaheft.eu/Dillmann/lemma/Le297611adc574a0b90febcc58a2c7ce3 is associated with the wrong PDF page of Dillmann's Lexicon. Instead of p. 1424 it should be p. 1394.

eu-genia commented 2 years ago

Does this also mean that the lemma is wrongly ordered with respect to previous-next? As far as I can see, the mapping comes from the entry number, which is currently 12059; the previous one, https://betamasaheft.eu/Dillmann/lemma/Ldb01abf690894b8396c8d4d740449473, has number 12058, and the next one, https://betamasaheft.eu/Dillmann/lemma/Lb1de468b1cc74eb7a5317b33db2ed3cc, has number 12060. Are they all on wrong pages? If only ሠፈጠ is in the wrong place, then what is the previous-next constellation? (I am not yet sure how to fix it, but I will look into it once I understand what has gone wrong.)

eu-genia commented 2 years ago

For the moment I have fixed the PDF link for ሠፈጠ as requested. For the future, I believe you can fix such errors yourself in the edit mode: [screenshot]

MagdaKrzyz commented 2 years ago

All lemmata from p. 1393 to p. 1408 of Dillmann's Lexicon are wrongly linked to page 1424. Lemmata from p. 1409 of DL are wrongly linked to p. 1393. Is there any way to correct them in bulk, or do I have to correct them individually?

eu-genia commented 2 years ago

If you list here all problematic lemmata I can probably go through the XML files more quickly on the back end but I would need

Lemma ID -----> correct page

MagdaKrzyz commented 2 years ago

I wanted to try working with the edit mode first, but it shows me an error. Could you look at the screenshot and tell me whether I am encoding it properly? [Screenshot 2022-08-08 140946]

eu-genia commented 2 years ago

I think it should be exactly as shown in the guidelines, I had used the red square for you to see what you need, have you tried? {'|{Dil.1393}|'}

MagdaKrzyz commented 2 years ago

It turned out that it works (with the encoding |{DiL.1393}|), but I still get the same error message. https://betamasaheft.eu/Dillmann/lemma/Lf78a56e4ecb949cc971f0c02beab9900 I will correct the lemmas myself, since I have to open each of them anyway.

eu-genia commented 2 years ago

(I think you would not get the error message if you use the correct encoding)

MagdaKrzyz commented 2 years ago

Again there is a mistake. It displays the previous page (the last page of the lexicon proper), not the right one from the addendum. https://betamasaheft.eu/Dillmann/lemma/Lf78a56e4ecb949cc971f0c02beab9900 Perhaps it will be easier if I send you the links after all?

MagdaKrzyz commented 2 years ago

Ok let me try

eu-genia commented 2 years ago

I will correct Lf78a56e4ecb949cc971f0c02beab9900 for you, please use correct encoding for the other items.

MagdaKrzyz commented 2 years ago

Ok, please do because I tried all options and it does not work for me properly.

eu-genia commented 2 years ago

I have rechecked, the upconversion wants DiL.1393 (so capital L)

eu-genia commented 2 years ago

I have updated the guidelines view (or I can change the upconversion.xsl, but probably easier with the guidelines)

eu-genia commented 2 years ago

"Again there is a mistake. It displays the previous page (the last page of the lexicon proper), not the right one from the addendum. https://betamasaheft.eu/Dillmann/lemma/Lf78a56e4ecb949cc971f0c02beab9900 Perhaps it will be easier if I send you the links after all?"

Which PDF exactly should be linked? Is it correct now?

MagdaKrzyz commented 2 years ago

I have tried exactly what the guidelines say, as you suggested, but there are still mistakes: 1) the curly brackets and the single quotation marks are displayed, 2) the linked page is the previous one, 1391-1392. See screenshot 1. I then removed the curly brackets and the single quotation marks: they are no longer displayed, but the linked page is still wrong. See screenshot 2. Here is the link to the entry: https://betamasaheft.eu/Dillmann/lemma/L14f1e452a48d4c748f1237f84a0d0671 You corrected the entry under https://betamasaheft.eu/Dillmann/lemma/Lf78a56e4ecb949cc971f0c02beab9900 in the XML file. Have you tried using the encoding in the guidelines? Perhaps it will be easier if I send you the links after all? [Screenshot_1] [Screenshot_2]

eu-genia commented 2 years ago

I guess the code recalculates the cb value introduced by the encoding in a way that confuses pages. I don't have the time to look into it now, and I am afraid of breaking other connections if I change the way the page numbers are calculated to link to hacohen... I can correct them all, but if I am to do it, the easiest thing for me is not links but a list, as suggested above.

eu-genia commented 2 years ago

I have now tried changing the calculation so that 1393 gives 1393 (it had -2), but I am not sure whether this now works for linking pages in other parts of the lexicon, i.e. not the addendum but the main part.

app.xqm 998-999

href="{concat('http://www.tau.ac.il/~hacohen/Lexicon/pp', format-number(if(xs:integer($column) mod 2 = 1) then 
if($term//tei:cb) then (xs:integer($column)  -2) else $column else (xs:integer($column)  -1), '#'), '.html')}">

changed -2 to -0 for now
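A minimal Python sketch of the page calculation quoted above, written only to make the arithmetic easier to follow. It is a model of the XQuery expression, not code from app.xqm; the names `pdf_page`, `pdf_url` and the `odd_cb_offset` parameter are made up for illustration.

```python
def pdf_page(column: int, has_cb: bool, odd_cb_offset: int = 2) -> int:
    """Mirror of the quoted XQuery branch logic (all names are hypothetical).

    odd_cb_offset=2 models the original "-2"; 0 models the temporary "-0".
    """
    if column % 2 == 1:
        # Odd column: subtract the offset only when the entry carries a cb.
        return column - odd_cb_offset if has_cb else column
    # Even column: always link to column - 1.
    return column - 1


def pdf_url(column: int, has_cb: bool, odd_cb_offset: int = 2) -> str:
    """Build the link in the same shape as the concat() call quoted above."""
    page = pdf_page(column, has_cb, odd_cb_offset)
    return f"http://www.tau.ac.il/~hacohen/Lexicon/pp{page}.html"


print(pdf_url(1393, has_cb=True))                   # original "-2" behaviour
print(pdf_url(1393, has_cb=True, odd_cb_offset=0))  # after the "-0" change
print(pdf_url(30, has_cb=True))                     # even column, same either way
```

With the original offset, a cb value of 1393 yields pp1391.html, which matches the wrong 1391-1392 page reported earlier; with the offset set to 0 it yields pp1393.html. Even cb values such as 30 give pp29.html in both variants.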

eu-genia commented 2 years ago

I also changed the guidelines prompt on the right from {'|{DiL.1234}|'} to |{DiL.1234}|, as this is indeed what the regex in upconversion.xsl expects.
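To illustrate why the exact spelling matters, here is a small Python sketch of how such a shorthand could be matched. The actual regex in the upconversion stylesheet is not quoted in this thread, so the pattern below is an assumption, as are the function name `expand_cb` and the `<cb n="..."/>` output form.

```python
import re

# Assumed pattern: pipe, brace, "DiL." (capital D and L), digits, brace, pipe.
CB_SHORTHAND = re.compile(r"\|\{DiL\.(\d+)\}\|")


def expand_cb(text: str) -> str:
    """Replace every |{DiL.N}| shorthand with a cb element (hypothetical output form)."""
    return CB_SHORTHAND.sub(r'<cb n="\1"/>', text)


print(expand_cb("|{DiL.1393}|"))      # shorthand is converted
print(expand_cb("|{Dil.1393}|"))      # lowercase l: left untouched
print(expand_cb("{'|{DiL.1393}|'}"))  # wrapper characters survive as literal text
```

Under this assumed pattern, the lowercase spelling is not converted at all, and the extra curly brackets and quotation marks from the old prompt remain in the text, which would be consistent with them being displayed in the entry.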

MagdaKrzyz commented 2 years ago

"I have now tried changed the calculation so that 1393 gives 1393 (it had -2) but I am not sure that this now works ok in linking pages in other parts of the lexicon, not the addendum but the main part? " --- I tried but the linking does not work in the main part of the lexicon.

It works now in the addendum: the entry is associated with the right page but the number of the page displayed is wrong. Take a look here: https://betamasaheft.eu/Dillmann/lemma/L14f1e452a48d4c748f1237f84a0d0671 https://betamasaheft.eu/Dillmann/lemma/L8c2d6822b2214ed9b18f3e58a04ae205

I will make the list of entries with the page.

eu-genia commented 2 years ago

the number displayed is correct, as the pdf shows two columns.

And I guess I was not clear enough. I know that the page linking now works for the addendum. But there must have been some reason why Pietro did the calculation as it was, so it probably no longer works somewhere else. Please check that the entries in the main part that used to be linked correctly are still linked correctly, and not to the next page.

MagdaKrzyz commented 2 years ago

no, it is not correct --- it should show pages 1393/1394 and not 1392/1393

eu-genia commented 2 years ago

I see... this is all quite time-consuming and complicated. Please check that the linking is correct in the main part. If it is, I will see if I can solve this too. Otherwise I have to undo my changes first.

MagdaKrzyz commented 2 years ago

it is correct

eu-genia commented 2 years ago

I can simply display the single number, without the second number and the slash. Is that OK?

Then I can change (app.xqm line 1000)

{if($term//tei:cb) then (string(number(format-number($column, '#')) - 1) || '/' || format-number($column, '#')) 
else (' ' || format-number($column, '#'))}

to

{if($term//tei:cb) then (string(number(format-number($column, '#'))) ) 
else (' ' || format-number($column, '#'))}

does this look better this way?

MagdaKrzyz commented 2 years ago

Yes, now it is correct. By the way, two pages divided by a slash are displayed only if an entry begins on one page and ends on the next one.

eu-genia commented 2 years ago

There is no slash any more at the moment, for any entry.

BUT I now understand the calculations and the part of the problem and the encoding convention, finally!

In the past, the element cb (column break) was only present in entries that go over two columns. So if an entry starts in column 1393 and then continues in column 1394, a column break is marked up at that point (cb="1394"), and the script automatically assumed that the entry starts in column 1393, linked to the PDF of page 1393, and displayed 1393-1394.
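The convention described here can be modelled in a few lines of Python. This is only a sketch of the two display variants discussed in this thread, not the app.xqm code; the function name `display_columns` and the `show_range` flag are made up for illustration.

```python
def display_columns(cb_column: int, show_range: bool = True) -> str:
    """Label for an entry that carries a cb (column break) value.

    show_range=True models the earlier display (start/end);
    show_range=False models the display after the change discussed above.
    """
    if show_range:
        # Under the old convention the entry begins one column before the break.
        start = cb_column - 1
        return f"{start}/{cb_column}"
    return str(cb_column)


print(display_columns(1394))                    # old convention
print(display_columns(1394, show_range=False))  # single number only
```

The old variant is only right when the cb really marks a break inside the entry; when the cb is used merely to force a page link, subtracting one gives a start column the entry never had.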

Now we started introducing column breaks in order to fix the erroneous linking, but with the consequence that we now use the cb element with a new meaning.

Maybe before we go on I will once again try to see if there is another way to fix the linking to the PDF. (I am not sure, I am afraid that the alternative was changing the number of the record which would break the previous-next relation, and cb was an easy solution, but indeed it is unclean and misleading, as cb is reserved for entries going over 2 columns...)

Are you planning to work in the office at some point? Maybe we could sit together so that we can be sure that we get the results we want.

MagdaKrzyz commented 2 years ago

"there is no slash any more at the moment, for none." --- and that is correct. But what will happen if we have to introduce one?

I can come to the office any time. When would you like to deal with the issue?

eu-genia commented 2 years ago

Can you, for the moment, list here all records where you introduced the cb (unless they are already listed above), so that I can follow up on them if needed?

MagdaKrzyz commented 2 years ago

I have not introduced any cb, only single columns. Should I introduce some in places where they are needed?

eu-genia commented 2 years ago

We have a misunderstanding again...

As the guidelines explain (see the picture above), typing |{DiL.1234}| results, after conversion, in the XML element cb (column break), and the app.xqm script takes the value from there to create the link to the PDF.

So in all the entries you have been editing by inserting |{DiL.1234}|, you have been inserting column breaks. This is what I was trying to explain above.

No slash and a 1:1 calculation now seem correct for the new cases where you introduce a column break as a workaround to link to the correct PDF.

BUT in the past column break was used when there was a column break. check e.g. https://betamasaheft.eu/Dillmann/lemma/L4c6dbaa59db34e86941fd2c74c750c7a

it has column break 30, and now displays 30, but actually the entry is in column 29, column break comes at the end.

So, as I had thought, by introducing these changes I have broken the linking for other, older files that use cb correctly.

eu-genia commented 2 years ago

(the example above used to display 29/30, which was correct.)

MagdaKrzyz commented 2 years ago

I understand what you mean. But the guidelines tell me nothing, because the explanation there says that I won't need it anyway. The codes are Chinese to me. My question was actually simple, and it got complicated. What could we do now? How can we correct the linking of pages without introducing column breaks where they are unnecessary?

eu-genia commented 2 years ago

I don't know yet...

eu-genia commented 2 years ago

Give me a list of entries and pages and I will try to find a different way, let us keep column break for column breaks

MagdaKrzyz commented 2 years ago

I will give it to you, but it will take some time.

eu-genia commented 2 years ago

I think I have to change the entry numbers but again I am afraid this may lead to some new problems...

MagdaKrzyz commented 2 years ago

Here are the first two columns of lemmas with their numbers and pages. Is this what you need? [Attachment: Lemmas_wrong_column.docx]

eu-genia commented 2 years ago

It seems that for some reason TraCES records got numbers placing them "inside" the Dillmann sequence, breaking the whole thing: currently the lemma preceding Lf78a56e4ecb949cc971f0c02beab9900 (n="12000") is Lfff6715eb89649c7aa3ce4dc2588cb52 (n="11999"), and before that we have L613d6e14e909416fad5c9dd0e1565cf0 (n="11998"), both from TraCES. Before that the sequence is broken again, as the pointer leads nowhere. [screenshot]

It seems that there is no record with the number 11997 at all.

These TraCES insertions must have broken something, I don't know why or how to fix it yet, but I have to understand where the error starts and where it ends before I can try debugging.

eu-genia commented 2 years ago

The error is far more widespread.

E.g. all entries from L10325caf1d4241eb920de40c983c95b0 (n="10157") through to L270c98dd574b47a2a6922b70c53ce8ad (n="10214") are linked to 1392 but should be 1409. L270c98dd574b47a2a6922b70c53ce8ad is followed by L5761203336f24618a83c3daa7f8e4e58, correctly, which has a cb element, linking it correctly to 1409.

The entry preceding the sequence (xml:id="L3c604b13ab90482bb3e241cd970ea1fe" n="10156") is correctly linked to 1392.

It is probably related to the various indices being transformed/uploaded/included separately at a different stage, making the sequence of entries in the online version not equivalent to the sequence of lemmata in the printed version, yet the PDF links are calculated on the basis of the sequence.

I see no possibility of renumbering all records, as numbers must also be unique or everything will break.

eu-genia commented 2 years ago

entries from xml:id="Laef88677eae345a99105b3bcfe52691d" n="11740" to xml:id="L3b73725403c94db4855f2d3f91a0977d" n="11808" point to 1424, should be 1407-1408

etc etc etc

only the last lemma in each column is linked correctly thanks to the cb element at the end

MagdaKrzyz commented 2 years ago

in Dillmann, which would be the lemma preceding Lf78a56e4ecb949cc971f0c02beab9900? - it should be ፓፒራ papira, here is the link: https://betamasaheft.eu/Dillmann/lemma/L3c604b13ab90482bb3e241cd970ea1fe

"Papira" is the last entry of the lexicon proper, and then the addendum begins. The first part of the addendum in the paper version is a section with obscure words --- these are the ones whose numbers I have sent you. But in the online DL papira is followed by a section with nomina propria (two sections ahead). All entries from the first two sections of the addendum, 1393-1408, are linked to one and the same page, 1424.

Is it linked correctly? --- no, it is linked to the wrong page 1424 and not to 1393.

what do you expect should be the order as previous-next? I don't understand the question.

MagdaKrzyz commented 2 years ago

"It seems that there is no record with the number 11997 at all." --- a question, could it be a result of deleting an entry? Some entries were deleted.

eu-genia commented 2 years ago

Previous-next is shown in each entry (e.g. I have posted a picture above of a case where there is no previous record at all, as there is no record with number 11997). It was probably deleted at some point, and now the left-right arrows do not work (the left one in this case).

From papira the arrow to the right brings you to https://betamasaheft.eu/Dillmann/lemma/L10325caf1d4241eb920de40c983c95b0 which is again wrong. [screenshot] Whole sequences are broken, resulting in the wrong allocation of pages but also in wrong previous-next relations.

since there is no way to renumber the entries and i cannot think of any easy way to restore the sequences i can only think of introducing a new element to link to the correct page so that we do not abuse column break.

For example, we can have <ref type="hacohen" n="1394"/> in the encoding (tried out in Lf78a56e4ecb949cc971f0c02beab9900). I have added the corresponding condition to app.xql to take the value from there:

let $column :=
    if($term//tei:cb) then string(($term//tei:cb/@n)[1])
    else if($term//tei:ref[@type='hacohen']) then string(($term//tei:ref[@type='hacohen']/@n)[1])
    else string(max($col//tei:cb[xs:integer(ancestor::tei:entry/@n) <= xs:integer($n)][@xml:id]/@n))

I will try to find a way to insert the ref through the web interface. Still, fixing everything in this way seems hardly possible: there are hundreds and hundreds of wrongly linked pages...
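The fallback order in the `let $column` expression above can be sketched in Python as follows. This is a simplified model, not the application code: the `Entry` class, the `resolve_column` function and the toy data are hypothetical, and the `[@xml:id]` filter on cb is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    n: int                             # running entry number (@n)
    cb: Optional[int] = None           # value of a cb element, if present
    hacohen_ref: Optional[int] = None  # value of <ref type="hacohen">, if present


def resolve_column(entry: Entry, all_entries: List[Entry]) -> Optional[int]:
    """Pick the column in the same order as the quoted expression."""
    if entry.cb is not None:
        return entry.cb
    if entry.hacohen_ref is not None:
        return entry.hacohen_ref
    # Fallback: highest cb value among entries numbered up to this one.
    earlier = [e.cb for e in all_entries if e.cb is not None and e.n <= entry.n]
    return max(earlier) if earlier else None


# Toy data (made up): an entry without cb inherits the last cb seen before it.
entries = [Entry(n=10156, cb=1392), Entry(n=10157), Entry(n=10215, cb=1409)]
print(resolve_column(entries[1], entries))                           # inherited value
print(resolve_column(Entry(n=10157, hacohen_ref=1409), entries))     # explicit ref wins
```

The sketch shows why the fallback goes wrong when the numbering does not follow the printed order: an entry without its own cb or ref simply inherits the highest cb value found at or before its number, wherever that entry sits in the printed lexicon.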

eu-genia commented 2 years ago

"It seems that there is no record with the number 11997 at all." --- a question, could it be a result of deleting an entry? Some entries were deleted.

It could be, even if I could not find it among the deleted ones either. This shows that it is better never to delete entries: since you keep creating new ones, one could reuse the old one and replace its content. I have not yet looked into how the numbers are assigned, but it is certainly not good if numbers are missing, since the whole application is built on the idea that there is a full incremental sequence (1+1+1+1 etc.). Possibly it was the deletions that broke the whole linking.
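A quick way to see where the sequence is incomplete would be a check along these lines. It is a hedged Python sketch that assumes the entry numbers have already been extracted into a plain list; the function name `numbering_problems` and the sample numbers are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def numbering_problems(numbers: Iterable[int]) -> Tuple[List[int], List[int]]:
    """Return (missing, duplicated) numbers for a sequence meant to be gapless and unique."""
    counts = Counter(numbers)
    if not counts:
        return [], []
    lo, hi = min(counts), max(counts)
    missing = [n for n in range(lo, hi + 1) if n not in counts]
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    return missing, duplicated


# Made-up sample: one gap, no duplicates.
print(numbering_problems([11996, 11998, 11999, 12000]))
```

Any number reported as missing is a place where the previous arrow of the following entry has nothing to point to; any duplicate would violate the uniqueness that the numbering relies on.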

eu-genia commented 2 years ago

in Dillmann, which would be the lemma preceding Lf78a56e4ecb949cc971f0c02beab9900? - it should be ፓፒራ papira, here is the link: https://betamasaheft.eu/Dillmann/lemma/L3c604b13ab90482bb3e241cd970ea1fe

"Papira" is the last entry of the lexicon proper, and then the addendum begins. The first part of the addendum in the paper version is a section with obscure words --- these are the ones whose numbers I have sent you. But in the online DL papira is followed by a section with nomina propria (two sections ahead). All entries from the first two sections of the addendum, 1393-1408, are linked to one and the same page, 1424.

Is it linked correctly? --- no, it is linked to the wrong page 1424 and not to 1393.

you did not understand the question. papira is linked correctly to 1392.

eu-genia commented 2 years ago

One can now fix the page linking by inserting |{pdf.1394}|; this will create the ref element.

MagdaKrzyz commented 2 years ago

I understand that I can correct the pages by inserting |{pdf.1394}|. Right?

eu-genia commented 2 years ago

Correct, just that there are a whole lot (hundreds) of wrongly linked records because of the split sequences in the numbering.

MagdaKrzyz commented 2 years ago

Ok, I will fix as much as I can.