BetaMasaheft / Documentation

Die Schriftkultur des christlichen Äthiopiens: Eine multimediale Forschungsumgebung

Implementation of XInclude for large files, because of the size limit and consequent syncing problem #1650

Closed thea-m closed 3 years ago

thea-m commented 3 years ago

The last changes carried out in ESqdq004 and ESam019 are not visible in the app (for example, note the missing change elements from 2020-10-28 and 2020-12-21 in ESqdq004). The other files of https://github.com/BetaMasaheft/Manuscripts/pull/931 seem to be fine, and the problem might be older than that, because the Transkribus transcriptions added earlier to both files are also missing.

PietroLiuzzo commented 3 years ago

this is a known issue with files this big. I will upload them manually

PietroLiuzzo commented 3 years ago

https://github.com/BetaMasaheft/Documentation/issues/1472#issuecomment-698830482

thea-m commented 3 years ago

Thank you!

PietroLiuzzo commented 3 years ago

not quite closed... the manuscripts XML files have issues with <cb/> not allowing them to show up

thea-m commented 3 years ago

I'm really sorry, could you upload ESqdq004 manually again? Is there any way for me to do this or any other possibility for me to commit changes to these files without needing your intervention for the upload? In my current workflow, I foresee rather frequent commits, but of course, if you are needed for each of them, that's not tenable :(

PietroLiuzzo commented 3 years ago

the issue is the size limit of GitHub webhooks, so at the moment these huge files cannot follow the normal flow. I will look into a solution tomorrow, I will not manage today. I will test with XInclude so that the files are smaller.

PietroLiuzzo commented 3 years ago

I have uploaded ESqdq004

thea-m commented 3 years ago

Thank you!

PietroLiuzzo commented 3 years ago

Dear @thea-m, this is just to reassure you that, as I said, I have been working on this issue today. I am not quite there yet because of an issue with the indexing, which I hope the nice people on Slack will help me with. For now it looks like we will be able to split files and use xi:include. You will have something like

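ESqdq004.xml
ESqdq004 (a directory) containing the files included by the main file (in the example below: msContents.xml, physDesc.xml, facsimile.xml, edition.xml and transkribus.xml)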

each file in the directory will contain only the part removed from the main file. your main file will look something like this

<?xml-model href="https://raw.githubusercontent.com/BetaMasaheft/Schema/master/tei-betamesaheft.rng" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://raw.githubusercontent.com/BetaMasaheft/Schema/master/tei-betamesaheft.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="ESqdq004" xml:lang="en" type="mss">
   <teiHeader>
      <fileDesc>
         <titleStmt>
            <title>Synaxarion (Maskaram-Naḥase)</title>
            <editor role="cataloguer" key="SH"/>
            <editor key="DN"/>
            <editor key="DR"/>
            <editor role="generalEditor" key="AB"/>
         </titleStmt>
         <publicationStmt>
            <authority>Hiob-Ludolf-Zentrum für Äthiopistik</authority>
            <pubPlace>Hamburg</pubPlace>
            <publisher>Die Schriftkultur des christlichen Äthiopiens und Eritreas: Eine multimediale
               Forschungsumgebung / Beta maṣāḥǝft</publisher>
            <availability>
               <licence target="http://creativecommons.org/licenses/by-sa/4.0/">
                  <p> This file is licensed under the Creative Commons Attribution-ShareAlike 4.0.
                  </p>
               </licence>
            </availability>
            <date>2016-06-07T17:45:36.405+02:00</date>
         </publicationStmt>
         <sourceDesc>
            <msDesc xml:id="ms">
               <msIdentifier>
                  <repository ref="INS0186QDQ"/>
                  <collection>Ethio-SPaRe</collection>
                  <idno facs="QDQ/004/QDQ-004" n="185">QDQ-004</idno>
               </msIdentifier>
               <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ESqdq004/msContents.xml">
                  <xi:fallback>
                     <!--msContents-->
                  </xi:fallback>
               </xi:include>
               <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ESqdq004/physDesc.xml">
                  <xi:fallback>
                     <!--physDesc-->
                  </xi:fallback>
               </xi:include>
               <history>
                  <origin>
                     <origPlace>
                        <placeName ref="INS0186QDQ"/>
                     </origPlace>
                     <origDate notBefore="1530" notAfter="1580">Mid- or late 16th cent.</origDate>
                  </origin>
               </history>
               <additional>
                  <adminInfo>
                     <recordHist>
                        <source>
                           <listBibl type="catalogue">
                              <bibl>
                                 <ptr target="bm:EthioSpare"/>
                              </bibl>
                           </listBibl>
                        </source>
                     </recordHist>
                     <custodialHist>
                        <custEvent type="restorations" subtype="none"/>
                     </custodialHist>
                  </adminInfo>

               </additional>
            </msDesc>
         </sourceDesc>
      </fileDesc>
      <encodingDesc>
         <projectDesc>
            <p>Encoded according to TEI P5 Guidelines.</p>
         </projectDesc>
         <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="https://raw.githubusercontent.com/BetaMasaheft/Documentation/master/prefixDef.xml">
            <xi:fallback>
               <p>Definitions of prefixes used.</p>
            </xi:fallback>
         </xi:include>
      </encodingDesc>
      <profileDesc>
         <textClass>
            <keywords scheme="#ethioauthlist">
               <term key="Hagiography"/>
               <term key="ChristianLiterature"/>
               <term key="Translation"/>
               <term key="Liturgy"/>
            </keywords>
         </textClass>
         <langUsage><language ident="en">English</language><language ident="gez">Gǝʿǝz</language></langUsage>
      </profileDesc>
      <revisionDesc>
         <change who="DN" when="2011-11-25">Ethio-SPaRe team photographed the manuscript</change>
         <change who="SH" when="2012-08-01">catalogued</change>
         <change who="MV" when="2015-02-04">last edited</change>
         <change who="PL" when="2016-05-10">transformed from mycore to TEI P5</change>
         <change who="PL" when="2019-04-25">added missing extras from domlib</change>
         <change who="DR" when="2019-05-07">Adjusted XML</change>
         <change who="DR" when="2020-03-02">Completed transcription of Maggābit</change>
         <change who="HM" when="2020-10-28">The transcription is carried out using Manuscripts of Ethiopia and Eritrea 5 model of Transkribus</change> 
        <change who="DR" when="2020-12-21">Corrected transcription of 3-4 Gǝnbot, copied to first edition div</change>
      </revisionDesc>
   </teiHeader>
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ESqdq004/facsimile.xml">
      <xi:fallback>
         <!--facsimile-->
      </xi:fallback>
   </xi:include>
   <text>
      <body>
         <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ESqdq004/edition.xml">
            <xi:fallback>
               <!--edition-->
            </xi:fallback>
         </xi:include>
         <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ESqdq004/transkribus.xml">
            <xi:fallback>
               <!--transkribus-->
            </xi:fallback>
         </xi:include>
      </body>
   </text>
</TEI>

this will make the files much smaller, and potentially much, much smaller, since an included file can itself include further files.
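
As a minimal sketch of "including in the included": an included file such as ESqdq004/edition.xml could itself hand parts over to further files. The div type and the file names under edition/ are assumptions for illustration only, not the actual layout of the repository.

```xml
<!-- Hypothetical ESqdq004/edition.xml. Its root element is the part removed from the main file.
     The type value and the file names under edition/ are assumed for illustration. -->
<div xmlns="http://www.tei-c.org/ns/1.0" type="edition">
   <!-- href is resolved relative to this file, i.e. to ESqdq004/edition/maskaram.xml -->
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="edition/maskaram.xml">
      <xi:fallback>
         <!--maskaram-->
      </xi:fallback>
   </xi:include>
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="edition/nahase.xml">
      <xi:fallback>
         <!--nahase-->
      </xi:fallback>
   </xi:include>
</div>
```

Each included file has to be a well-formed XML document of its own, with a single root element and its own TEI namespace declaration, because that root element is what replaces the xi:include element in the including file.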

This, far from being something which can be expanded to every file, requires that you

I think you can already get an idea of whether this would work for you and potentially for anyone with very large files.

This works well in Oxygen and is handled in eXist-db, and I do not see why it should be a problem for GitHub, but I still have to try that out and handle these cases. Sorry, it is taking much longer than I expected and I do not know if I will manage before next year at this point.

thea-m commented 3 years ago

I'm sorry that this turned out so complicated. It all seems doable from my point of view, let me know whenever you are ready for me to try it out (no need to rush now, since it takes time anyway). Anything that allows me to push changes to these files through without you as an intermediary will work for me. I hope it is ok that, as long as you are still working on this, I will keep editing the texts in these files and committing the changes to my current branch?

PietroLiuzzo commented 3 years ago

It is fine, they are unlikely to go through to the DB unfortunately.

(Sent by email in reply to thea-m's comment of Thu, 24 Dec 2020, 11:02.)


PietroLiuzzo commented 3 years ago

I am testing the new implementation with https://github.com/BetaMasaheft/Manuscripts/pull/957. The webhook reports a timeout, since much more work is required, but the files do go through.

PietroLiuzzo commented 3 years ago

done and tested https://betamasaheft.eu/Guidelines/?q=include&start=11&id=xiinclude

eu-genia commented 2 years ago

@thea-m I found this issue, which I guess is what you were referring to. As I assumed, splitting files does help to get them uploaded, but unfortunately it has nothing to do with the visualization, which is applied to the file after the split parts have been merged back together.
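
To illustrate what "merged back together" means, here is a rough sketch of the msDesc from the example above after XInclude expansion. The content of physDesc is elided, msContents is left out for brevity, and the xml:base attribute is an assumption about the serialization: XInclude processors typically record the origin of included content this way, but the exact output in the app may differ.

```xml
<!-- Sketch of the merged document around the former physDesc include (assumed serialization). -->
<msDesc xmlns="http://www.tei-c.org/ns/1.0" xml:id="ms">
   <msIdentifier>
      <repository ref="INS0186QDQ"/>
      <collection>Ethio-SPaRe</collection>
      <idno facs="QDQ/004/QDQ-004" n="185">QDQ-004</idno>
   </msIdentifier>
   <!-- the xi:include element is gone; the root element of ESqdq004/physDesc.xml stands in its place -->
   <physDesc xml:base="ESqdq004/physDesc.xml">
      <!-- content of the included file, not reproduced here -->
   </physDesc>
</msDesc>
```

The visualization works on this merged form, so anything that prevents a part from displaying has to be fixed in the content of that part, independently of how the file is split.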

thea-m commented 2 years ago

Ah, thank you, good to know