Open Conal-Tuohy opened 3 years ago
I think probably the best optimisation would be to run the XInclude processor over the `p5/includes`, `p5/combo` and `p5/metadata` directories before running it over the `p5` directory. At the moment, each file in `p5` makes about 10 transclusions from various files in those subfolders, and typically each of those transclusions requires several more transclusions (primarily from `p5/includes`), and some of those require a few more again. Allowing the root documents to drive the transclusions recursively means a huge number of transclusions are performed many times over. Performing those XIncludes starting at the leaves and working up towards the root of the document will reduce the number of transclusions drastically. It should bring the total time down to a few minutes, I would think.
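To make the saving concrete, here is a toy cost model in Python. It is only a sketch: the include graph and file names are invented for illustration and are far smaller than the real corpus, and it counts transclusions rather than performing any XInclude processing.

```python
# Hypothetical include graph: each key is a file, each value lists the
# files it transcludes. The real corpus is much larger and deeper.
INCLUDES = {
    "p5/doc1.xml": ["p5/combo/a.xml", "p5/metadata/m.xml"],
    "p5/doc2.xml": ["p5/combo/a.xml", "p5/metadata/m.xml"],
    "p5/combo/a.xml": ["p5/includes/x.xml", "p5/includes/y.xml"],
    "p5/metadata/m.xml": ["p5/includes/x.xml"],
    "p5/includes/x.xml": ["p5/includes/y.xml"],
    "p5/includes/y.xml": [],
}
ROOTS = ["p5/doc1.xml", "p5/doc2.xml"]


def root_driven_cost(doc):
    """Transclusions performed when `doc` recursively expands everything below it."""
    return sum(1 + root_driven_cost(child) for child in INCLUDES[doc])


def leaves_first_cost():
    """Transclusions performed when every file is resolved once, dependencies first.

    Each include directive in each file is then executed exactly once,
    because the file it points at has already been fully expanded.
    """
    return sum(len(children) for children in INCLUDES.values())


if __name__ == "__main__":
    print("root-driven:", sum(root_driven_cost(root) for root in ROOTS))
    print("leaves-first:", leaves_first_cost())
```

The gap widens quickly as more root documents share the same subfolder files, which is the situation described above.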
We don't actually want to modify the files in `p5/includes`, `p5/combo` and `p5/metadata` in place, since the idea is eventually to make the `p5` folder (with its subfolders) the source of truth. So some renaming of the existing folders is probably worth doing at this point. Maybe the import from `acsproj` should go into a new `source` folder with `includes`, `combo` and `metadata` folders, which would later serve as source files for direct editing when `acsproj` is retired? The XInclude pipeline would copy that entire `source` tree to `p5`, and then use XInclude to modify the files in place, firstly in `p5/includes`, then in the `p5/combo` and `p5/metadata` directories (in either order), and finally in the `p5` directory.
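The ordering of that pipeline could be sketched roughly as follows in Python. This is an illustration only, not the project's XProc pipeline: `run_pipeline` and `expand_in_place` are hypothetical names, the actual XInclude step is passed in as a callable rather than implemented, and the sketch assumes the files end in `.xml`. It also does not order files within `includes` relative to each other, which would matter if those files transclude one another.

```python
import shutil
from pathlib import Path
from typing import Callable, List

# Sub-folder names taken from the thread; the two folders in the second
# stage can be processed in either order.
STAGES = [["includes"], ["combo", "metadata"]]


def run_pipeline(source: Path, target: Path,
                 expand_in_place: Callable[[Path], None]) -> List[Path]:
    """Copy `source` to `target`, then expand files leaves-first, in place.

    `expand_in_place` is a placeholder for whatever performs XInclude on
    one file and writes the result back to the same path.
    """
    if target.exists():
        shutil.rmtree(target)          # start from a clean copy every run
    shutil.copytree(source, target)    # the source tree itself is never modified

    processed: List[Path] = []
    for stage in STAGES:
        for folder in stage:
            for path in sorted((target / folder).rglob("*.xml")):
                expand_in_place(path)
                processed.append(path)

    # Finally the root documents, which sit directly in the target folder.
    for path in sorted(target.glob("*.xml")):
        expand_in_place(path)
        processed.append(path)
    return processed
```

Because the copy is rebuilt on each run, the `source` tree stays untouched and remains usable for direct editing later.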
What do you think, @jawalsh?
Sounds like a good plan. Please proceed!
John
Pre-transcluding the files in the subfolders `includes`, `combo`, and `metadata` did cut the total runtime of the XInclude step by a fair bit (now down to 4:33 minutes on my development VM). That's a big improvement but still not exactly speedy. Shall I go ahead with moving the transclusion into a background thread as well?
XInclude of the entire corpus takes > 15 minutes. Is it worth optimizing this? A custom XProc step that performs XInclude and also builds a cache of intermediate results could run fairly cheaply. It would have a significant effect to the extent that the documents in the corpus XInclude many copies of resources which themselves XInclude other resources, since each recursive XInclude would then be executed only once. It could also effectively memoize any XPath selectors which the XInclude statements use.
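The caching idea can be illustrated with a small Python sketch. It is not an XProc step and does not parse XML: documents are modelled as hypothetical in-memory lists of text and include parts, and `CachingExpander` is an invented name. It covers only the caching of fully expanded resources, not the memoizing of XPath selectors, and it assumes the include graph has no cycles.

```python
from typing import Dict, List, Tuple

# A document is a list of parts: ("text", literal) or ("include", document name).
Part = Tuple[str, str]


class CachingExpander:
    """Expands includes recursively, keeping each fully expanded result in a cache."""

    def __init__(self, documents: Dict[str, List[Part]]):
        self.documents = documents
        self.cache: Dict[str, str] = {}
        self.expansions = 0  # how many documents were actually expanded

    def expand(self, name: str) -> str:
        if name in self.cache:
            return self.cache[name]  # reuse the intermediate result
        self.expansions += 1
        pieces = []
        for kind, value in self.documents[name]:
            pieces.append(value if kind == "text" else self.expand(value))
        result = "".join(pieces)
        self.cache[name] = result
        return result


if __name__ == "__main__":
    docs = {
        "leaf": [("text", "L")],
        "mid": [("text", "<"), ("include", "leaf"), ("include", "leaf"), ("text", ">")],
        "root1": [("include", "mid"), ("include", "leaf")],
        "root2": [("include", "mid")],
    }
    expander = CachingExpander(docs)
    for root in ("root1", "root2"):
        print(root, expander.expand(root))
    print("documents expanded:", expander.expansions)
```

With the cache, each resource is expanded once no matter how many documents transclude it, so the work grows with the number of distinct resources rather than with the number of include statements.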