Closed IanMayo closed 9 months ago
I've investigated this, and there's actually a different cause.
We currently take the 'pages' (normally BottomLayer divs, but sometimes other divs) that we find, and sort them by their top value, and put them in the output page in ascending order of the top value.
However, in this case, the top values we've got are:
[('BottomLayer2', 111), ('BottomLayer', 125), ('BottomLayer3', 5353)]
They should be in the order BL, BL2 and BL3. However, BottomLayer2 is a child of GrayLayer3, and so the top value it has is relative to other things in the GrayLayer, not to the page as a whole. In this example, BottomLayer is a direct child of the body element, as is BottomLayer3 - so we've got two top measurements on one scale (the whole page) and one on a different scale (within a GrayLayer), so the sorting isn't giving us the result we want.
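To make the mismatch concrete, here is a small sketch that sorts the three values quoted above. Only the tuples come from this thread; the variable names are illustrative and not taken from the real code.

```python
# (name, top) pairs exactly as reported above
pages = [("BottomLayer2", 111), ("BottomLayer", 125), ("BottomLayer3", 5353)]

# Sort by the raw top value, which is how the current approach is described
sorted_names = [name for name, _top in sorted(pages, key=lambda page: page[1])]

# BottomLayer2 comes out first because its 111 is measured inside GrayLayer3,
# not from the top of the page
print(sorted_names)  # ['BottomLayer2', 'BottomLayer', 'BottomLayer3']
```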
Do you have any idea how often this situation occurs? That is, a BottomLayer within some other element, not directly within the body element? I checked a few other files at random and didn't find it, but that doesn't mean it's not present within the real data. The comment in the source file for this example says:
<!-- this is an example of a BottomLayer appearing inside a GrayLayer, as observed in file A13 -->
We could probably fix this by finding the ancestor of the BottomLayer that is a direct child of the body element, and using the top value of that - but I'm wondering if that might cause some other problems with separate layers for images etc. Obviously for BottomLayers that are direct children of the body it will behave as it does currently.
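A rough sketch of that idea, assuming a hypothetical `Node` class standing in for whatever the HTML parser provides. The `top` of 1000 for GrayLayer3 is invented for illustration; the thread does not give its real value.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed element (not the real parser type)."""
    name: str
    top: Optional[int] = None
    parent: Optional["Node"] = None


def top_level_ancestor(node: Node) -> Node:
    """Walk up the tree and return the ancestor that sits directly under body.

    A node that is already a direct child of body is returned unchanged.
    """
    current = node
    while current.parent is not None and current.parent.name != "body":
        current = current.parent
    return current


body = Node("body")
gray = Node("GrayLayer3", top=1000, parent=body)  # invented top value
bl2 = Node("BottomLayer2", top=111, parent=gray)
bl = Node("BottomLayer", top=125, parent=body)

nested_anchor = top_level_ancestor(bl2)   # GrayLayer3
direct_anchor = top_level_ancestor(bl)    # BottomLayer itself
```

One thing to watch with this approach: two BottomLayers inside the same ancestor would end up with the same sort key, so their relative order would need a tie-break.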
What do you think?
I think it's sound logic for the top value we store in the dictionary to be the arithmetic sum of the top values of the element we find and all parent divs that have a top value - because that is effectively how far down the rendered content the element appears.
When we have a BottomLayer inside a GrayLayer, I'm pretty sure all of the parent divs have a top, but it would be good if the logic allowed for an immediate parent div without a top, but where the ultimate parent does have one.
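The summing logic described here could look something like the sketch below. The `Div` class is a hypothetical stand-in for the parsed elements, the GrayLayer3 top of 1000 is an invented value, and the `wrapper` div is added only to show that a parent without a top is skipped rather than breaking the walk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Div:
    """Hypothetical stand-in for a parsed div (not the real parser type)."""
    name: str
    top: Optional[int] = None
    parent: Optional["Div"] = None


def cumulative_top(div: Div) -> int:
    """Sum the top of this element and of every ancestor that has one."""
    total = 0
    current: Optional[Div] = div
    while current is not None:
        if current.top is not None:  # ancestors without a top add nothing
            total += current.top
        current = current.parent
    return total


body = Div("body")
gray = Div("GrayLayer3", top=1000, parent=body)  # invented top value
wrapper = Div("wrapper", parent=gray)            # immediate parent with no top
bl2 = Div("BottomLayer2", top=111, parent=wrapper)
bl = Div("BottomLayer", top=125, parent=body)
bl3 = Div("BottomLayer3", top=5353, parent=body)

ordered = [d.name for d in sorted([bl2, bl, bl3], key=cumulative_top)]
print(ordered)  # ['BottomLayer', 'BottomLayer2', 'BottomLayer3']
```

With these assumed values BottomLayer2 gets 1000 + 111 = 1111, which puts it between BottomLayer (125) and BottomLayer3 (5353), the order we wanted.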
Ah yes, great idea to sum the top values (why didn't I think of that!). I'll get on that later.
Ian to check onsite if this is still an issue, by looking at file a13
This is still an issue. I have fixed it in file A13 by moving the block out of the parent block and incrementing the top value by the parent top. It has parsed and published correctly (though I had to run with no-skip-first-run).
We have two choices:
Option 2 is probably easiest. I don't think it's as simple as looking at the order of items in link_tracker.json, since I think they are in order of being encountered, not vertical sequence.
Unit_Banjo in Britain_Complx remains a valid instance of this pattern.
This should be sortable automatically, by using the sum of the top values up the tree. We now have code to do that (for the defloating stuff), so integrating it shouldn't be particularly difficult. I'll do it ASAP, but it may not be until the end of the week.
Thanks - that sounds fine. I'll make a note to come back to the issue.
I'm looking at britain_complx\unit_banjo.html.
In the html file the first BottomLayer is the signatures-table. But in unit_banjo.dita, the Remarks section is first.
I think this is because we are processing the named elements from the shopping-list before the ## first page layer:
I guess this is because number4 is linked to before any other targets on the page.
When we're collating the shopping lists, can we put '### first page layer at the start? For the unit documents, the first element in the document is always the first one the document viewers see.
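As a rough illustration of the request, here is a sketch that moves one entry to the front of a collated list while leaving everything else in its encountered order. The list-of-strings shape, the `FIRST_PAGE_KEY` label and the function name are all assumptions; the real shopping-list structure may be quite different.

```python
FIRST_PAGE_KEY = "first page layer"  # hypothetical label for the first page layer


def first_page_layer_first(entries, first_key=FIRST_PAGE_KEY):
    """Return entries with first_key moved to the front, others in original order."""
    first = [entry for entry in entries if entry == first_key]
    rest = [entry for entry in entries if entry != first_key]
    return first + rest


# Assumed collation order: number4 was linked to before anything else
collated = ["number4", "first page layer", "signatures-table"]
reordered = first_page_layer_first(collated)
print(reordered)  # ['first page layer', 'number4', 'signatures-table']
```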