brailleapps / dotify.formatter.impl

Provides an implementation of the formatter interfaces in dotify.api
GNU Lesser General Public License v2.1

Various changes needed in order to handle Dedicon test books #97

Closed by bertfrees 5 years ago

bertfrees commented 5 years ago

Commit bb28b93 was your attempt to fix the oscillation issue (#70). I believe commit b13a6dd fixes a bug in your commit. Commit 80185f8 improves the speed at which the volume breaking algorithm converges to an optimum. Commit b8dc003 was part of my initial attempt to fix the oscillation issue, but I think it's still relevant. Commits b662b0d, b13a6dd, fe7804e and 051d099 were done by me and Paul when I was at Dedicon some weeks ago. After this the oscillation issue was fixed. Commits e34b5dd and aa8bfe9 fix two other issues that I discovered during my visit to Dedicon. The first was that the braille page numbering would break at volume transitions in some cases: the number would become even on right-hand pages where it was uneven in the previous volume, or vice versa. The second was that we would sometimes get a StackOverflowError during the resolution of a marker-reference.

joeha480 commented 5 years ago

Commit bb28b93 and a modified version of b13a6dd (using Optional) have been merged.

Commit 80185f8 could be implemented by modifying the cost function instead of modifying the calculation provided by the EvenSizeVolumeSplitterCalculator.

I've rewritten aa8bfe9 to avoid using Object and reflection.

joeha480 commented 5 years ago

Processed b662b0d

joeha480 commented 5 years ago

Processed e34b5dd

bertfrees commented 5 years ago

I'm adding three more commits. I already know you are probably not going to accept two of them, but this way you can at least help me find a solution.

kalaspuffar commented 5 years ago

@bertfrees will look at this pull request, resolve the conflicts, and see if there is any code that could still be used, then either fix this PR so it's ready for merging or create a new PR with the good code.

bertfrees commented 5 years ago

The following commits were already in master:

The following commits have been rebased. If needed they can be split up into multiple PRs.

bertfrees commented 5 years ago

Commit 80185f8 could be implemented by modifying the cost function instead of modifying the calculation provided by the EvenSizeVolumeSplitterCalculator.

If I remember correctly, what Joel meant by this is that the volume sizes calculated by EvenSizeVolumeSplitter were never meant to be maximum values, only target values.

However the cost function already allows for higher values. The problem is that the algorithm to find the optimal break points is greedy, i.e. the individual break points are calculated one by one, and there is currently no way for the cost function to look back or ahead in order to predict that the allocated volumes and suggested target sheet counts aren't going to suffice. The cost function already detects when we're in the last volume, and tries to fill it completely with the remaining sheets, but apparently that is not enough (when we have a lot of volumes).
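To make the limitation concrete, here is a minimal, hypothetical sketch of a greedy break-point search with a pluggable cost function. The names (`GreedyBreaks`, `cost`) are illustrative, not the actual dotify API; the point is only that each break point is chosen in isolation, so the shortfall surfaces in the last volume:

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyBreaks {

    /** Illustrative cost: squared deviation from the target volume size. */
    static double cost(int volumeSheets, int targetSheets) {
        double d = volumeSheets - targetSheets;
        return d * d;
    }

    /**
     * Greedily cut totalSheets into volumes of roughly targetSheets each.
     * Each break point is chosen one by one: the cost function sees only
     * the current volume, so it cannot look ahead and notice that the
     * remaining sheets will not fit nicely in the remaining volumes.
     */
    static List<Integer> greedyBreaks(int totalSheets, int targetSheets) {
        List<Integer> volumes = new ArrayList<>();
        int remaining = totalSheets;
        while (remaining > 0) {
            int best = 1;
            double bestCost = Double.MAX_VALUE;
            // consider candidate sizes for the current volume only
            for (int size = 1; size <= Math.min(remaining, targetSheets + 2); size++) {
                double c = cost(size, targetSheets);
                if (c < bestCost) {
                    bestCost = c;
                    best = size;
                }
            }
            volumes.add(best);
            remaining -= best;
        }
        return volumes;
    }
}
```

With 10 sheets and a target of 4, the greedy search happily emits two full volumes and then has no choice but to produce an undersized last volume, which is the kind of outcome a look-ahead (or a corrected target) would avoid.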

The other problem is that EvenSizeVolumeSplitter does not provide a good prediction of the required sheets, because the calculation is based on the total number of required sheets in the previous iteration; by adding extra volumes the overhead will increase, which adds to the total number of sheets. By trying to predict the added overhead for the extra volumes we can improve things.
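The correction described above can be sketched as follows. This is a hypothetical simplification, not the actual EvenSizeVolumeSplitter code: it assumes a fixed per-volume overhead, and the method name is made up for illustration:

```java
public class TargetSize {

    /**
     * Recompute the target sheets per volume while accounting for the
     * overhead that each volume adds (e.g. front matter). Without the
     * correction, the target would be based only on the previous
     * iteration's total, underestimating the real demand when volumes
     * are added.
     */
    static int targetSheets(int contentSheets, int volumes, int overheadPerVolume) {
        int predictedTotal = contentSheets + volumes * overheadPerVolume;
        return (int) Math.ceil(predictedTotal / (double) volumes);
    }
}
```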

Something that is harder to predict is how much the "breakable" constraints push the actual average volume size down (or up). At the moment EvenSizeVolumeSplitter doesn't take this effect into account at all. If the actual average volume size is lower than the target size, this will currently result in remaining sheets, which will result in a decrease in the target size, but this may push the actual size down even more.

All in all, I think my initial solution of relaxing the target size slightly towards the limit wasn't such a bad idea because this way you leave some leeway for things that are hard to predict.
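The "relax the target towards the limit" idea amounts to a simple interpolation. A hypothetical sketch (the method name and the relaxation factor are my own, not from the codebase):

```java
public class RelaxedTarget {

    /**
     * Move the target volume size a fraction alpha towards the hard
     * limit, leaving some leeway for effects that are hard to predict
     * (per-volume overhead, "breakable" constraints, etc.).
     */
    static int relax(int target, int limit, double alpha) {
        return (int) Math.round(target + alpha * (limit - target));
    }
}
```

For example, with a target of 40 sheets, a limit of 50 and alpha = 0.2, the relaxed target becomes 42, so a volume that runs slightly over its ideal size no longer forces a re-split.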

bertfrees commented 5 years ago

By trying to predict the added overhead for the extra volumes we can improve things.

I have done this now (locally).

bertfrees commented 5 years ago

I'm going to test some Dedicon books with the new commit (that I haven't pushed yet) and with/without commit db1e949, but I think that's gonna be easier to do when the other PRs have been merged. So focusing on them first.

bertfrees commented 5 years ago

I rebased the branch onto master and left out one commit (ee43e16) for which I created a new PR.

bertfrees commented 5 years ago

I have tested all the Dedicon books with only these two commits (and in combination with https://github.com/brailleapps/dotify.formatter.impl/pull/111) and I get no oscillations:

"Base 'breakable' info in CrossReferenceHandler on BlockLineLocation" (8c184d4) appears to be not needed (anymore), so I'll leave that one out, at least for now, because it broke the keep-with-previous-sheets feature.

The "Don't remember exactly why this was needed" commit (6110c1f) was probably never needed to fix the oscillation issue, but it is possible that I made the change to fix an issue that I noticed while testing the Dedicon books, or maybe it was just an issue I spotted in the code. Unfortunately I don't remember. I suggest that I move this commit to a separate PR and then we can check how/if this changes the regression test suite, and judge whether it is an improvement or not.

bertfrees commented 5 years ago

A side effect of c27f558 is still that in some cases the volumes are not as "equal" as before, although less so than with the original change a8534a9, which consistently allocated too much space for volumes.

The volume-breaks-advanced3-input.obfl test shows that in this particular case the extra overhead for the added volume was overestimated. I don't know if there is a way to avoid this. As I explained before the way the volume breaking works is just very limiting.

bertfrees commented 5 years ago

@kalaspuffar I think this branch is ready now for running the regression tests on.

kalaspuffar commented 5 years ago

Hi @bertfrees

Shouldn't I wait to run the regression tests until all the unit tests pass?

Best regards Daniel

bertfrees commented 5 years ago

But I explained, didn't I? The behavior changed. I can change the expected result of that unit test if you want. The unit test is of course not a very realistic case because the volumes are so tiny. I'm more interested in knowing what happens with real books.

kalaspuffar commented 5 years ago

Hi @bertfrees

I think you should change the test, we won't merge any code that has failing tests. And if you think the test is incomplete or incorrect you may remove it as well. But we need a solid test suite that can find the issues before we find them in production.

Best regards Daniel

bertfrees commented 5 years ago

I didn't say anything about merging. Sure, I'll modify the test, no problem. I was just unsure about the result, and I'm trying to get some feedback from you. Firstly, as I said, it's difficult to say whether the results of these small tests are fine. We should probably create some tests with more and larger volumes. But I figured we could maybe first look at the books from the regression test suite to get an impression. Secondly, you and I might have different goals. Dedicon's primary goal is to fix the oscillation issue that prevents some books from getting through at all. I think that the volume breaking algorithm itself has room for improvement as well, although right now that is of secondary importance to me. On the other hand there is MTM, who might be more concerned about continuity in the behavior of the volume breaking and might attach more importance to the "evenness" of volumes. I don't know. So the fact that I am happy with this change doesn't necessarily mean you should be happy with it.

kalaspuffar commented 5 years ago

Hi @bertfrees

Great comment about what your goal for this change is. I've now run the regression tests, and this change also affected 162 books, so I need to look into these further to get a picture of the changes.

Do you have any preference on which result I look at first?

When it comes to expectations and goals, our goal is to progress the work with the product in the most stable manner.

I've not had any discussions with MTM on this specific topic, but I don't see any reason why the evenness of volume breaks should be a deal-breaker if we need to improve the code in order to produce some kinds of books.

The main goal for MTM at the moment is to solve the issues Dedicon has with the product so you can produce your workload.

Best regards Daniel

bertfrees commented 5 years ago

Do you have any preference on which result I look at first?

I haven't.

I've been thinking about what is required to validate a test result, when the volume breaks may have moved. There are two parts to it:

kalaspuffar commented 5 years ago

Hi @PaulRambags and @bertfrees

We have run our tool to verify pages and see what changes were generated. From what I can see, only two documents actually changed with this PR. The rest is due to us not updating the baseline (the baseline must be a published version).

Looking at these there are minor changes moving some text from one volume to another. I've reviewed and approved the code so when @PaulRambags have reviewed this PR we can merge it.

Best regards Daniel

bertfrees commented 5 years ago

OK cool.

Although not so urgent anymore, the ideas from my previous comment are still relevant, I think. I'll add them as a topic for the next meeting.