brailleapps / dotify.formatter.impl

Provides an implementation of the formatter interfaces in dotify.api
GNU Lesser General Public License v2.1
0 stars 6 forks source link

Various changes related to empty blocks and page breaks #98

Closed bertfrees closed 5 years ago

bertfrees commented 5 years ago
bertfrees commented 5 years ago

I have rebased this onto master. See the corresponding commits in https://github.com/brailleapps/dotify.formatter.impl/compare/master...sbsdev:super/sbs.

bertfrees commented 5 years ago

Need to rebase again.

bertfrees commented 5 years ago

Note that there is also the idea to go into another direction, namely to render empty blocks as an empty line. I propose however that because Dotify currently does not do this, and DAISY Pipeline depends on the current behavior of empty blocks (and their effect on marker-reference), we merge this PR, and later we can move forward with issue https://github.com/brailleapps/dotify.formatter.impl/issues/49 if we have figured out the spec.

kalaspuffar commented 5 years ago

Hi @bertfrees and @PaulRambags

In order to simplify eventual releases in the future, we ran all the regression tests in this repository. IMHO this is a really good improvement removing a lot of empty pages.

Sadly the regression tests showed 162 books with changes. In order to separate the changes from this PR from other PR's, we should review the books and update our baseline.

This might take some time but we are working on it.

Best regards Daniel

kalaspuffar commented 5 years ago

Hi @bertfrees and @PaulRambags

I'm currently working a bit on a pef-validator that should be able to detect breaks in page ranges or empty pages. At the moment I've just created a bit of a skeleton.

I don't know where the right place is to host this repository, so I have a copy on my user at the moment. So later, when we find where it should be hosted, we move it.

If you want to take a look, the repository is available at https://github.com/kalaspuffar/pef-validator

Best regards Daniel

bertfrees commented 5 years ago

@kalaspuffar OK, looks very useful. So I guess it's more of a "checker" that finds possible issues in a braille document rather than a "validator"? You could put your various different tools in this category in a single repo. You could put them in dotify.devtools if you consider them tools that aid with development, like checking books with regressions etc. If it's more targeted towards users then dotify.devtools is not the right place of course. Then it would need to be integrated into dotify or it would need a user interface of its own.

PaulRambags commented 5 years ago

@kalaspuffar Good idea! Currently it checks only page numbers (I had a quick look - please correct me if I'm wrong). I have another issue but I'm not sure in which version of Dotify it occurs, maybe it is already solved. The issue is: I wish that every volume of a duplex publication starts with an odd page number, and for some reason the third volume of a real-life book starts with an even page number. Would it be a good idea to add a check for this kind of functionality as well?

kalaspuffar commented 5 years ago

@kalaspuffar Good idea! Currently it checks only page numbers (I had a quick look - please correct me if I'm wrong). I have another issue but I'm not sure in which version of Dotify it occurs, maybe it is already solved. The issue is: I wish that every volume of a duplex publication starts with an odd page number, and for some reason the third volume of a real-life book starts with an even page number. Would it be a good idea to add a check for this kind of functionality as well?

Correct, the current version will only check page numbers and see if they are in order and detect empty pages. Still, have some work to do before it can do that, but I feel that I'm making progress. Continued work of the tool is, of course, welcome and I will accept pull requests as well if you want to contribute. I understand though that at this point, it can be a bit hard to contribute as I'm currently just figuring out how to get the first case to work.

Depending on the development of Dotify and if we can detect your issue in another way it could perhaps be solved here. But if we only want to detect this abnormality we could add the functionality to the tool.

Best regards Daniel

kalaspuffar commented 5 years ago

@kalaspuffar Good idea! Currently it checks only page numbers (I had a quick look - please correct me if I'm wrong). I have another issue but I'm not sure in which version of Dotify it occurs, maybe it is already solved. The issue is: I wish that every volume of a duplex publication starts with an odd page number, and for some reason the third volume of a real-life book starts with an even page number. Would it be a good idea to add a check for this kind of functionality as well?

Hi again @PaulRambags

I forgot to ask, do we have an issue reported on this? Just so I can add it to Q4 perhaps?

Best regards Daniel

bertfrees commented 5 years ago

I'm almost certain I had fixed this issue already (9ac4e09aac1322a4aadae130c7c69cc7332bf4a8) so I was surprised to hear this. So I asked which version of Dotify Paul is having this issue with.

PaulRambags commented 5 years ago

@bertfrees I'm not sure which version of Dotify I use, because the version that I think I use is a Dedicon fork and also I'm not entirely sure whether the version on our ACC server is really using this Dedicon fork because what I have on ACC consists of a variety of unofficial releases. @kalaspuffar I think it would be better to wait with adding an issue until I have a candidate or stable release of the Pipeline with (9ac4e09) in it, then I can test again whether the problem still occurs. I expect that it will be solved.

kalaspuffar commented 5 years ago

Hi @bertfrees and @PaulRambags

I've now created enough of a tool to check if there are any breaks in the page numbering sequence, and also check if there are any empty pages. I'll include the expected result (baseline) and the actual result (this branch).

actual.txt expected.txt

I have opened these files in a diffing tool, and I see that we have missing pages both before and after this update and we still have a few empty pages. But I believe there are a lot less blank pages and also fewer places where we have issues with the page number sequence.

If you want you can open these files in a diff tool and see if you can spot anything strange with the result. And perhaps if you have a before and after the output of your own, that you can run the tool against so we get even more data.

Our opinion is that this PR is good enough to merge; the result is an improvement on the previous result.

Best regards Daniel

PaulRambags commented 5 years ago

I have no before and after output. Please proceed and merge this PR.

bertfrees commented 5 years ago

What exactly does the "Missing page(s) between x and y" mean? Are we talking about pages (i.e. the problem that Paul mentioned) or page numbers? I guess missing page numbers are not always a problem, it depends on the requirements. For example a missing page number on an empty page may be acceptable in some cases. But I don't know the Swedish requirements so I can't really say anything about it.

I'm not sure if the "Empty page x" lines give us much information. This also depends on the requirements. For example with break-before="sheet" it is expected that you get empty pages. Not sure how you would detect these cases.

Anyway, I think once we have our regression test suite set up we can start doing some checks of our own. Until then it's a rather tedious process. But when I created the PR I could only see improvements (in the Dedicon books), so for me it's good to merge.