berteh / ScribusGenerator

Create beautiful documents with data. Open source pdf (and Scribus) template and mail-merge alternative.
http://berteh.github.io/ScribusGenerator/
MIT License

fill_count #189

Closed MartinZaske closed 2 years ago

MartinZaske commented 2 years ago

Hi team,

we are regularly using this tool, but today nothing worked. So I learnt that for Scribus 1.5.7 I need an all new version of ScribusGenerator. I downloaded and it works just fine. Good job! So much has happened since I did my own fork in about 2015.

Now missing two of my own features, I look into the code:

In ScribusGeneratorBackend.py I find a remnant of an idea I had submitted many years ago:

My version is today's: v2.9.1 (2021-01-22)

In lines 674 and 677 there is a so-called "fill_count" which forces the output file names to show a set number of digits.

So instead of 1 2 3 12 200

the files are named (for example, with fill_count set to 3): 001 002 003 012 200

You get it.

Now I cannot find the place where this fill_count is defined, or where the user can enter it.

Lines 408 and 409 seem to be the only place where create_output_file() is called, and the call passes only four values. But the beast is ready to work with five values.

For tonight I just edited a hack and manually turned line 677 into this: result = str(index).zfill(3)
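A minimal sketch of what that one-line hack does, for readers who have not used `str.zfill`. The helper name `padded_index` is made up for illustration and is not part of ScribusGenerator:

```python
def padded_index(index, fill_count=0):
    """Return index as a string, left-padded with zeros to fill_count digits."""
    # str.zfill only pads; a number wider than fill_count is kept whole.
    return str(index).zfill(fill_count)


names = [padded_index(i, 3) for i in (1, 2, 3, 12, 200)]
print(names)  # ['001', '002', '003', '012', '200']
```

With `fill_count` left at 0 the index comes back unpadded, so the plain 1 2 3 naming stays available.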

(I have some more scripts for the pagination and for sending many pdf files to our network-printer, so I really need those file names exactly as before. For just now it worked.)

Now, I am not a great programmer, I have even forgotten what to call the beast (a function? a module?) and I do not want to propose a pull-request which is maybe outdated by a few days if it takes me a while to figure things out. My question is this:

Should I try to work on the code or is it more effective for the core-team to create a user-input-box for this fill_count? I most often need the file names three-digits-wide. Rarely four.

Or do you hate the idea and want to wipe it from the code? (Please leave it in.)

Or should we just add some in-file documentation so that some users who need this feature can hack it like I did tonight?

Please let me know how best to proceed. If possible I do not want to work with my own "fork" again, I would prefer to stay with the flock.

And again my appreciation for your work on this important complement for Scribus. If you ever want to bring it into Scribus itself, I would muchly vote in favour.

Martin

berteh commented 2 years ago

Hi Martin. There was an issue in the python3 version that would not substitute variable names in the output file names. It will be fixed in the next release #187 very soon. That should work for your case I think, so there's no need for you to jump in the code. I'll let you know. B.

MartinZaske commented 2 years ago

Hi berteh,

I believe my question is not related to the issue with the "user variables" but rather with the way the function create_output_file() is being called in the ScribusGeneratorBackend.py in line 409 and how it is defined in line 674.

It is the "internal variable" named "fill_count" that is not defined anywhere, unless I am totally mistaken.

For the moment, I would be happy if the fill_count could be set "statically" to a value of "3", maybe in the class CONST: but I do not know enough of how to get a constant to apply when a function is being called.

Maybe as a default?

I would have added a hard-coded default to line 674 like this: def create_output_file(self, index, filename, data, fill_count=3):

You probably have a better way to get something like this set up: def create_output_file(self, index, filename, data, fill_count=DEFAULT_FILL_COUNT):
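One possible way to wire a constant into a default, sketched under assumptions: `CONST.OUTPUT_FILL_COUNT`, the class `Generator` and the method `build_output_name` are hypothetical stand-ins, not the real ScribusGeneratorBackend.py code. Note that Python evaluates `fill_count=DEFAULT_FILL_COUNT` once, when the function is defined, so this sketch uses `None` and looks the constant up at call time instead:

```python
class CONST:
    # Hypothetical setting for illustration: 0 means "no padding".
    OUTPUT_FILL_COUNT = 3


class Generator:
    def build_output_name(self, index, fill_count=None):
        """Return the zero-padded output name for one data row."""
        # Resolve the default at call time, so that editing the constant
        # (even after this class was defined) takes effect.
        if fill_count is None:
            fill_count = CONST.OUTPUT_FILL_COUNT
        return str(index).zfill(fill_count)


gen = Generator()
print(gen.build_output_name(7))                 # 007
print(gen.build_output_name(7, fill_count=4))   # 0007
```

Callers that pass only the usual arguments then pick up the configured width, while a caller that needs four digits can still ask for them explicitly.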

Please have a quick look at the lines that I mentioned (still same version v2.9.1 (2021-01-22)).

Thank you.

Martin

berteh commented 2 years ago

this is now implemented (only in python3 branch). please test and feedback. the setting is to be configured in https://github.com/berteh/ScribusGenerator/blob/python3/ScribusGeneratorBackend.py#L68

MartinZaske commented 2 years ago

Dear berteh,

thank you very muchly. This is an elegant solution, which is giving exactly what we need for my follow-up scripts and which does not bother other users, who prefer default.

You put a brief and helpful comment inside the code. I should write up a short paragraph for the online instructions, so that this golden nugget is not overlooked by other or new users. Will do when I find time.

I did two tests, from my Scribus 1.5.7 on Windows 10 pro 64bit.

First run worked as expected; very nice.

Second run stalled. The issue was probably that I am still used to running several batches into the same folder; so the script somehow stalled totally. Needed the task manager to get out of it. I did more tests but cannot reproduce the problem.

So thank you for this fix, it is much appreciated, as is the entire tool ScribusGenerator.

Martin

(I also found an easy and elegant solution for my other offset-needs: I reminded myself that my commercial file manager lets me mass-rename each batch in seconds with regexes that I can define/save/re-use. So this fill_count fix allows me to stay with the main branch of ScribusGenerator from now on. Ready for production in the Python3 world!)

berteh commented 2 years ago

Hi Martin. Glad it works (again) for you.

It seems to me like you do some 'heavy lifting' with this script. Got me curious: How many files do you generate in one typical run? In how many different batches? To store, share, print, email?

I'd gladly get a sample pdf to get an idea, if not confidential.

... and for a free batch renamer in Windows there's http://www.antp.be/software/renamer/features as well.

But I usually use pdftk to handle the last step assembling of pdfs in final documents (eg adding static cover, last credits page, opt. security password). Works both in script and with a GUI (pdftk4all for Windows) so I don't need intermediate renaming: https://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/

MartinZaske commented 2 years ago

When all is well, one wants to move on to other issues, but I owe you this answer, as we appreciate Scribus and the Generator a lot:

We are running a non-profit project and we are doing all our publishing in Scribus: for commercial printing (small print run in digital) on paper and on T-shirts, for laser-printing in our own office and even bitmap-exports for WhatsApp mailings etc.

I cannot give you sample data nor files, as our mail-merge is mainly for letters to our friends and sponsors; content and address data all fall under data protection. We do those regularly and they are important. The several batches that I had mentioned are caused by our newsletters being in several languages and versions. And as I have found in the past that ScribusGenerator can stall on runs over 100 addresses, I manually split large CSV files. (I should test on my new computer, maybe no longer needed.)
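The manual split could be scripted along these lines. This is only a sketch: the function names, the batch size and the sample columns `name` and `city` are invented for illustration, and it works on CSV text in memory rather than on real files:

```python
import csv
import io


def split_rows(rows, batch_size=100):
    """Yield consecutive lists of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def split_csv_text(text, batch_size=100):
    """Split CSV text into several CSV texts, repeating the header in each."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = list(reader)
    parts = []
    for batch in split_rows(rows, batch_size):
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow(header)   # every batch keeps the column names
        writer.writerows(batch)
        parts.append(out.getvalue())
    return parts


# Made-up sample data: five rows split into batches of two.
sample = "name,city\nA,X\nB,Y\nC,Z\nD,X\nE,Y\n"
parts = split_csv_text(sample, batch_size=2)
print(len(parts))  # 3
```

Each part can then be written to its own file and fed to a separate run.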

So a typical run is on a very full 8-page Scribus document with plenty of illustrations. We only insert greetings and reference names in three fields.*

Numbers are between 17 and 200+ data sets each run which will take several minutes to complete. We always opt for separate PDF files so far. Sometimes in very dry seasons or in the worst rainy season we can get paper jams for recto/verso printing, so we can keep certain files and can quickly re-print any that failed.

*Note: I have another hack that I forgot to mention: from our 8-page Scribus document, we only create ScribusGenerator output for page 1 (the greeting) and page 8 (or page 4), i.e. the last page. Those are the only pages where we insert merge-data. This was a minor hack which might or might not interest other users.

So we have newsletters of 8 A5 pages that we arrange on two double-sided A4 sheets of paper. All the non-customized pages I can print super fast because I send each of them to our printer only once.

The follow-up scripts that I had mentioned make sure that the output-pages 1 and 8 end up correctly on one sheet of A4 paper (pagination) and another script helps me to send all those PDF files to our Kyocera network-printer. There is a Kyocera tool, but it is "dumb", can only do default and not accept any custom printer-settings, so my script is nothing much, but better than looking at print-menus and hitting ENTER hundreds of times.

So in summary, your new version has covered two of my personal hacks (numbering in fixed digits like 001, 002, 010, 100) and I have to manually add prefixes if I run several batches into one folder (using my file-explorer magic) and I had to make a mild hack to line 804 in the Backend.py:

pages_count = [1,scribus.pageCount()]

This automatically outputs only the first and last page, where we (luckily) have our fields to mail-merge.
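Outside Scribus, the same selection can be sketched as a small helper. The name `first_and_last_pages` is hypothetical; inside the script the page count would come from `scribus.pageCount()` as in the line above:

```python
def first_and_last_pages(page_count):
    """Return the 1-based numbers of the first and last page, without duplicates."""
    if page_count < 1:
        return []
    # A set collapses the single-page case [1, 1] to [1].
    return sorted({1, page_count})


print(first_and_last_pages(8))  # [1, 8]
print(first_and_last_pages(1))  # [1]
```

The only difference from the hard-coded list is the guard for one-page or empty documents.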

All for now, thank you muchly for your work on this tool; it adds real value to Scribus.

berteh commented 2 years ago

Thanks Martin for taking the time to write this all down !

I never had problems running SG to generate a huge number (1000+) of files in SLA format (from command line) really fast, but the PDF generation is indeed quite time and resource consuming in Scribus... mostly due to its lack of a real "headless mode" and need for user interaction to close some dialog boxes.

I'll have a look if there's a quick way to improve that, but the best person to talk to about improving command line (i.e. "headless" SLA > PDF) conversion is @aoloe . Ale is active in Scribus dev, and has provided a nice command line script to batch convert SLA to PDFs at https://github.com/aoloe/scribus-script-repository/tree/master/sla-to-pdf.

FYI you may be interested in the PDFTK server (opensource PDF stitcher) that has great features to combine pages together / split, reorganise, compress and more... all from command line or with a GUI: https://www.pdflabs.com/tools/pdftk-server/

berteh commented 2 years ago

I forgot to add: the time consumption of SLA > PDF generation is the main reason why I developed the merge mode: I found out it is much faster to generate one single HUGE pdf with all my dynamic data and then use the command line to burst it into individual pages and recombine it with other PDFs that hold the static parts of the documents.

... just if it helps ;)

berteh commented 2 years ago

Just to make my last workflow proposal more practical: I think I'd try to

  1. have a single PDF file with all the static content of your newsletter (called central.pdf below)
  2. generate a single SLA with all first and last pages using SG and its variables to pull the needed data from CSV or JSON, convert this SLA to a single PDF (called covers.pdf below) using either Ale's script or Scribus itself (since it's a single file, but may be quite big)
  3. extract the front (odd) and last (even) pages from the covers file with pdftk-server
    pdftk covers.pdf cat odd output firsts.pdf
    pdftk covers.pdf cat even output lasts.pdf
  4. combine the first, centrals and last pages for each recipient (if you have 235 of them)
    for i in {1..235}; do pdftk F=firsts.pdf L=lasts.pdf C=central.pdf cat F"$i" C L"$i" output newsApril2022/full_"$i".pdf; done
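
For anyone without a bash shell (e.g. on Windows), the loop in step 4 could be driven from Python instead. A rough sketch: the file names and the count of 235 are taken from the steps above, the helper name `assemble_command` is made up, and the commands are only built and printed here, not executed:

```python
def assemble_command(i, out_dir="newsApril2022"):
    """Build the pdftk call that stitches first page i, the centre, and last page i."""
    return [
        "pdftk", "F=firsts.pdf", "L=lasts.pdf", "C=central.pdf",
        "cat", f"F{i}", "C", f"L{i}",
        "output", f"{out_dir}/full_{i}.pdf",
    ]


commands = [assemble_command(i) for i in range(1, 236)]
print(" ".join(commands[0]))

# To actually run them (assumes pdftk is installed and the output folder exists):
#   import subprocess
#   for cmd in commands:
#       subprocess.run(cmd, check=True)
```

Passing each command as a list avoids shell quoting issues, and `check=True` would stop the batch at the first file that pdftk fails on.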

... but it's just a quick guess and maybe you wouldn't save much time ;) I'm glad ScribusGenerator is useful to you anyway... and if you find a way to have a faster SLA > PDF generation please let me know ;)