berteh / ScribusGenerator

Create beautiful documents with data. Open source pdf (and Scribus) template and mail-merge alternative.
http://berteh.github.io/ScribusGenerator/
MIT License

generator crashes - can't figure out why #182

Closed garydale closed 2 years ago

garydale commented 3 years ago

I'm running a command line script but I get the same thing happening if I run it from within Scribus. The output is:

`$ ~/GenOfficers.sh Committees
14:41:07 - ScribusGenerator - INFO: ScribusGenerator initialized
14:41:07 - ScribusGenerator - INFO: Generating all files for Committees.sla in directory /home/garydale/mnt/archives/2021/Lions Club/District/Directory/
14:41:07 - ScribusGenerator - INFO: parsing scribus SLA file /home/garydale/.scribus/templates/Lions Club/Committees.sla
14:41:07 - ScribusGenerator - INFO: source document consumes 1 data record(s) from 104.
14:41:07 - ScribusGenerator - INFO: variables from data files: ['Position', 'Lion', 'E-Mail', 'Phone', 'Home-Club']
14:41:08 - ScribusGenerator - ERROR: 
error: Traceback (most recent call last):
  File "/home/garydale/.scribus/scripts/ScribusGenerator-python3/ScribusGeneratorCLI.py", line 171, in <module>
    generator.run()
  File "/home/garydale/.scribus/scripts/ScribusGenerator-python3/ScribusGeneratorBackend.py", line 227, in run
    tmpElt = ET.fromstring(outContent).find('DOCUMENT')
  File "/usr/lib/python3.9/xml/etree/ElementTree.py", line 1347, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 1, column 34947
`
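The traceback reports only a position (line 1, column 34947) in the generated XML, not the offending text. A minimal sketch of how one might surface the characters around that position, assuming the string passed to `ET.fromstring` (called `outContent` in the traceback) has been dumped somewhere it can be read back; the helper name `parse_error_context` is hypothetical and not part of ScribusGenerator:

```python
import xml.etree.ElementTree as ET


def parse_error_context(xml_text, radius=40):
    """Return (line, column, snippet) for the first parse error, or None if the text parses."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple reported by the parser.
        line, column = err.position
        # Split on "\n" only: str.splitlines() would also split on some control characters.
        lines = xml_text.split("\n")
        text = lines[line - 1] if 0 < line <= len(lines) else ""
        # Keep a window of characters on both sides of the reported column.
        snippet = text[max(0, column - radius):column + radius]
        return line, column, snippet
    return None
```

Printing the snippet with `repr()` makes invisible characters show up as escapes such as `\x01`. The reported column may not line up exactly with a Python string index when the text contains non-ASCII characters, so the window is intentionally wider than a single character.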

I've used the script to successfully generate a different file, and, as per above, it happens without the script too. I've loaded the .csv file into Calc and re-saved it, so I'm sure that it's valid CSV. I even replaced some square brackets with () in case that was an issue - it wasn't.

I've triple-checked my variables and they seem to be OK. The only thing I'm doing differently in this template is using %SG_NEXT-RECORD%, but I still get an error if I remove it.

I'm hoping someone can tell me what the problem might be.

Here's a sample of the .csv file:

"Position","Lion","E-Mail","Phone","Home-Club"
"District Governor [DG]","Person Dg","person1@gmail.com","999 999-9999","Toronto Centennial"
"Immediate Past District Governor [IPDG]","Person Ipdg","person2@gmail.com","888 888-8888","Thistletown"

The template lists the information with multiple records per page but also adds PDF annotations - a dummy to later be used to link to a biography page, and mailto: and tel: web links.

After further digging, by removing various elements and retrying, I've narrowed it down to a text element shown in the attached screenshot. I can't see why it should cause a problem, but maybe you can.

I eventually just deleted the text boxes and recreated them, and it worked. I can't see any difference between the original and the recreated one. However, when I went back and duplicated the entire section (text box and overlaying PDF annotations) in bulk to fill an entire page, the problem returned.

I created the initial page as a test with only 5 text boxes on it (and 4 %SG_NEXT-RECORD% boxes) plus the PDF annotations. When I removed the 5 text boxes then created a new one that I copied & pasted 4 times, the generator worked. When I then took the bottom 4 boxes + NEXT-RECORDs, etc., and copied and pasted them as a block, the problem returned.

It appears that there is something in the bulk copy & paste to duplicate sections that the generator doesn't handle.

In my next test, I removed just the new text boxes (leaving the 5 that were working, see two paragraphs above), then selected just those 5 text boxes and copied and pasted them back. That worked. Unfortunately, the SG_NEXT-RECORD boxes weren't being acted upon.

In an attempt to fix this, I removed the boxes and added the SG_NEXT-RECORD to the end of each text box. Unfortunately, that got me back to the error when I copied the text box. Removing the copied items, I tried again with just one text box on the page. It worked, but the SG_NEXT-RECORD was executed before the PDF annotations, so those links contained information from the next record. Because there is an SG_NEXT-RECORD, the first page showed the information from the first record while its annotations had information from the second, and the second page showed the information from the third record.

All subsequent efforts to resolve this, including starting with a fresh document, have simply led to the generator crashing. I tried rebooting but that didn't solve things. I tracked it down to the presence of %VAR_Position% anywhere in the text box. It doesn't seem to be the fact that it's the first column (inserting a dummy column didn't help) or the name of the column. Nor can I see anything in the contents to cause a problem.

I also don't get why it's happening now and not earlier. I haven't touched the generator install nor the CSV file contents except to change the column headings (as above). I've started with a fresh .sla file to use as the template...

After multiple more tries, I finally got the thing to work. I can't see any significant difference in the template that worked versus the various ones that didn't. I started with the VAR_Position omitted and got a few records per page with the SG_NEXT-RECORD at the start of the second and subsequent text boxes. After that worked, I was able to add VAR_Position back in without it complaining.

I was also able to bulk copy & paste without the generator throwing error messages. The only thing that went wrong was that on the first generated page, each text box showed up twice with the same data in the same position but on different levels. This didn't happen on the second and subsequent pages.

S1SYPHOS commented 2 years ago

@garydale This seems to be a problem with the text you want to insert rather than the variable names, position, etc. I had a similar issue once, and it sounds a lot like what I encountered back then. For me, it was a "middot" (or interpunct, see Wikipedia) character causing trouble.

// See #138
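To check a data file for characters like the middot mentioned above, something along these lines could list every non-ASCII character by row and column. This is only a sketch: the helper name `non_ascii_fields` is made up for illustration, and it assumes the CSV has already been read into a string with the right encoding.

```python
import csv
import io
import unicodedata


def non_ascii_fields(csv_text):
    """Yield (row_number, column_name, character, unicode_name) for each non-ASCII character."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for column, value in row.items():
            # Skip missing values and overflow fields (DictReader stores those as lists).
            if not isinstance(value, str):
                continue
            for ch in value:
                if ord(ch) > 127:
                    yield row_number, column, ch, unicodedata.name(ch, "UNKNOWN")
```

A non-ASCII character is not necessarily an error; the list is just a short set of candidates to inspect or temporarily replace while narrowing the problem down.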

berteh commented 2 years ago

Hello Gary.

Sorry for not answering this issue earlier, I just was not very active in GitHub and may have missed the notification.

The initial problem may have been with a non-printable character in the template (quite likely, from the error log you report), or indeed with the multiple copies mechanism, which might not be foolproof.
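One way to test the non-printable character hypothesis is to scan the template text for characters that XML 1.0 does not allow at all, since these produce exactly this kind of "not well-formed (invalid token)" error. A rough sketch, with a hypothetical helper name, assuming the .sla file has been read into a string (for example with `open(path, encoding="utf-8", errors="replace")`, where the encoding is itself an assumption):

```python
def invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters not allowed in XML 1.0."""
    def allowed(cp):
        # Character ranges permitted by the XML 1.0 "Char" production.
        return (cp in (0x9, 0xA, 0xD)
                or 0x20 <= cp <= 0xD7FF
                or 0xE000 <= cp <= 0xFFFD
                or 0x10000 <= cp <= 0x10FFFF)

    return [(i, hex(ord(ch))) for i, ch in enumerate(text) if not allowed(ord(ch))]
```

An empty result would not rule out other causes, such as the copy mechanism mentioned above; it only excludes forbidden control characters.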

Could you please share a version of the template that breaks? I cannot seem to reproduce the issue.

Thanks, and sorry again for the delay.

berteh commented 2 years ago

I'll freeze this issue since I cannot reproduce it. Hoping it was indeed just an issue with a non-printable character that you solved by removing the faulty paragraph and creating it anew.

Please re-open, attaching the problematic template, if the issue still arises with the updated ScribusGenerator script!