akretion / factur-x

Python lib for Factur-X, the e-invoicing standard for France and Germany

Cannot process ZUGFeRD 2.0.1 documents, because the autodetection thinks they use version 1.0 #41

Status: Open. Opened by j-moser-sap 2 months ago.

j-moser-sap commented 2 months ago

Steps to reproduce

  1. Download the ZUGFeRD 2.0.1 documentation and examples: http://www.awv-net.de/updates/zugferd20/zugferd201.zip
  2. Extract the file ZUGFeRD201/Beispiele/EXTENDED/zugferd_2p0_EXTENDED_Warenrechnung.pdf
  3. Execute the following code and watch it fail:
from facturx import get_facturx_xml_from_pdf

with open("zugferd_2p0_EXTENDED_Warenrechnung.pdf", "rb") as f:
    get_facturx_xml_from_pdf(f)

This will produce the following error:

2024-07-11 13:32:42,803 [INFO] A valid XML file zugferd-invoice.xml has been found in the PDF file
2024-07-11 13:32:42,805 [ERROR] The XML file is invalid against the XML Schema Definition
2024-07-11 13:32:42,805 [ERROR] XSD Error: Element '{urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100}CrossIndustryInvoice': No matching global declaration available for the validation root., line 2
2024-07-11 13:32:42,806 [ERROR] No valid XML file found in the PDF: The Zugferd XML file is not valid against the official XML Schema Definition. Here is the error, which may give you an idea on the cause of the problem: Element '{urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100}CrossIndustryInvoice': No matching global declaration available for the validation root., line 2.

"Mitigation"

Downgrade factur-x to version 1.12, the last version that processes this file without problems.

Cause

The bug is caused by the following snippet in get_xml_from_pdf:

flavor = 'autodetect'
if filename == ORDERX_FILENAME:
    flavor = 'order-x'
elif filename == FACTURX_FILENAME:
    flavor = 'factur-x'
elif filename in ZUGFERD_FILENAMES:
    flavor = 'zugferd'
xml_check_xsd(xml_root, flavor=flavor)

The XML file embedded in ZUGFeRD 2.0.1 PDFs is named zugferd-invoice.xml. This makes the code select the flavor "zugferd" and pass it on to xml_check_xsd, which then loads the XSD for ZUGFeRD 1.0. That schema is incompatible with ZUGFeRD 2.0.1: it has no declaration for the CrossIndustryInvoice root element, so validation fails.
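To make the mismatch concrete, here is a minimal stdlib sketch; it is not the library's code. The filename constants and the flavor-to-root mapping are assumptions modeled on the snippet above and on the XSD error in the log, and flavor_from_filename and root_matches_schema are hypothetical helpers. It only compares root elements, which is exactly what the "No matching global declaration available for the validation root" error is about.

```python
import xml.etree.ElementTree as ET

# Assumed attachment filenames, modeled on the constants in the snippet above.
FACTURX_FILENAME = "factur-x.xml"
ZUGFERD_FILENAMES = ["zugferd-invoice.xml", "ZUGFeRD-invoice.xml"]

# Root element each flavor's schema declares (assumption: the "zugferd"
# flavor maps to the ZUGFeRD 1.0 schema, as described in this issue).
EXPECTED_ROOT = {
    "factur-x": "{urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100}CrossIndustryInvoice",
    "zugferd": "{urn:ferd:CrossIndustryDocument:invoice:1p0}CrossIndustryDocument",
}


def flavor_from_filename(filename):
    """Mimic the filename-based branch: the name alone decides the flavor."""
    if filename == FACTURX_FILENAME:
        return "factur-x"
    if filename in ZUGFERD_FILENAMES:
        return "zugferd"
    return "autodetect"


def root_matches_schema(filename, xml_bytes):
    """Return True if the XML root is the root the filename-chosen schema declares."""
    flavor = flavor_from_filename(filename)
    root_tag = ET.fromstring(xml_bytes).tag  # "{namespace}LocalName"
    return EXPECTED_ROOT.get(flavor) == root_tag


# A minimal ZUGFeRD 2.x-style document: CII root, embedded as zugferd-invoice.xml.
ZUGFERD_2_XML = (
    b'<rsm:CrossIndustryInvoice '
    b'xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"/>'
)

print(flavor_from_filename("zugferd-invoice.xml"))                # zugferd
print(root_matches_schema("zugferd-invoice.xml", ZUGFERD_2_XML))  # False
```

The same document would pass the root check if it were embedded as factur-x.xml, which is why only the filename, not the XML itself, triggers the failure.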

Funnily enough, if the flavor is simply left at "autodetect", xml_check_xsd does its own detection on the XML content using get_flavor. That detects "factur-x" as the flavor of the ZUGFeRD 2.0.1 file, which is then validated and parsed correctly.
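A minimal sketch of content-based detection follows, assuming it keys on the root element's namespace and ignores the attachment filename. detect_flavor and ROOT_TO_FLAVOR are hypothetical and are not the actual get_flavor implementation; the CII namespace comes from the log above, while the ZUGFeRD 1.0 namespace is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the CII root used by Factur-X and ZUGFeRD 2.x (from the log above).
CII_NS = "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"
# Namespace of the ZUGFeRD 1.0 root (assumption, for illustration).
ZUGFERD1_NS = "urn:ferd:CrossIndustryDocument:invoice:1p0"

# Hypothetical root-tag -> flavor table; not the library's get_flavor.
ROOT_TO_FLAVOR = {
    f"{{{CII_NS}}}CrossIndustryInvoice": "factur-x",
    f"{{{ZUGFERD1_NS}}}CrossIndustryDocument": "zugferd",
}


def detect_flavor(xml_bytes):
    """Pick the flavor from the XML root element, ignoring the attachment filename."""
    root_tag = ET.fromstring(xml_bytes).tag  # e.g. "{namespace}LocalName"
    try:
        return ROOT_TO_FLAVOR[root_tag]
    except KeyError:
        raise ValueError(f"Unrecognized root element: {root_tag}") from None


V2_XML = b'<rsm:CrossIndustryInvoice xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"/>'
V1_XML = b'<rsm:CrossIndustryDocument xmlns:rsm="urn:ferd:CrossIndustryDocument:invoice:1p0"/>'

print(detect_flavor(V2_XML))  # factur-x, even if embedded as zugferd-invoice.xml
print(detect_flavor(V1_XML))  # zugferd
```

Because the decision depends only on the parsed root, a ZUGFeRD 2.0.1 file gets the Factur-X schema regardless of what its attachment is called.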

Proposed fix

The quickest fix I can think of is to remove the snippet above from get_xml_from_pdf. Since xml_check_xsd does its own detection anyway, this should be safe, right?

alexis-via commented 1 month ago

I'm very surprised that ZUGFeRD 2.0.1 specifies "zugferd-invoice.xml" as the attachment filename. Is it really written that way in the ZUGFeRD 2.0.1 specification?

j-moser-sap commented 1 month ago

@alexis-via Yes, it is.

Download the ZIP here: https://www.ferd-net.de/standards/zugferd-versionsarchiv/zugferd-2.0.1.html

The file ZUGFeRD201/Dokumentation/ZUGFeRD-2.0.1-Spezifikation.pdf on page 23 contains the sentence:

Die XML-Datei wird stets mit dem Namen "zugferd-invoice.xml" eingebettet.

(Translation: "The XML file is always embedded with the name 'zugferd-invoice.xml'.")

I am also a bit surprised by that, but apparently, this is how 2.0.1 is specified.

j-moser-sap commented 3 weeks ago

Hi @alexis-via,

Is there anything I can do to help get this merged? I can prepare demos, examples, etc. if needed; just let me know. Thanks!

alexis-via commented 3 weeks ago

I'm OK with the change; we need to be more flexible. I'll take care of it in the next few days or weeks and make a new release.

j-moser-sap commented 3 weeks ago

Thank you, that is great!