tappoz closed this issue 2 years ago.
This looks like a bug in a38. Could you send me a test invoice that validates with AdE and breaks a38? Then I can add it to the test suite and see about a fix.
Yeah, the problem is hiding sensitive data while retaining a realistic XML structure. I can try over the weekend to provide a sanitised version of the XML by commenting here :crossed_fingers:
Apologies for the late reply, @spanezz @valholl. I am still battling with the sanitised XML example. However, I think I found some workarounds that could be useful.

Take the VAT details of a company with non-ASCII characters in its name, e.g. Ørsted A/S with VAT number DK 36213728, from the VIES website (https://ec.europa.eu/taxation_customs/vies/vatResponse.html).
Then I can use this function to flush UTF-8 XML to a file:

```python
import a38.fattura


def flush_xml_to_file(fattura_a38: a38.fattura.FatturaPrivati12, filename: str):
    tree = fattura_a38.build_etree()
    # TODO default_namespace = "ns2" (instead of "ns0")
    with open(filename, "wb") as out:
        # Write the XML declaration by hand, then the UTF-8 body without one
        out.write(b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n')
        tree.write(out, encoding="utf8", xml_declaration=False)
```
Note 3 things:

1. `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` (apologies for my false claim in the comments above :point_up:; this header is indeed contained in the XML of the "fattura elettronica" coming from AdE).
2. `standalone="yes"`.
3. `ns2`. Although namespaces shouldn't matter, for some reason the XML examples I find on the internet and on the AdE website seem to prefer `ns2`. I would like to be able to inject it in order to get back an exact copy of the original (valid) XML content from AdE.

With the workaround above I am able to generate a PDF equivalent of the XML with a command similar to:
```shell
a38tool pdf \
  -f <path-to>/styles/FoglioStileAssoSoftware.xsl \
  -o mia_fattura.pdf \
  mia_fattura_output.xml
```
It would be great to have a utility function like the above in a38; it would be very handy.
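Such a helper might look roughly like the following. This is a minimal sketch. It assumes only that the invoice object has a `build_etree()` method returning an `xml.etree.ElementTree.ElementTree`, as `FatturaPrivati12.build_etree()` is used in the snippet above. The name `write_fattura_xml` and its `namespace`/`prefix` parameters are hypothetical and not part of a38.

One caveat on the `ns2` wish: ElementTree reserves prefixes of the form `ns<digits>` for its own auto-generated names. As a result, `ET.register_namespace("ns2", ...)` raises `ValueError`, and only other prefixes can be chosen this way.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Protocol


class HasEtree(Protocol):
    """Anything with a build_etree() method (assumed to match a38 invoices)."""

    def build_etree(self) -> ET.ElementTree: ...


def write_fattura_xml(fattura: HasEtree, filename: str,
                      namespace: Optional[str] = None,
                      prefix: Optional[str] = None) -> None:
    """Hypothetical helper: write the invoice as UTF-8 with an XML declaration."""
    if namespace and prefix:
        # Module-global ElementTree setting; raises ValueError for "ns0", "ns1", ...
        ET.register_namespace(prefix, namespace)
    tree = fattura.build_etree()
    with open(filename, "wb") as out:
        # Let ElementTree emit the declaration instead of writing it by hand
        tree.write(out, encoding="utf-8", xml_declaration=True)
```

`register_namespace` changes a module-level mapping, so it affects every later `write` in the same process. That is probably acceptable in a script, but a library would likely want to avoid it.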
Hi, I really do not want to have to write the XML declaration by hand, and I suspect the core of the issue is that we need to pass `encoding` and `xml_declaration=True` to all `tree.write` calls.
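A small standard-library illustration of what those two arguments change in the output (the element here is illustrative only).

- `ElementTree.write` defaults to the `us-ascii` encoding. In that case non-ASCII characters come out as numeric character references such as `&#252;`, and no declaration is written.
- Passing `encoding="utf-8"` together with `xml_declaration=True` writes the raw UTF-8 bytes, preceded by a declaration.

Both outputs are valid XML, so this only shows the difference. It does not by itself prove which one a given downstream tool chokes on.

```python
import io
import xml.etree.ElementTree as ET


def serialize(tree: ET.ElementTree, **kwargs) -> bytes:
    """Write tree to an in-memory binary buffer and return the bytes."""
    buf = io.BytesIO()
    tree.write(buf, **kwargs)
    return buf.getvalue()


root = ET.Element("Denominazione")
root.text = "Güügle"
tree = ET.ElementTree(root)

# Default: encoding="us-ascii", ü becomes a character reference, no declaration
default_bytes = serialize(tree)
# Explicit: raw UTF-8 bytes, preceded by an XML declaration
utf8_bytes = serialize(tree, encoding="utf-8", xml_declaration=True)

print(default_bytes)
print(utf8_bytes)
```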
I would like to add test cases to the test suite corresponding to the crashes you found (even with made-up XML files), and then see how to address them as cleanly as possible.
With regard to namespaces, I would consider getting an exact copy of the original XML file a false goal. XML encoding has so many variable aspects that we would need to reimplement our own XML parser in order to preserve the quirks of other encoders. I would rather keep the goal of, say, making sure that if I run `xmlstarlet fo` on the original and on the fattura coming out of a38, the results match. That is, both XML files, once normalized, have the same content.
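A rough Python sketch of that "same content once normalized" check, as an alternative to diffing `xmlstarlet fo` output. It works for two reasons:

- ElementTree resolves prefixes into `{uri}local` tag names, so a `ns0:` versus `ns2:` difference disappears.
- Stripping text ignores indentation.

The name `same_content` is hypothetical. The sketch also ignores `tail` text and comments, which seems fine for data-oriented files like invoices but would not be for mixed-content XML.

```python
import xml.etree.ElementTree as ET
from typing import Tuple, Union


def _normalize(elem: ET.Element) -> Tuple:
    """Prefix-independent, indentation-insensitive view of an element subtree."""
    return (
        elem.tag,  # Clark notation "{uri}local": the ns0/ns2 prefix is already gone
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),  # ignore pretty-printing whitespace
        tuple(_normalize(child) for child in elem),
    )


def same_content(xml_a: Union[bytes, str], xml_b: Union[bytes, str]) -> bool:
    """Hypothetical helper: True if both documents carry the same data."""
    return _normalize(ET.fromstring(xml_a)) == _normalize(ET.fromstring(xml_b))
```

To compare files on disk, read them in binary mode (`open(path, "rb").read()`) so the parser honours each file's own encoding declaration.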
I can try to add some cases to the test suite (it's really wrong that we call `tree.write` without an encoding and without forcing an XML declaration). Then, if you still see cases that break after those are addressed, we can see about adding more.
In `tests/test_fattura.py` there is a `TestSamples` test case that tries to load and save the samples in `tests/data/` with all codecs. I added a sample with plenty of Unicode, and so far it works.

Could you extend that test case to include the things that are failing for you?

If you need to tweak your samples to minimize or anonymize them, you can try out the new `a38tool edit` feature :wink:
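For illustration, here is a self-contained round-trip check in the same spirit. It uses only the standard library rather than a38's own loaders and codecs, whose internals are not shown here. It serializes a tree containing non-ASCII text as UTF-8, parses it back, and compares the two. The class name `TestUnicodeRoundTrip` and the `SAMPLE` document are hypothetical stand-ins for a file in `tests/data/`.

```python
import io
import unittest
import xml.etree.ElementTree as ET

# Hypothetical sample with non-ASCII names (not a schema-valid invoice)
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<Sample><Denominazione>Ørsted A/S</Denominazione>"
    "<Denominazione>Güügle</Denominazione></Sample>"
)


class TestUnicodeRoundTrip(unittest.TestCase):
    def test_utf8_round_trip(self):
        original = ET.fromstring(SAMPLE.encode("utf-8"))
        buf = io.BytesIO()
        ET.ElementTree(original).write(buf, encoding="utf-8", xml_declaration=True)
        reloaded = ET.fromstring(buf.getvalue())
        # Every tag and its text must survive the save/load cycle unchanged
        self.assertEqual(
            [(e.tag, e.text) for e in original.iter()],
            [(e.tag, e.text) for e in reloaded.iter()],
        )
        # The output holds raw UTF-8, not character references
        self.assertIn("Ørsted A/S".encode("utf-8"), buf.getvalue())
```

It can be run with `python -m unittest` like any other test module.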
Thanks a lot for these features! Now I just need to find the time to try them on my scenario and report back the results :crossed_fingers: :smile:

(BTW, on a separate note regarding the CI pipeline: it would be good to trigger it on each push to master, either direct or from a PR/branch merge. It would also be good to run the skipped tests about the encryption process that need a pre-processing step related to the certificates.)
OK, now the UTF-8 encoding works :tada:
After `pip install a38==0.1.5` (the release from April 1st 2022), I wrote the invoice with:

```python
# it would be good to wrap this in a function that accepts an instance
# of `a38.fattura.FatturaPrivati12` and a file path, but fine doing it by hand
tree = fattura_a38.build_etree()
with open(filename, "wb") as out:
    tree.write(out, encoding="utf-8", xml_declaration=True)
```

Then I used `xmllint` and checked the differences with the original XML file from AdE with `diff --color original.xml generated.xml`.
I only see the namespace differences now, so the UTF-8 encoding issue is fixed. Also, there's no `standalone="yes"` in that XML header. I had added it to the original issue description only because I was experimenting with those attributes, and it is not in the original document from AdE either.
I had a look at the UTF-8 tests and they make sense. Given that my scenario now works, I am glad I don't have to find a meaningful way to sanitise my XML to provide another example :blush:
Thanks!
I need to write some fields with UTF-8 characters, e.g. `<Denominazione>Güügle</Denominazione>`. I've taken a valid XML file with some values like the above, retrieved from the AdE (Agenzia Delle Entrate) website. This XML file is SDI validated.

I am using:
- the `a38` library;
- `a38tool pdf` and the usual `FoglioStileAssoSoftware.xsl` from AssoSoftware.

However, these two things don't seem to work well together.
Take this code snippet, where it seems I could use the `a38` XML builder instead of LXML:

This way I am able to store an XML file on the file system containing `<Denominazione>Güügle</Denominazione>`. All good.

When I invoke a command like:
Then I get this error:

That `line 42, column 36` in the error is the umlaut in `Güügle`.

How can I both:
- `a38` library?
- `a38tool` command?

Some context:
One final thing: the valid invoice I got from the SDI (AdE) did not have `<?xml version="1.0" encoding="utf-8"?>` as its first row, although it did contain some UTF-8 encoded characters like the above. I found out from the AssoSoftware FAQ (http://www.assosoftware.it/faq?catid=0&limit=10&start=50) that they recommend putting this encoding information on the first line of the XML file. I am not sure whether this is an issue with AdE/SDI or an overly flexible interpretation. Anyway, here's the snippet: