Closed sskagemo closed 1 year ago
I've cut and pasted the code from the files I'm working in, and I've tried a lot of changes, so I can't be 100% sure this is the code that actually ran, but I guess you get the idea ...
```python
from dataclasses import dataclass, asdict
from bigxml import Parser, xml_handle_element, xml_handle_text


@xml_handle_element("AuditFile", "GeneralLedgerEntries", "Journal", "Transaction")
@dataclass
class Transaction:
    TransactionID: str = "N/A"
    Period: str = "N/A"
    PeriodYear: str = "N/A"
    TransactionDate: str = "N/A"
    TransactionType: str = "N/A"
    Description: str = "N/A"
    SystemEntryDate: str = "N/A"
    GLPostingDate: str = "N/A"
    lines: list = None

    @xml_handle_element("TransactionID")
    def handle_TransactionID(self, node):
        self.TransactionID = node.text

    @xml_handle_element("Period")
    def handle_Period(self, node):
        self.Period = node.text

    @xml_handle_element("PeriodYear")
    def handle_PeriodYear(self, node):
        self.PeriodYear = node.text

    @xml_handle_element("TransactionDate")
    def handle_TransactionDate(self, node):
        self.TransactionDate = node.text

    @xml_handle_element("TransactionType")
    def handle_TransactionType(self, node):
        self.TransactionType = node.text

    @xml_handle_element("Description")
    def handle_Description(self, node):
        self.Description = node.text

    @xml_handle_element("SystemEntryDate")
    def handle_SystemEntryDate(self, node):
        self.SystemEntryDate = node.text

    @xml_handle_element("GLPostingDate")
    def handle_GLPostingDate(self, node):
        self.GLPostingDate = node.text

    @xml_handle_element("Line")
    def handle_Line(self, node):
        line = [item for item in node.iter_from(Line)]
        self.lines = [line] if self.lines == None else self.lines.append(line)


@xml_handle_element("Line")
@dataclass
class Line:
    # Ignores the Analysis-elements for now
    RecordID: str = "N/A"
    AccountID: str = "N/A"
    ValueDate: str = "N/A"
    SourceDocumentID: str = "N/A"
    SupplierID: str = "N/A"
    Description: str = "N/A"
    DebitAmount: float = 0.0
    CreditAmount: float = 0.0
    ReferenceNumber: str = "N/A"
    TaxType: str = "N/A"  # Assumption that there is only one TaxInformation-element pr line
    TaxCode: str = "N/A"
    TaxPercentage: int = 0
    TaxBase: float = 0.0
    TaxAmount: float = 0.0

    @xml_handle_element("RecordID")
    def handle_RecordID(self, node):
        self.RecordID = node.text

    @xml_handle_element("AccountID")
    def handle_AccountID(self, node):
        self.AccountID = node.text

    @xml_handle_element("ValueDate")
    def handle_ValueDate(self, node):
        self.ValueDate = node.text

    @xml_handle_element("SourceDocumentID")
    def handle_SourceDocumentID(self, node):
        self.SourceDocumentID = node.text

    @xml_handle_element("SupplierID")
    def handle_SupplierID(self, node):
        self.SupplierID = node.text

    @xml_handle_element("Description")
    def handle_Description(self, node):
        self.Description = node.text

    @xml_handle_element("DebitAmount")
    def handle_DebitAmount(self, node):
        self.DebitAmount = float(node.text)  # Will automatically go one level deeper to get the value

    @xml_handle_element("CreditAmount")
    def handle_CreditAmount(self, node):
        self.CreditAmount = float(node.text)  # Will automatically go one level deeper to get the value

    @xml_handle_element("ReferenceNumber")
    def handle_ReferenceNumber(self, node):
        self.ReferenceNumber = node.text

    @xml_handle_element("TaxInformation")
    def handle_TaxInformation(self, node):
        yield from node.iter_from(
            self.handle_TaxType,
            self.handle_TaxCode,
            self.handle_TaxPercentage,
            self.handle_TaxBase,
            self.handle_TaxAmount,
        )

    @xml_handle_element("TaxType")
    def handle_TaxType(self, node):
        self.TaxType = node.text

    @xml_handle_element("TaxCode")
    def handle_TaxCode(self, node):
        self.TaxCode = node.text

    @xml_handle_element("TaxPercentage")
    def handle_TaxPercentage(self, node):
        self.TaxPercentage = int(node.text)

    @xml_handle_element("TaxBase")
    def handle_TaxBase(self, node):
        self.TaxBase = float(node.text)

    @xml_handle_element("TaxAmount")
    def handle_TaxAmount(self, node):
        self.TaxAmount = float(node.text)


if __name__ == '__main__':
    with open("../../testdata/ExampleFile_SAF-T_Financial_888888888_20180228235959.xml", "rb") as f:
        for item in Parser(f).iter_from(Transaction):
            print(item)
            break  # To avoid too much output ...
```
Hello @sskagemo, glad you took a look at the library and its documentation! I agree your use case is quite difficult to handle right now, but it will be taken into consideration for a future change of the library.
For now you will need to do the following trick:

- replace the `handle_Line` method by `handle_Line = Line` (this requires moving the `Line` class definition above the `Transaction` one)
- add an `xml_handler` method

You will find below your code modified so that you have a better idea of what I mean. I also took this opportunity to improve the following points:
- initialization of `Transaction.lines` by using a dataclass field
- simplification of the `handle_TaxXxx` handlers

```python
from dataclasses import dataclass, asdict, field
from bigxml import Parser, xml_handle_element, xml_handle_text


@xml_handle_element("Line")
@dataclass
class Line:
    # Ignores the Analysis-elements for now
    RecordID: str = "N/A"
    AccountID: str = "N/A"
    ValueDate: str = "N/A"
    SourceDocumentID: str = "N/A"
    SupplierID: str = "N/A"
    Description: str = "N/A"
    DebitAmount: float = 0.0
    CreditAmount: float = 0.0
    ReferenceNumber: str = "N/A"
    TaxType: str = "N/A"  # Assumption that there is only one TaxInformation-element pr line
    TaxCode: str = "N/A"
    TaxPercentage: int = 0
    TaxBase: float = 0.0
    TaxAmount: float = 0.0

    @xml_handle_element("RecordID")
    def handle_RecordID(self, node):
        self.RecordID = node.text

    @xml_handle_element("AccountID")
    def handle_AccountID(self, node):
        self.AccountID = node.text

    @xml_handle_element("ValueDate")
    def handle_ValueDate(self, node):
        self.ValueDate = node.text

    @xml_handle_element("SourceDocumentID")
    def handle_SourceDocumentID(self, node):
        self.SourceDocumentID = node.text

    @xml_handle_element("SupplierID")
    def handle_SupplierID(self, node):
        self.SupplierID = node.text

    @xml_handle_element("Description")
    def handle_Description(self, node):
        self.Description = node.text

    @xml_handle_element("DebitAmount")
    def handle_DebitAmount(self, node):
        self.DebitAmount = float(node.text)  # Will automatically go one level deeper to get the value

    @xml_handle_element("CreditAmount")
    def handle_CreditAmount(self, node):
        self.CreditAmount = float(node.text)  # Will automatically go one level deeper to get the value

    @xml_handle_element("ReferenceNumber")
    def handle_ReferenceNumber(self, node):
        self.ReferenceNumber = node.text

    # The nested path replaces the separate handle_TaxInformation method
    @xml_handle_element("TaxInformation", "TaxType")
    def handle_TaxType(self, node):
        self.TaxType = node.text

    @xml_handle_element("TaxInformation", "TaxCode")
    def handle_TaxCode(self, node):
        self.TaxCode = node.text

    @xml_handle_element("TaxInformation", "TaxPercentage")
    def handle_TaxPercentage(self, node):
        self.TaxPercentage = int(node.text)

    @xml_handle_element("TaxInformation", "TaxBase")
    def handle_TaxBase(self, node):
        self.TaxBase = float(node.text)

    @xml_handle_element("TaxInformation", "TaxAmount")
    def handle_TaxAmount(self, node):
        self.TaxAmount = float(node.text)


@xml_handle_element("AuditFile", "GeneralLedgerEntries", "Journal", "Transaction")
@dataclass
class Transaction:
    TransactionID: str = "N/A"
    Period: str = "N/A"
    PeriodYear: str = "N/A"
    TransactionDate: str = "N/A"
    TransactionType: str = "N/A"
    Description: str = "N/A"
    SystemEntryDate: str = "N/A"
    GLPostingDate: str = "N/A"
    lines: list = field(default_factory=list)

    @xml_handle_element("TransactionID")
    def handle_TransactionID(self, node):
        self.TransactionID = node.text

    @xml_handle_element("Period")
    def handle_Period(self, node):
        self.Period = node.text

    @xml_handle_element("PeriodYear")
    def handle_PeriodYear(self, node):
        self.PeriodYear = node.text

    @xml_handle_element("TransactionDate")
    def handle_TransactionDate(self, node):
        self.TransactionDate = node.text

    @xml_handle_element("TransactionType")
    def handle_TransactionType(self, node):
        self.TransactionType = node.text

    @xml_handle_element("Description")
    def handle_Description(self, node):
        self.Description = node.text

    @xml_handle_element("SystemEntryDate")
    def handle_SystemEntryDate(self, node):
        self.SystemEntryDate = node.text

    @xml_handle_element("GLPostingDate")
    def handle_GLPostingDate(self, node):
        self.GLPostingDate = node.text

    # Use the Line class itself as the handler for Line elements
    handle_line = Line

    def xml_handler(self, iterator):
        # Collect the Line instances yielded by the sub-handler
        for item in iterator:
            if isinstance(item, Line):
                self.lines.append(item)
            else:
                raise NotImplementedError  # should not happen
        yield self


if __name__ == '__main__':
    with open("../../testdata/ExampleFile_SAF-T_Financial_888888888_20180228235959.xml", "rb") as f:
        for item in Parser(f).iter_from(Transaction):
            print(item)
            break  # To avoid too much output ...
```
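As a side note on why switching `lines` to a dataclass field helps: the original assignment stored the return value of `list.append`, which is always `None`, so the list would be lost on every second call. The sketch below shows this with plain dataclasses and no XML parsing; the class names `WithNone` and `WithFactory` and the `add` method are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WithNone:  # hypothetical name, mirrors the original `lines: list = None`
    lines: list = None

    def add(self, line):
        # Same expression as in the original handler: list.append returns
        # None, so when lines already exists it is overwritten with None.
        self.lines = [line] if self.lines == None else self.lines.append(line)


@dataclass
class WithFactory:  # hypothetical name, mirrors `field(default_factory=list)`
    lines: list = field(default_factory=list)

    def add(self, line):
        # Each instance gets its own fresh list, so a plain append is enough.
        self.lines.append(line)


broken = WithNone()
broken.add("a")
broken.add("b")  # lines is now None again

fixed = WithFactory()
fixed.add("a")
fixed.add("b")  # lines keeps both items
```

With `default_factory=list`, there is no need for a `None` check at all, and instances never share the same list object.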
Tell me if that works for you!
It worked! Thank you very much for helping me! And maybe most importantly, for not making me feel totally useless, by your kind comment:
> I agree your use case is quite difficult to handle right now,
:-)
Very good! I'm closing this issue now since it seems to be solved, but I will try to remember to ping you whenever an easier way to do it is released.
Thank you for a great tool!
I am trying to write a more efficient way of extracting data from this file: https://github.com/Skatteetaten/saf-t/blob/master/Example%20Files/ExampleFile%20SAF-T%20Financial_999999999_20161125213512.xml
The Transaction-elements I'm interested in start on line 314. Each Transaction has a set of Line-elements, and I need to flatten the output: one record for each Line, with each record repeating all the details of the transaction it is part of.
I tried making a dataclass for Transaction, and managed to get that working. But I haven't found a way of solving the Lines-bit. I tried defining a Lines-dataclass, and a Lines-handler like this in the Transactions-dataclass:
Instead of getting two or three Lines-elements, I get 12, and most of the attributes have the default values from the dataclass (N/A or 0).
I've read the documentation thoroughly, including trying to understand if there is some way to benefit from the "syntactic sugar"-part, but I have to admit that I don't really understand it ... sorry!
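To make the target shape concrete, here is a minimal sketch of the flattening step on its own, assuming the transactions have already been parsed into dataclass instances that hold a list of lines. The two dataclasses are cut-down stand-ins with only a few fields, and the `flatten` function and the `Transaction_`/`Line_` key prefixes are hypothetical choices made for this illustration (both levels have a `Description`, so some prefix is needed to avoid a clash).

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Line:  # cut-down stand-in, not the full SAF-T Line
    RecordID: str = "N/A"
    AccountID: str = "N/A"
    Description: str = "N/A"
    DebitAmount: float = 0.0


@dataclass
class Transaction:  # cut-down stand-in, not the full SAF-T Transaction
    TransactionID: str = "N/A"
    Description: str = "N/A"
    lines: list = field(default_factory=list)


def flatten(transaction):
    """Yield one flat dict per line, repeating the transaction details."""
    header = asdict(transaction)      # asdict also converts the nested Line objects
    line_dicts = header.pop("lines")  # keep only transaction-level fields in header
    for line in line_dicts:
        row = {f"Transaction_{key}": value for key, value in header.items()}
        row.update({f"Line_{key}": value for key, value in line.items()})
        yield row


example = Transaction(
    TransactionID="T1",
    Description="Invoice",
    lines=[
        Line("1", "1500", "Receivable", 100.0),
        Line("2", "3000", "Sales", 0.0),
    ],
)
rows = list(flatten(example))
```

Each dict in `rows` could then be written out with `csv.DictWriter` or collected into a table, one row per Line.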
I am not very experienced in Python, so apologies for asking a stupid question here. For more context, I am more or less trying to achieve what is described in this post, but without the MS-tools: https://blogs.sap.com/2022/09/30/big-xml-file-flattening-with-excel-power-query-for-saf-t-and-other-requirements/