DanielT / autosar-data

A rust crate to work with Autosar data (arxml) files
Apache License 2.0

Performance on big ARXML files > 20 MB #19

Closed: christianhecht86 closed this issue 1 month ago

christianhecht86 commented 1 month ago

Hi,

I would like to read and write big (20 MB to 400 MB) network ARXML files. All elements are contained in one big file.

I observed that an AR-PACKAGE containing more than 100 elements is a big performance problem.

I was able to make it significantly faster by changing the function calc_element_insert_range. My idea was that in AR-PACKAGE->ELEMENTS the insert position must always be the last one.

    pub(crate) fn calc_element_insert_range(
        &self,
        element_name: ElementName,
        version: AutosarVersion,
    ) -> Result<(usize, usize), AutosarDataError> {

        // short optimization: inside AR-PACKAGE/ELEMENTS we always append
        if self.elemname == ElementName::Elements {
            return Ok((self.content.len(), self.content.len()));
        }
    ....

I think this could be a general rule: in a list containing just one kind of element, the insert position must always be the last one.

Do you have any further ideas on how to optimize? Which areas should I look into?

Thanks

Christian

DanielT commented 1 month ago

I agree that there is room for optimization here. I haven't run into the same issue, but that's probably just because I haven't tried to create large files.

calc_element_insert_range is the point where the validity of the sub-element is checked; your suggestion skips that and would allow anything to be inserted inside <ELEMENTS>. On a basic level it would probably work to put this check inside the if let Some(...) = elemtype.find_sub_element(...) block, because find_sub_element is the critical check.

However, since you've found this performance problem with <ELEMENTS>, I suspect the same issue would appear in other places like <FIBEX-ELEMENTS> or <ECU-COMM-PORT-INSTANCES>, so it would be best to find a more general solution.
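A minimal sketch of the ordering suggested above: validate first, then take the shortcut. Everything here is hypothetical and simplified (Parent, InsertError, and this find_sub_element are stand-ins, not the crate's real types or signatures); it only assumes, as the original shortcut does, that children of ELEMENTS may appear in any order.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum InsertError {
    InvalidSubElement(String),
}

/// Hypothetical stand-in for a parent element: its name, the sub-elements
/// its content model allows (in schema order), and its current children.
struct Parent {
    name: &'static str,
    allowed: Vec<&'static str>,
    children: Vec<&'static str>,
}

impl Parent {
    /// Stand-in for the schema lookup: the child's position in the content
    /// model, or `None` if the schema does not allow it here.
    fn find_sub_element(&self, child: &str) -> Option<usize> {
        self.allowed.iter().position(|&a| a == child)
    }

    fn calc_element_insert_range(&self, child: &str) -> Result<(usize, usize), InsertError> {
        if let Some(new_pos) = self.find_sub_element(child) {
            let len = self.children.len();
            // The child has passed validation, so the shortcut is now safe:
            // inside ELEMENTS any order is allowed, so always append.
            if self.name == "ELEMENTS" {
                return Ok((len, len));
            }
            // Ordered content model: the new child goes after every child with
            // a smaller position and before every child with a larger one.
            let pos_of = |c: &&str| self.find_sub_element(c).unwrap_or(usize::MAX);
            let start = self.children.iter().position(|c| pos_of(c) >= new_pos).unwrap_or(len);
            let end = self.children.iter().position(|c| pos_of(c) > new_pos).unwrap_or(len);
            Ok((start, end))
        } else {
            Err(InsertError::InvalidSubElement(child.to_string()))
        }
    }
}
```

Invalid children are still rejected inside ELEMENTS; only the ordering work is skipped. Multiplicity limits (e.g. at most one SHORT-NAME) are ignored in this sketch.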
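One possible general approach, sketched under stated assumptions and not necessarily what was eventually pushed: if the insert range is found by scanning the existing children linearly (an assumption; the thread does not say how the crate computes it), adding N children to one parent costs about N(N-1)/2 comparisons, roughly 50 million for 10,000 elements. Because children are kept in schema order, the boundaries can instead be found by binary search, and the common "append at the end" case can be answered by looking only at the last child. The function names and the "position only" representation are hypothetical.

```rust
// Simplified model: each child is represented only by its position in the
// parent's content model, and children are stored in non-decreasing order.

/// Linear version: scans the children from the front on every insert, O(n).
pub fn insert_range_linear(children: &[usize], new_pos: usize) -> (usize, usize) {
    let len = children.len();
    let start = children.iter().position(|&p| p >= new_pos).unwrap_or(len);
    let end = children.iter().position(|&p| p > new_pos).unwrap_or(len);
    (start, end)
}

/// Same result as `insert_range_linear`, but without a full scan.
pub fn insert_range_fast(children: &[usize], new_pos: usize) -> (usize, usize) {
    let len = children.len();
    match children.last() {
        // Common append case: the last child sorts strictly before the new
        // one, so the end is the only valid position. O(1).
        Some(&last) if last < new_pos => (len, len),
        // Otherwise (including an empty list) find both boundaries by
        // binary search on the sorted positions. O(log n).
        _ => (
            children.partition_point(|&p| p < new_pos),
            children.partition_point(|&p| p <= new_pos),
        ),
    }
}
```

This also covers the single-kind list from the original report: when every child has the same position, the range is (0, len), so appending at len is valid, and the check applies equally to ELEMENTS, FIBEX-ELEMENTS or any other repeated list.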

I will probably find time to do some benchmarking of this scenario later in the week.

christianhecht86 commented 1 month ago

By the way, are you using the "compact" schema of AUTOSAR? This is a flattened version of the schema without all the inheritance stuff in it. It is located in "AUTOSAR_TR_XMLSchemaSupplement.zip", e.g. AUTOSAR_00049_COMPACT.xsd or AUTOSAR_00049_STRICT_COMPACT.xsd.

DanielT commented 1 month ago

autosar-data uses a built-in parsing table. The table contains the unified data for all Autosar versions from 4.0.1 to R23-11. It is derived from the "regular" xsd files.

christianhecht86 commented 1 month ago

I thought using the compact schema instead of the "normal" one could bring some performance advantages, but I saw that you already do some flattening in "autosar-xsd-mangler".
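A rough illustration of what "flattening" means in this context: inherited sub-elements are resolved once into a single list per type, so a lookup never has to walk an inheritance chain. This is a hypothetical, simplified sketch (TypeDef, flatten and the type names are invented for illustration), not the autosar-xsd-mangler code, and it models only single inheritance.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified schema fragment: a type has an optional base
/// type and the sub-elements it declares itself.
pub struct TypeDef {
    pub base: Option<&'static str>,
    pub own_elements: Vec<&'static str>,
}

/// Flattens `name` into one list of sub-elements: the base type's elements
/// first (an XSD extension appends to its base's content), then its own.
/// Panics if a referenced type is missing from `types`.
pub fn flatten(types: &HashMap<&'static str, TypeDef>, name: &str) -> Vec<&'static str> {
    let def = &types[name];
    let mut elements = match def.base {
        Some(base) => flatten(types, base),
        None => Vec::new(),
    };
    elements.extend(def.own_elements.iter().copied());
    elements
}
```

If the flattening is done ahead of time when generating the built-in table, the compact xsd would not be expected to add much at runtime.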

DanielT commented 1 month ago

I just pushed a change to optimize this case. As far as I can tell this should fully fix the performance problem; it would be great if you could try it and confirm this.

christianhecht86 commented 1 month ago

Yes, this fixes the problem. Thank you for your fast reaction.