DavidHuebner closed this issue 1 year ago.
I created a Pull Request here: https://github.com/dkpro/dkpro-cassis/pull/291
It addresses both issues: … `encode` function, and I refactored `create_offset_mapping`.
In my experiments with about 20k CAS XMI files, the share of loading time spent in `create_offset_mapping` drops from about 49% to about 8%. This reduces the total time for reading the CAS XMI files by over a third (55 s before, 32 s after).
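For reference, the figures quoted above work out as follows. This is a rough calculation that assumes both percentages are shares of the respective total run time:

```math
\begin{aligned}
\text{time reduction} &= \frac{55\,\text{s} - 32\,\text{s}}{55\,\text{s}} \approx 0.42,
\qquad \text{speed-up factor} = \frac{55\,\text{s}}{32\,\text{s}} \approx 1.72,\\
t_{\text{mapping, before}} &\approx 0.49 \times 55\,\text{s} \approx 27\,\text{s},
\qquad t_{\text{mapping, after}} \approx 0.08 \times 32\,\text{s} \approx 2.6\,\text{s}.
\end{aligned}
```

The total loading time therefore drops by about 42%, consistent with the reduction of over a third. The time attributable to the offset mapping itself shrinks roughly tenfold.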
Can you please have a look?
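The pull request's code is not reproduced in this thread, so the following is only a hypothetical sketch of the kind of change involved. It assumes, as the name suggests, that `create_offset_mapping` translates UIMA offsets (counted in UTF-16 code units, as in Java) into Python string indices (counted in code points). The names `naive_offset_mapping` and `fast_offset_mapping` are invented for illustration. The naive version calls `encode` once per character. The faster one encodes the whole string once to detect the common case without surrogate pairs, and otherwise uses `ord()` instead of per-character encoding.

```python
from typing import Dict, Tuple

OffsetMaps = Tuple[Dict[int, int], Dict[int, int]]  # (utf16_to_py, py_to_utf16)


def naive_offset_mapping(text: str) -> OffsetMaps:
    """Hypothetical: build both offset maps, encoding each character separately (slow)."""
    utf16_to_py: Dict[int, int] = {}
    py_to_utf16: Dict[int, int] = {}
    utf16 = 0
    for py, ch in enumerate(text):
        utf16_to_py[utf16] = py
        py_to_utf16[py] = utf16
        # One encode() call per character: 1 unit for BMP characters, 2 for surrogate pairs.
        utf16 += len(ch.encode("utf-16-le")) // 2
    # Also map the end offset so that span ends can be converted.
    utf16_to_py[utf16] = len(text)
    py_to_utf16[len(text)] = utf16
    return utf16_to_py, py_to_utf16


def fast_offset_mapping(text: str) -> OffsetMaps:
    """Hypothetical: same result as naive_offset_mapping without a per-character encode()."""
    n = len(text)
    # One bulk encode: if the UTF-16 length equals the number of code points,
    # the text has no surrogate pairs and both offset systems coincide.
    if len(text.encode("utf-16-le")) // 2 == n:
        identity = {i: i for i in range(n + 1)}
        return identity, dict(identity)
    utf16_to_py: Dict[int, int] = {}
    py_to_utf16: Dict[int, int] = {}
    utf16 = 0
    for py, ch in enumerate(text):
        utf16_to_py[utf16] = py
        py_to_utf16[py] = utf16
        # Code points above U+FFFF need a surrogate pair (two UTF-16 units).
        utf16 += 2 if ord(ch) > 0xFFFF else 1
    utf16_to_py[utf16] = n
    py_to_utf16[n] = utf16
    return utf16_to_py, py_to_utf16


print(fast_offset_mapping("a😀b")[0])  # {0: 0, 1: 1, 3: 2, 4: 3}
```

Both functions return the same pair of dictionaries. For `"a😀b"`, the emoji occupies UTF-16 units 1 and 2, so UTF-16 offset 3 maps to Python index 2.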
Is your feature request related to a problem? Please describe.
I ran a profiler on a large number of CAS XMI files of varying size with relatively few annotations and noticed two bottlenecks:
1. The `create_offset_mapping` function requires about half of the loading time.
2. In `load_cas_from_xmi`, `create_offset_mapping` appears to be called twice: once when `_parse_sofa()` is called and once when the `sofaString` is set for the view.

Describe the solution you'd like
I will prepare a Pull Request.
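One way to avoid the duplicate call described in the second bottleneck is sketched here under assumptions rather than taken from the pull request. The idea is to build the mapping lazily on first access and keep it until the text actually changes. `LazySofa`, `sofa_string`, `offset_mapping` and `build_offset_mapping` are hypothetical names, not the cassis API.

```python
from typing import Dict, Optional


def build_offset_mapping(text: str) -> Dict[int, int]:
    """Hypothetical helper: map UTF-16 offsets to Python code-point indices."""
    mapping: Dict[int, int] = {}
    utf16 = 0
    for py, ch in enumerate(text):
        mapping[utf16] = py
        # Code points above U+FFFF take two UTF-16 units (a surrogate pair).
        utf16 += 2 if ord(ch) > 0xFFFF else 1
    mapping[utf16] = len(text)  # include the end offset
    return mapping


class LazySofa:
    """Hypothetical sofa stand-in that builds its offset mapping at most once per text."""

    def __init__(self) -> None:
        self._text: Optional[str] = None
        self._mapping: Optional[Dict[int, int]] = None
        self.builds = 0  # how often build_offset_mapping actually ran

    @property
    def sofa_string(self) -> Optional[str]:
        return self._text

    @sofa_string.setter
    def sofa_string(self, value: Optional[str]) -> None:
        # Setting the same text again (e.g. once while parsing the sofa and once
        # when attaching it to a view) keeps the cached mapping.
        if value != self._text:
            self._text = value
            self._mapping = None  # invalidate; rebuilt lazily on next access

    @property
    def offset_mapping(self) -> Dict[int, int]:
        # Build on first use instead of eagerly in the setter.
        if self._mapping is None:
            self._mapping = build_offset_mapping(self._text or "")
            self.builds += 1
        return self._mapping


sofa = LazySofa()
sofa.sofa_string = "a😀b"  # e.g. while parsing the sofa
sofa.sofa_string = "a😀b"  # e.g. again when the text is set for the view
print(sofa.offset_mapping[3], sofa.builds)  # 2 1
```

The trade-off of this design is that the first offset lookup pays the construction cost, while a document whose offsets are never converted never pays it at all.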
Additional context
[Profiler screenshot]
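A profile like the one in the screenshot can be collected with Python's built-in `cProfile` module. The sketch below is hypothetical: `load_document` and `build_offset_mapping` stand in for `load_cas_from_xmi` and `create_offset_mapping`, and `profile_function` reports how often a target function ran and its share of the profiled time. It relies on `pstats.Stats.get_stats_profile()`, which requires Python 3.9 or later. That method rounds times to milliseconds, so very small workloads may report a share of 0%.

```python
import cProfile
import pstats
from typing import Any, Callable, Dict, Tuple


def build_offset_mapping(text: str) -> Dict[int, int]:
    """Hypothetical stand-in for the function being profiled (per-character encode)."""
    mapping: Dict[int, int] = {}
    utf16 = 0
    for py, ch in enumerate(text):
        mapping[utf16] = py
        utf16 += len(ch.encode("utf-16-le")) // 2
    mapping[utf16] = len(text)
    return mapping


def load_document(text: str) -> None:
    """Hypothetical loader that, like the reported issue, builds the mapping twice."""
    build_offset_mapping(text)  # e.g. while parsing the sofa
    build_offset_mapping(text)  # e.g. again when the text is set for a view


def profile_function(func: Callable[..., Any], target: str, *args: Any) -> Tuple[int, float]:
    """Run func under cProfile; return (calls of target, target's share of profiled time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    stats = pstats.Stats(profiler).get_stats_profile()  # Python 3.9+
    entry = stats.func_profiles.get(target)
    if entry is None:
        return 0, 0.0
    # ncalls is a string such as "2", or "3/1" (total/primitive) for recursion.
    calls = int(entry.ncalls.split("/")[0])
    share = entry.cumtime / stats.total_tt if stats.total_tt > 0 else 0.0
    return calls, share


calls, share = profile_function(load_document, "build_offset_mapping", "a😀b" * 10_000)
print(f"build_offset_mapping: {calls} calls, {share:.0%} of profiled time")
```

A call count of 2 for a single document is the kind of signal that exposes a duplicate computation, independently of how large the time share is.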