AlexandraImbrisca / open-MaStR

A collaborative software to download the energy database Marktstammdatenregister (MaStR)
https://open-mastr.readthedocs.io/en/latest/
GNU Affero General Public License v3.0

Create a benchmark to evaluate the parsing speed #1

Open AlexandraImbrisca opened 1 week ago

AlexandraImbrisca commented 1 week ago

To compare different parsing strategies, we should create a benchmark that measures their speed.

It should take into account:

AlexandraImbrisca commented 1 day ago

Hi @FlorianK13! I analysed our options for creating the benchmark and I'd like to hear your thoughts :)

Option 1: Using fake data

Option 2: Using the existing datasets as a base

@FlorianK13 what do you think? 🤔 If you think there is a need for option 1, I can definitely prioritize that :)

FlorianK13 commented 1 day ago

Option 2 seems good. It is less work, and all the data is open data, hence there is no need for anonymization. Since the benchmark would take quite a long time with the whole MaStR dataset, maybe you could manually choose a subset? You can have a look at the zipped folder that is downloaded to ~/.open-mastr/data/xml-download and choose a subset of XML files from there as your benchmark.

A quick solution would be the following:

That should work as a benchmark, but if you have better ideas I'm also fine with that.
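One way to pick such a subset reproducibly, as a minimal sketch: copy a fixed selection of XML members out of the downloaded zip with Python's standard `zipfile` module. The function name `extract_subset`, its `prefixes`/`max_per_prefix` parameters, and the idea of selecting files by file-name prefix are hypothetical choices for illustration; none of them are part of open-mastr.

```python
import shutil
import zipfile
from pathlib import Path


def extract_subset(zip_path, out_dir, prefixes, max_per_prefix=1):
    """Extract a small, fixed subset of XML files from a MaStR bulk-download zip.

    `prefixes` and `max_per_prefix` are hypothetical parameters for this sketch;
    they are not part of the open-mastr API. Returns the extracted file paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        # Sort member names so every run selects the same files (reproducible benchmark).
        xml_names = sorted(n for n in archive.namelist() if n.lower().endswith(".xml"))
        for prefix in prefixes:
            matches = [n for n in xml_names if Path(n).name.startswith(prefix)]
            for name in matches[:max_per_prefix]:
                # Flatten to the bare file name so members cannot escape out_dir.
                target = out_dir / Path(name).name
                # Stream the member to disk instead of loading it into memory.
                with archive.open(name) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                extracted.append(target)
    return extracted
```

Assuming a single zip sits in the download folder, a call could look like `extract_subset(next((Path.home() / ".open-mastr" / "data" / "xml-download").glob("*.zip")), "benchmark_data", ["<prefix>"])`, where `benchmark_data` and `<prefix>` are placeholders. Sorting the member names keeps the selection stable across runs, so benchmark results stay comparable.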

AlexandraImbrisca commented 9 hours ago

Sounds great! Thanks a lot for responding so fast! :) I'll create a few smaller datasets and include them in the benchmark.

I'll create a pull request with the benchmark today and add you as a reviewer.
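A minimal sketch of how such a benchmark could time parsing strategies over a few smaller datasets, assuming the subset files sit in one local directory. The two parse functions are illustrative stand-ins built on Python's standard `xml.etree.ElementTree`; they are not the parsing strategies open-MaStR actually uses, which this thread does not specify. The names `parse_full_tree`, `parse_streaming` and `benchmark` are hypothetical.

```python
import statistics
import time
import xml.etree.ElementTree as ET


def parse_full_tree(path):
    """Illustrative strategy A: build the whole tree in memory, count top-level records."""
    return len(ET.parse(path).getroot())


def parse_streaming(path):
    """Illustrative strategy B: stream with iterparse and clear each record once counted."""
    count = 0
    depth = 0
    for event, elem in ET.iterparse(path, events=("start", "end")):
        if event == "start":
            depth += 1
        else:
            depth -= 1
            if depth == 1:  # just closed a direct child of the root element
                count += 1
                elem.clear()  # release the record's children to keep memory flat
    return count


def benchmark(strategies, files, repeat=3):
    """Return {strategy name: median seconds to parse all files} over `repeat` runs."""
    # Sanity check first: a faster strategy is useless if it reads different data.
    reference = {name: [parse(f) for f in files] for name, parse in strategies.items()}
    if len({tuple(counts) for counts in reference.values()}) > 1:
        raise ValueError(f"strategies disagree on record counts: {reference}")
    results = {}
    for name, parse in strategies.items():
        timings = []
        for _ in range(repeat):
            start = time.perf_counter()
            for path in files:
                parse(path)
            timings.append(time.perf_counter() - start)
        results[name] = statistics.median(timings)
    return results
```

For example, `benchmark({"full": parse_full_tree, "stream": parse_streaming}, sorted(Path("benchmark_data").glob("*.xml")))`, with `benchmark_data` as a hypothetical folder holding the extracted subset. Taking the median of several runs damps noise from other processes, and the sanity pass also warms the OS file cache before any timing starts.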

AlexandraImbrisca commented 7 hours ago

@FlorianK13 I created the pull request here: https://github.com/AlexandraImbrisca/open-MaStR/pull/2. Could you please add yourself as a reviewer? GitHub doesn't allow me to add you. It might be related to access permissions, so I have just invited you as a collaborator :)