bebop / poly

A Go package for engineering organisms.
https://pkg.go.dev/github.com/bebop/poly
MIT License

Create Automated Benchmarking Suite #366

Open TimothyStiles opened 9 months ago

TimothyStiles commented 9 months ago

It'd be really cool to have a benchmarking suite that we can run to see if we've unintentionally introduced any performance changes before merging into main.

The idea would be that on PR creation we'd run the benchmarks on both the main branch and the PR branch, and use the results to highlight any significant changes (positive or negative).

We can start slow with what we'd consider "problem areas" and continue out from there.
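As a rough illustration of the main-versus-PR comparison described above, here is a minimal sketch that reads the text output of `go test -bench` from two runs and reports benchmarks whose ns/op moved by more than a chosen threshold. The function names (`parseBench`, `significantChanges`), the benchmark names, the 10% threshold, and the sample numbers are all assumptions made for illustration; none of them come from poly. The existing `benchstat` tool from `golang.org/x/perf` does this comparison with proper statistics over repeated runs and would likely be the better choice in practice.

```go
package main

import (
	"bufio"
	"fmt"
	"math"
	"sort"
	"strconv"
	"strings"
)

// parseBench extracts the ns/op value for each benchmark line in the
// text output of `go test -bench`. Lines that do not look like benchmark
// results are skipped.
func parseBench(output string) map[string]float64 {
	results := make(map[string]float64)
	scanner := bufio.NewScanner(strings.NewReader(output))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 4 || !strings.HasPrefix(fields[0], "Benchmark") {
			continue
		}
		// The value reported in ns/op is the field just before the "ns/op" unit.
		for i := 1; i < len(fields); i++ {
			if fields[i] == "ns/op" {
				if v, err := strconv.ParseFloat(fields[i-1], 64); err == nil {
					results[fields[0]] = v
				}
				break
			}
		}
	}
	return results
}

// significantChanges returns the relative change (pr-base)/base for every
// benchmark present in both runs whose absolute change is at least threshold.
// Positive values mean the PR is slower; negative values mean it is faster.
func significantChanges(base, pr map[string]float64, threshold float64) map[string]float64 {
	changes := make(map[string]float64)
	for name, old := range base {
		cur, ok := pr[name]
		if !ok || old == 0 {
			continue
		}
		delta := (cur - old) / old
		if math.Abs(delta) >= threshold {
			changes[name] = delta
		}
	}
	return changes
}

func main() {
	// Made-up sample output with hypothetical benchmark names;
	// these are not measurements of poly.
	mainRun := "BenchmarkExampleParse-8   1000   200000 ns/op\n" +
		"BenchmarkExampleFold-8    500   410000 ns/op\n"
	prRun := "BenchmarkExampleParse-8   1000   260000 ns/op\n" +
		"BenchmarkExampleFold-8    500   405000 ns/op\n"

	changes := significantChanges(parseBench(mainRun), parseBench(prRun), 0.10)

	// Sort names so the report is stable from run to run.
	names := make([]string, 0, len(changes))
	for name := range changes {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %+.1f%%\n", name, changes[name]*100)
	}
}
```

A single run per branch is noisy, so a real pipeline would probably repeat each benchmark several times (for example with `go test -bench . -count 10`) before deciding that a change is significant.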

carreter commented 9 months ago

I think this might combine well with #362. If we have a tutorial series that takes a real-world example from end to end throughout our package, we can also use it to benchmark performance.

Koeng101 commented 9 months ago

I think it'd be great to benchmark against all of GenBank, UniProt, or PDB. It would take a server with decent hard drives, or just a lot of data per month to stream, and it would actually validate that our parsers work well.

carreter commented 9 months ago

This is a great idea! We could have our new CI/CD pipeline (#365) incorporate this.

I don't think it'd be advisable to have it run against ALL of these massive datasets every time we merge, but we could have it pick a consistent, representative subset.
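One way to get a "consistent, representative subset" is to hash each record identifier and keep only the records whose hash falls in a fixed bucket, so the same records are chosen on every run without storing a list. The sketch below assumes records are addressable by a string ID; the function names (`inSubset`, `selectSubset`), the placeholder IDs, and the one-in-three sampling rate are all hypothetical. Note that hashing gives a stable pseudo-random sample, which is only "representative" in the statistical sense; it does not guarantee coverage of rare record types.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// inSubset reports whether id belongs to the sample. Roughly one in every
// oneIn IDs is kept, and the decision for a given ID never changes between
// runs because it depends only on the ID's FNV-1a hash.
func inSubset(id string, oneIn uint32) bool {
	if oneIn <= 1 {
		return true // a rate of 0 or 1 means "keep everything"
	}
	h := fnv.New32a()
	h.Write([]byte(id)) // Write on a hash.Hash never returns an error
	return h.Sum32()%oneIn == 0
}

// selectSubset filters ids down to the deterministic sample,
// preserving the input order.
func selectSubset(ids []string, oneIn uint32) []string {
	var picked []string
	for _, id := range ids {
		if inSubset(id, oneIn) {
			picked = append(picked, id)
		}
	}
	return picked
}

func main() {
	// Placeholder identifiers, not real database accessions.
	ids := []string{
		"EXAMPLE0001", "EXAMPLE0002", "EXAMPLE0003",
		"EXAMPLE0004", "EXAMPLE0005", "EXAMPLE0006",
	}
	fmt.Println(selectSubset(ids, 3))
}
```

Because membership depends only on the ID, newly added database entries fall into or out of the sample on their own, without changing which existing entries are selected.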

It'd be nice to also have all new entries in these DBs run against the latest version of our parsers.

Also, these DBs aren't that big size-wise since it's just text and not image data, right? I have no clue, this is a genuine question.

Koeng101 commented 9 months ago

GenBank is a little over a terabyte, I think, so not that bad. UniProt is around 250 GB. SRA, on the other hand, is 33 petabytes (and the Wayback Machine is 57 petabytes), which kinda puts it into perspective. There is NO WAY we could handle SRA, but GenBank + UniProt would probably be doable.

github-actions[bot] commented 7 months ago

This issue has had no activity in the past 2 months. Marking as stale.