This is a good question and request. I did some small-scale benchmarking for the DQC paper here. However, it only tests small molecules (the largest one was C6H8O6 with density fitting). I didn't do a comparison on GPU because DQC uses libcint, which has no GPU interface, so DQC on GPU might not work. There were several things that I identified as slow when doing the benchmark.
If you'd like to do an extensive benchmark, that would be great! We can list the bottlenecks here and then start working on improving the performance.
> I did some small-scale benchmarking for the DQC paper here. However, it only tests small molecules (the largest one was C6H8O6 with density fitting).
This is a great start, thanks for the link to the notebook, I hadn't seen that yet!
> I didn't do a comparison on GPU because DQC uses libcint, which has no GPU interface, so DQC on GPU might not work.
Ah ok - thanks for the heads up - I will look out for that.
> If you'd like to do an extensive benchmark, that would be great! We can list the bottlenecks here and then start working on improving the performance.
Yeah, I'm up for trying. Maybe let's start listing out what you'd like to see in the benchmarking and deciding what tools we want to use.
I have used airspeed velocity (asv) for previous benchmarking efforts. Have you used asv? Do you have a strong preference for another benchmarking tool? There's a blog post I really like about using asv for continuous benchmarking in CI, which is maybe more advanced than what we need right now, but it's nice to see how asv can scale to more advanced applications if needed. I could imagine creating a folder of asv-compatible tests that might look like this one from the scikit-image benchmarks.
Another nice feature of asv is its syntax for parameterized and multi-parameter benchmarks, which could be useful for addressing the scaling questions I mentioned above (see the rough sketch below).
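To make that concrete, here's a rough sketch of what a parameterized asv benchmark for DQC could look like. The file location, molecule descriptors, and basis names are just my guesses for illustration; the `dqc.Mol` / `dqc.HF` calls follow the README example, but I haven't tested this against the repo yet:

```python
# benchmarks/benchmark_scf.py -- hypothetical asv benchmark file, not yet in the repo
import torch
import dqc


class TimeHartreeFock:
    # asv runs each time_* method once per combination of these parameters
    params = (
        ["H 0 0 0; H 0 0 1.4", "O 0 0 0; H 0 0 1.8; H 1.7 0 0.6"],  # molecule descriptors
        ["sto-3g", "cc-pvdz"],                                       # basis sets
    )
    param_names = ["moldesc", "basis"]

    def setup(self, moldesc, basis):
        # build the molecule outside the timed region
        self.mol = dqc.Mol(moldesc=moldesc, basis=basis, dtype=torch.float64)

    def time_hf_energy(self, moldesc, basis):
        # time a full restricted Hartree-Fock calculation
        qc = dqc.HF(self.mol).run()
        qc.energy()
```

Adding more molecules or basis sets would then just mean extending the `params` lists, and asv would report each combination separately.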
Let me know what you recommend as a best next step.
I've always wanted to use asv but had no time (and enough motivation) to actually learn and use it, so it is a happy coincidence! Let me list the things I have in mind:

* If you want to follow the steps in the blog, feel free to add a new file in the `.github/workflows/` directory. Uploading the results as an artifact is a good first step, but if there's a way to automatically put the results in the documentation, it would be ideal.
* There are 2 quantum chemistry calculations (HF & DFT) and 2 modes of computing the electron integrals (direct & density fitting). They probably have different bottlenecks and (maybe) scaling, so it's also good to include all 4 combinations.
* For the benchmark, it's good to also include the time required if SCF is performed with only 1 iteration (to see if the calculation is slow or the SCF algorithm just takes a lot of steps).

> I've always wanted to use asv but had no time (and enough motivation) to actually learn and use it, so it is a happy coincidence!
Great!
I have begun work with a very simple proof of concept PR at #13.
> If you want to follow the steps in the blog, feel free to add a new file in the `.github/workflows/` directory. Uploading the results as an artifact is a good first step, but if there's a way to automatically put the results in the documentation, it would be ideal.
I think I will probably wait on this stuff, if that's ok. Maybe we can get a set of benchmarks that we like and can run locally first, and then worry about CI integration / automatically updating results.
> There are 2 quantum chemistry calculations (HF & DFT) and 2 modes of computing the electron integrals (direct & density fitting). They probably have different bottlenecks and (maybe) scaling, so it's also good to include all 4 combinations.
Ok yeah, we can either include those as different parameters or as different benchmarks. It might depend a little on how we end up structuring the benchmarks.
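For what it's worth, here's roughly how I was picturing the HF vs DFT axis. The `dqc.HF` / `dqc.KS` calls follow the README, but the xc string is just an example, and I haven't found where the direct vs density-fitting switch lives yet, so that axis is only noted in a comment:

```python
import torch
import dqc

# a small test molecule; coordinates and basis are illustrative only
mol = dqc.Mol(moldesc="O 0 0 0; H 0 0 1.8; H 1.7 0 0.6", basis="cc-pvdz",
              dtype=torch.float64)

# axis 1: the quantum chemistry method
ene_hf = dqc.HF(mol).run().energy()              # Hartree-Fock
ene_ks = dqc.KS(mol, xc="lda_x").run().energy()  # DFT (Kohn-Sham, example xc)

# axis 2 would be direct vs density-fitted electron integrals; I haven't
# located the DQC option that toggles density fitting yet, so it's omitted here
print(float(ene_hf), float(ene_ks))
```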
> For the benchmark, it's good to also include the time required if SCF is performed with only 1 iteration (to see if the calculation is slow or the SCF algorithm just takes a lot of steps).
This makes a lot of sense. I might need help specifying that; I didn't see anywhere obvious to me in the API where I could control it.
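In case it helps, here is the kind of thing I was expecting to be able to write. Note that the `fwd_options={"maxiter": 1}` keyword is purely hypothetical on my part (I'm guessing SCF solver options get forwarded through `run()`), so please correct me with the real knob:

```python
import time
import torch
import dqc

mol = dqc.Mol(moldesc="O 0 0 0; H 0 0 1.8; H 1.7 0 0.6", basis="cc-pvdz",
              dtype=torch.float64)

# full SCF, timed as usual
t0 = time.perf_counter()
dqc.HF(mol).run().energy()
t_full = time.perf_counter() - t0

# hypothetical: cap the SCF at a single iteration to separate "each step is slow"
# from "the solver needs many steps" -- I don't know the actual option name in DQC
t0 = time.perf_counter()
dqc.HF(mol).run(fwd_options={"maxiter": 1}).energy()
t_one = time.perf_counter() - t0

print(f"full SCF: {t_full:.2f} s, single iteration: {t_one:.2f} s")
```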
**Is your feature request related to a problem? Please describe.**
This is maybe more of a question than a feature request, but it could turn into one.
I saw you've got one or two files in the repo around benchmarks, like `time_forward.py`, but I was wondering if you have done or are interested in doing more widespread and systematic benchmarking, particularly with respect to how performance scales with variables like the number of atoms for different basis sets on CPU and GPU.
I'm wondering how you expect performance to compare to a library like PySCF, just knowing the architecture of dqc? I'm also curious whether performance / scale were things you were interested in, or even motivated by, when developing dqc.
I haven't tried any comparisons yet and don't really know enough about the actual implementations to have any expectations one way or another. There is a little bit of PySCF benchmark data that could be compared to, or we could compute our own.
Ideally I'd like to push to as large systems as possible, but I am so new to this space that I'm really not sure what is possible. If I could do things on the scale of amino acids (~20 atoms) that would be a nice start - getting to ~100 atoms would be even better, and so it continues!
**Describe the solution you'd like**
I could imagine developing a benchmarking suite of both basic calculations and calculations of properties (like IR spectra), plus a set of molecules of increasing size, and then measuring calculation time for a variety of different basis sets and hardware configurations (CPU/GPU). The goal would be to assess performance as molecule size increases.
**Describe alternatives you've considered**
I could do ad-hoc testing using basic timing functionality to try and build up an intuitive feel for scaling performance.
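For example, the ad-hoc version might just be a loop like the following. The molecule list and basis are placeholders, and the `dqc` calls follow the README example rather than anything I've run at scale:

```python
import time
import torch
import dqc

# placeholder molecules; larger systems (amino acids, ...) would be appended here
molecules = {
    "H2": "H 0 0 0; H 0 0 1.4",
    "H2O": "O 0 0 0; H 0 0 1.8; H 1.7 0 0.6",
}

for name, moldesc in molecules.items():
    mol = dqc.Mol(moldesc=moldesc, basis="cc-pvdz", dtype=torch.float64)
    t0 = time.perf_counter()
    ene = dqc.HF(mol).run().energy()
    print(f"{name}: {time.perf_counter() - t0:6.2f} s  E = {float(ene):.6f} Ha")
```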
**Additional context**
If this is something you're interested in, I'd appreciate your help in designing the best approach to benchmarking, as I am so new to this space.
If the benchmarking is successful, one can then start doing profiling to try and identify performance gaps, and ultimately work on improving performance.