scalapack bug fix for factorize

LLNL / libROM

Model reduction library with an emphasis on large scale parallelism and linear subspace methods

https://www.librom.net

scalapack bug fix for factorize #263

Closed dreamer2368 closed 7 months ago

dreamer2368 commented 8 months ago

The ScaLAPACK routine pdgesvd performs a workspace query on its first call in factorize (lib/linalg/scalapack_f_wrapper.f90), returning the required size of the work array. If that size exceeds 1e9, an integer overflow occurs and the returned value can be invalid.

The Fortran routine correct_svd_workarray_size replicates the size calculation from ScaLAPACK's pdgesvd, but computes the size as a double to avoid the overflow. It is called in factorize, where it prints a warning and resizes the work array if the queried size appears to have overflowed.
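To illustrate the idea, here is a minimal Python sketch of the detection step. It is not the code in this PR. The names `wrap_int32` and `needs_correction` are hypothetical, and the size formula (rows times columns) is a stand-in chosen only to exceed the 32-bit range; it is not the actual pdgesvd workspace formula. The wraparound shown is what two's-complement hardware typically produces.

```python
INT32_MAX = 2**31 - 1  # largest value a 4-byte signed integer can hold


def wrap_int32(value):
    """Return `value` as a two's-complement 32-bit signed integer would store it."""
    return (value + 2**31) % 2**32 - 2**31


def needs_correction(queried_size, size_as_double):
    """True if the queried size looks overflowed when compared with the
    same quantity recomputed in double precision (hypothetical check)."""
    return queried_size <= 0 or size_as_double > INT32_MAX


# Stand-in size formula, NOT the real pdgesvd workspace formula.
rows, cols = 50_000, 50_000
size_as_double = float(rows) * float(cols)  # 2.5e9, exact in double precision
queried_size = wrap_int32(rows * cols)      # what a 32-bit query would report

if needs_correction(queried_size, size_as_double):
    print(f"warning: queried size {queried_size} overflowed; "
          f"expected about {size_as_double:.3e}")
```

Detecting the overflow does not make the value representable: a size above `INT32_MAX` still cannot be passed through a 32-bit integer argument.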

dreamer2368 commented 8 months ago

@ckendrick,

> @dreamer2368 have you verified that the output of pdgesvd is correct with these changes? This fix should allocate the proper workspace size, but I worry that the lwork value passed into pdgesvd has still overflowed, and that any integers ScaLAPACK uses internally for indexing will also overflow at these sizes.

I think you're right. I checked the resulting basis by running a regression test, and it appears to be invalid.

We may have to modify the libROM ScaLAPACK wrapper to support int64 and compile ScaLAPACK with support for 8-byte integers, since it seems to use int32 by default.

I'm not sure how to implement this part, and it will take some time to figure out. In the meantime, I'll mark this PR as WIP.

dreamer2368 commented 7 months ago

This workaround does not actually circumvent ScaLAPACK's overflow issue. While we could compile both libROM and ScaLAPACK with long integers, the ultimate solution is to run factorize in parallel. Closing the PR now.
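As a rough illustration of why running in parallel helps, the sketch below counts how many matrix entries each process owns under a block-cyclic distribution. The `numroc` function mirrors the logic of ScaLAPACK's NUMROC tool routine. The helper `max_local_elements`, the matrix dimensions, the block size, and the process grids are assumptions made for this example. The local element count is only a proxy: the actual pdgesvd workspace size follows a different formula.

```python
INT32_MAX = 2**31 - 1


def numroc(n, nb, iproc, isrcproc, nprocs):
    """Rows or columns of an n-long dimension, split into blocks of size nb,
    that land on process `iproc` (same logic as ScaLAPACK's NUMROC)."""
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                    # number of full blocks
    count = (nblocks // nprocs) * nb     # full rounds every process receives
    extrablks = nblocks % nprocs         # leftover full blocks
    if mydist < extrablks:
        count += nb                      # one extra full block
    elif mydist == extrablks:
        count += n % nb                  # the trailing partial block
    return count


def max_local_elements(m, n, mb, nb, nprow, npcol):
    """Largest local matrix (in entries) held by any process (hypothetical helper)."""
    max_rows = max(numroc(m, mb, r, 0, nprow) for r in range(nprow))
    max_cols = max(numroc(n, nb, c, 0, npcol) for c in range(npcol))
    return max_rows * max_cols


# Assumed example: a 50000 x 50000 matrix with 64 x 64 blocks.
for nprow, npcol in [(1, 1), (4, 4)]:
    local = max_local_elements(50_000, 50_000, 64, 64, nprow, npcol)
    print(f"{nprow}x{npcol} grid: {local} entries, "
          f"fits in int32: {local <= INT32_MAX}")
```

With these assumed sizes, a single process holds more entries than a 32-bit integer can index, while a 4 x 4 grid brings the largest local block well under that limit.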