Design considerations - Githubissues

victorsndvg commented 8 years ago

Hello @danshapero ,

I'm interested in implementing a Fortran OO sparse matrix layer. I'm diving through the sources of SiGMA and I can figure out some of the patterns used, but is too difficult for me to try to understand all the code without documentation.

There are any paper about the design and/or performance of SiGMA?

In particular, I would like to know if there is any documentation related with the design, class/state diagrams, etc. , and also how some design decisions affects to performance.

Thank you in advance and congratulation for your great work with SiGMA!

danshapero commented 8 years ago

Hi Victor, thanks for your interest! SiGMA is admittedly lacking in any documentation. I can make a UML diagram of the various classes with some notes if that will be helpful to you. By far the most unusual feature is the procedure for building graphs; I have a blog post about this here. My email address is on my github profile; feel free to write to me if there's anything that requires clarification.

You may also be interested in PSBLAS and some of the publications about it. They found practically no runtime performance penalty in the key computational kernels in going from a pure Fortran 90 library to using the object-oriented features of Fortran 2003. At one point I also wrote some benchmarks to look at the runtime cost of using objects in Fortran and found that it was practically negligible (< 5%). I think the greatest difference will actually be the performance cost for building matrices in the first place, but I haven't timed this yet.

victorsndvg commented 8 years ago

Hi Dan,

interesting post! i've no experience with nested procedures and procedures as arguments. I think it can be useful for our design considerations.

I've used ForUML to extract the UML diagrams from your source code. if you want, I can send you this diagrams to your personal email.

Ok, I've read some PSBLAS documentation and i've concluded some differences on the design pattern application.

First of all, in PSBLAS, I think that the main reason to implement the State pattern is to define a state diagram where the build stage is only for COO (for a better performance when inserting new entries). In SiGMA it seems that all formats can be in all stages of the life-cycle of a matrix. I'm right? there exists any state diagram in SiGMA related with the matrix format?

In PSBLAS they use the Mediator pattern in order to perform format conversions. I the case of SiGMA, format conversions are delegated on the Copy/Build procedures (based in the great mechanism explained in your post). I'm right? Can you tell me something about performance comparison of this two approachs?

Finally, there are different creational patterns, Builder/Prototype in PSBLAS, and Factory in SiGMA.

Maybe I'm wrong with some/all conclusions... but I cannot expend to much time in this study. What do you think?

Again, thank for your work and for the effort of reading/answer me. It's always nice to learn something new :)

Best, Víctor

fredcouderc commented 8 years ago

Hello,

I'm also interested in implementing a Fortran OO sparse matrix layer even I have no sufficient time to do the work properly I have already implemented in my private repo some very basic OO features to consider sparse matrix and associated algebra like in PSBLAS (to perform some iterative linear solver like BiCGStab). Resulting performances are quite satisfying even I have found some strange behaviour depending on the computing architecture. SiGMA is a very nice work that will certainly inspire me but has @victorsndvg said, hard to understand totally @victorsndvg, can you send me by email the UML diagram ?

Fred

danshapero commented 8 years ago

@victorsndvg that all sounds about right. The sparse matrix classes in SiGMA mostly follow the template pattern, but there is a wrapper class to hide some of the details. In PSBLAS, the equivalent wrapper class is introduced at a much lower level in the class hierarchy so that matrices can be transparently converted between formats. This is essential for them, since the COO format is used as an intermediate format so comprehensively.

PSBLAS is much more conscientious about separating out the various phases of the life cycle of matrix objects than I was; as you've pointed out, they use the state pattern to enforce these boundaries. Rather than do that, I tried to emphasize first creating a graph object first to represent the connectivity structure, building a sparse matrix on top of a graph, and then filling the matrix entries. There is nothing to stop you from adding entries to a matrix which were not pre-allocated in the graph, at a steep performance penalty. This is the default behavior in PETSc as well, although they allow you to set a policy where doing "bad" matrix updates throws an error instead, which would be worth adding to SiGMA.

The question of how to handle the life cycle of a sparse matrix is a hard one; do you try to make the class very flexible, so that the matrix can be modified at any time, or do you force the user to build the matrix structure first before using it?

I can't speak to the performance implications of the procedures I used for building graphs and format conversions; I wrote them that way more for flexibility. There are a handful of optimizations that could be applied here; for example, if you knew the edge stream proceeded row-by-row rather than in some random order, you could build a CSR graph in one pass over the edge stream rather than two.

Thanks for taking the time to make the UML diagram -- if you email it to me I'll include it in the repo. If there's anything else that doesn't make sense at first glance, please let me know and I'll write some proper documentation for it.

danshapero / sigma

Design considerations #2