OpenMathLib / OpenBLAS

OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
http://www.openblas.net
BSD 3-Clause "New" or "Revised" License

BFloat16 data type naming #2767

Closed Guobing-Chen closed 3 years ago

Guobing-Chen commented 4 years ago

In the current code base, the BFloat16 data type is named shxxxx (e.g. shgemm), and the related build flag is BUILD_HALF. It seems we simply treat BF16 as the half-precision form of float. That does not match the IEEE standard definition: half precision should be FP16, which differs from BFloat16 in both format and content. OpenBLAS may well support both BFloat16 and FP16, since they are valuable in different domains: BFloat16 mostly for Deep Learning and Machine Learning, FP16 more for traditional scientific computation and telecom processing.
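To make the format difference concrete, here is a minimal sketch using only Python's standard library. The helper names (`fp16_bits`, `bf16_bits`, `bf16_to_float`, `fields`) are made up for illustration, and the bfloat16 encoder simply truncates the binary32 pattern instead of rounding, which is an assumption of this sketch and not necessarily what any particular library does.

```python
import struct

def fp16_bits(x: float) -> int:
    """Encode x as IEEE 754 binary16 (1 sign, 5 exponent, 10 fraction bits)."""
    return struct.unpack("<H", struct.pack("<e", x))[0]

def bf16_bits(x: float) -> int:
    """Encode x as bfloat16 (1 sign, 8 exponent, 7 fraction bits).

    Assumption: keep the top 16 bits of the binary32 pattern (truncation).
    """
    return struct.unpack("<I", struct.pack("<f", x))[0] >> 16

def bf16_to_float(bits: int) -> float:
    """Decode a bfloat16 bit pattern by padding it back to binary32."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def fields(bits: int, exp_bits: int, frac_bits: int):
    """Split a 16-bit pattern into (sign, biased exponent, fraction)."""
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction

# The same value has different bit patterns in the two formats.
print(hex(fp16_bits(1.0)), fields(fp16_bits(1.0), 5, 10))  # 0x3c00 (0, 15, 0)
print(hex(bf16_bits(1.0)), fields(bf16_bits(1.0), 8, 7))   # 0x3f80 (0, 127, 0)
```

The 8-bit exponent gives bfloat16 the same dynamic range as binary32 at much lower precision, while binary16 tops out at 65504 but keeps 10 fraction bits.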

I suggest changing the data type naming and build flag to bxxxx (e.g. bgemm) and BUILD_BF16. We can then leave shxxxx for the real half-precision data type, FP16, or even make it hxxxx (e.g. hgemm). In any case, using shxxxx and BUILD_HALF is quite confusing to the community; every other math lib, like Eigen/oneDNN (previously mkldnn), uses keywords like bf or bf16.

I can submit PR to make this change if we are OK on this.

RajalakshmiSR commented 4 years ago

The first 's' is for the result (single-precision). If b is for the input (bf16), bgemm would imply that both input and output are bfloat16, which is not the case. So 'sbgemm' would explain it better.
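As a rough illustration of the mixed-precision semantics described here (bfloat16 inputs, single-precision result), the following pure-Python sketch computes C = alpha*A*B + beta*C. The function name `sbgemm_ref` and its list-of-lists interface are hypothetical and do not reflect the OpenBLAS calling convention; the per-step rounding to binary32 is likewise an assumption of this sketch.

```python
import struct

def bf16_to_float(bits: int) -> float:
    """Decode a bfloat16 bit pattern by padding it back to binary32."""
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def sbgemm_ref(m, n, k, alpha, a, b, beta, c):
    """Hypothetical reference: C = alpha * A @ B + beta * C.

    a: m x k nested list of bfloat16 bit patterns (ints)
    b: k x n nested list of bfloat16 bit patterns (ints)
    c: m x n nested list of floats (treated as single precision)
    Returns a new m x n nested list of single-precision values.
    """
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            acc = 0.0
            for p in range(k):
                prod = bf16_to_float(a[i][p]) * bf16_to_float(b[p][j])
                acc = to_f32(acc + prod)  # accumulate in single precision
            row.append(to_f32(alpha * acc + beta * c[i][j]))
        out.append(row)
    return out

# A = [1.0, 2.0] (1 x 2), B = [3.0; 4.0] (2 x 1), given as bfloat16 patterns.
A = [[0x3F80, 0x4000]]
B = [[0x4040], [0x4080]]
print(sbgemm_ref(1, 1, 2, 1.0, A, B, 0.0, [[0.0]]))  # [[11.0]]
```

A plain "bgemm" would, by the same naming logic, also store C in bfloat16, which is the distinction the two-letter prefix is meant to capture.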

Guobing-Chen commented 4 years ago

Good point! 'sb' prefix is good to me also as specific to bfloat16.

martin-frbg commented 4 years ago

I remember suggesting to @edelsohn that function names be coordinated with other interested parties before the code lands, but I have to admit that seeing the actual implementation might be a prerequisite. Also I have no clear idea what the appropriate panel for deciding naming and API for BLAS extensions would be these days ?

Guobing-Chen commented 4 years ago

> I remember suggesting to @edelsohn that function names be coordinated with other interested parties before the code lands, but I have to admit that seeing the actual implementation might be a prerequisite. Also I have no clear idea what the appropriate panel for deciding naming and API for BLAS extensions would be these days ?

Do you mean that naming and new APIs should be discussed in a specific panel? If yes, can I attend? Besides this naming, I also have some new APIs (extensions) to suggest.

martin-frbg commented 4 years ago

TBH I haven't the froggiest idea how this is currently handled, if at all. Back in the early days of BLAS and LAPACK there must have been enough of a working group to leave meeting notes and whitepapers for posterity, but as I came into my maintainer role purely from a user background as a computational chemist I do not know if any such coordination is still taking place.

Guobing-Chen commented 4 years ago

Hmm... then can we keep it simple and just add a note in the release notes? The BF16-based shgemm was added only recently, so I suppose very few users will be impacted.

Also a side question, maybe a silly one: I don't see this shgemm API added in cblas.h, so how would a user normally use it?

martin-frbg commented 4 years ago

Good point - I am old enough to consider FORTRAN "normal use" though :smile:

Guobing-Chen commented 4 years ago

Definitely you run so quickly to that far away LOL

martin-frbg commented 4 years ago

Still curious - is there any coordination on BLAS/LAPACK extension APIs still happening among major interested parties @langou @cparrott73 or does everybody get to name their pet function as they like ?

Guobing-Chen commented 4 years ago

I created a PR for the naming change based on this proposal and RajalakshmiSR's comment. We can update the PR further if we have more ideas on the naming/prefix.

edelsohn commented 4 years ago

The naming was coordinated with and approved by Jack Dongarra, who is the leader of NETLIB. OpenBLAS cannot start inventing names.

martin-frbg commented 4 years ago

Thx. I vaguely remembered your stated intention but not any outcome, and did not see it mentioned at netlib. I guess OpenBLAS should evolve its own documentation eventually - added the information to the "OpenBLAS extensions" page in the wiki for now.

brada4 commented 4 years ago

IMO extensions should be configurable out so that people don't get into thinking they are somehow part of BLAS and eventually build broken executables.

Guobing-Chen commented 4 years ago

> The naming was coordinated with and approved by Jack Dongarra, who is the leader of NETLIB. OpenBLAS cannot start inventing names.

OK, but still this 'sh' prefix is quite confusing, and it is totally different with other math libs supporting BFloat16. Any channel to raise a concern to Jack? And any place we can find netlib's spec about BF16 based APIs or naming?

mgates3 commented 4 years ago

BLAS G2 has a proposed naming scheme. Here's the extended abstract (4 page) https://tinyurl.com/yb7m7lov and the whole proposal (48 page) http://goo.gl/D1UKnw

To summarize:

real has "r" and # bits:
- gemm_r16 is IEEE 16-bit gemm
- gemm_rb16 is bfloat 16-bit gemm
- gemm_r32 is 32-bit sgemm
- gemm_r64 is 64-bit dgemm
- gemm_r128 is 128-bit gemm
- gemm_r64x2 is double-double gemm

complex has "c" and # bits:
- gemm_c16 is IEEE 16-bit gemm
- gemm_cb16 is bfloat 16-bit gemm
- gemm_c32 is 32-bit cgemm
- gemm_c64 is 64-bit zgemm
- gemm_c128 is 128-bit gemm
- gemm_c64x2 is double-double gemm

The scheme also allows for combinations, like gemm_r16r16r32 for A, B matrices in 16-bit, C matrix in 32-bit, as for NVIDIA's Tensor cores, as well as extension to reproducible accumulators, routines with separate C_in and C_out arguments, etc.
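The suffix scheme summarized above can be sketched as a small name builder. This is only an illustration of how the listed names compose; `type_tag`, `g2_name` and the `LEGACY` table are hypothetical helpers and are not taken from the BLAS G2 proposal itself.

```python
# Traditional names for the suffixes that have one, as listed in the summary.
LEGACY = {"r32": "sgemm", "r64": "dgemm", "c32": "cgemm", "c64": "zgemm"}

def type_tag(domain: str, bits: int, bfloat: bool = False) -> str:
    """Build a tag such as 'r16', 'rb16' or 'c64'.

    domain: 'r' for real, 'c' for complex.
    bfloat: True selects the bfloat variant (adds 'b' before the bit count).
    """
    if domain not in ("r", "c"):
        raise ValueError("domain must be 'r' or 'c'")
    return f"{domain}{'b' if bfloat else ''}{bits}"

def g2_name(routine: str, *tags: str) -> str:
    """Join a routine name with one tag, or several tags for mixed precision."""
    return f"{routine}_{''.join(tags)}"

print(g2_name("gemm", type_tag("r", 16, bfloat=True)))  # gemm_rb16
print(g2_name("gemm", "r16", "r16", "r32"))             # gemm_r16r16r32
print(LEGACY[type_tag("r", 64)])                        # dgemm
```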

Mark Gates
Innovative Computing Laboratory, University of Tennessee, Knoxville (Jack Dongarra's group)

martin-frbg commented 4 years ago

Great, thanks (time to fetch myself a beer I guess). That looks to be next year's can of worms though ?

ceseo commented 4 years ago

> The naming was coordinated with and approved by Jack Dongarra, who is the leader of NETLIB. OpenBLAS cannot start inventing names.

> OK, but still this 'sh' prefix is quite confusing, and it is totally different with other math libs supporting BFloat16. Any channel to raise a concern to Jack? And any place we can find netlib's spec about BF16 based APIs or naming?

Some clarification: at the time the decision was made, the only half precision type being considered was bfloat16. That motivated choosing 'H' for the type.

I agree that, if OpenBLAS is going to support IEEE 16-bit, this can be confusing. I have no objection to changing it, as long as all the parties involved are in agreement.

Guobing-Chen commented 4 years ago

Thanks for the information on BLAS G2, @mgates3. This looks great to me as a next generation that makes all the APIs more readable and flexible. Just to confirm: your aim was to show us this next-generation BLAS, not to explain why the 'h' prefix was chosen, right?

And thanks also for the background information and opinion, @ceseo. That is my point as well. When people (users, general developers, etc.) look into or think about half precision, they mostly think of IEEE's half-precision format (because that is from IEEE... LOL), which is FP16, while BFloat16 is referred to as BFLOAT16 or bfloat16 or BF16 or bf or b. This is also the practice in all other libs supporting this data type. OpenBLAS using this 'h' prefix will definitely confuse any users who care about this part. That will be a loss for OpenBLAS, and sad for OpenBLAS fans like me.

brada4 commented 4 years ago

Short excerpt from clang source:

bfloat is currently stored as a double internally because

mgates3 commented 4 years ago

Yes, that was to give an update on where the BLAS standard is going. Namely, that we're proposing to move away from single letter precisions, so likely will never standardize on h for half. Any feedback on the proposal is also welcome—you can comment in the Google doc. h has been used in some libraries for half, but differentiating IEEE half and bfloat16 half with one letter is confusing. Also, what would be used for complex-half? Single letters are just not flexible and descriptive enough.

Using sh, when the output is single (32-bit) and the input is half (16-bit), would be consistent with other routines such as scnrm2. Again, I wouldn't expect that to be standardized.

martin-frbg commented 4 years ago

The drawback of having verbose names is that you are halfway through writing down the implementation by the time you have finished typing the name, so everybody will come up with their personal shortcut macros (unless forcing everyone to use an IDE with predictive texts or a full code-generating AI is the ultimate goal). Perhaps the current generation of freshmen will eventually have us all use emoji (which would at least make it easy to name functions that have side effects.) Am I reading you correctly here as stating that there will be no interim standard before adoption of that next-gen BLAS naming, turning the supposedly supreme blessing given SHGEMM into more of a "don't really care how you call it in the old idiom" ?

Guobing-Chen commented 4 years ago

If there is no preferred standard for this before BLAS G2, then the existing convention among end users and the BFloat16 community would be a good one to follow, like 'b' or 'bf' or 'bf16' or 'bfloat16', etc.

What do you think, Martin?

martin-frbg commented 4 years ago

I think I would like to see the new function used for something other than big company politics...

Guobing-Chen commented 4 years ago

That's fair. My purpose in raising this issue is also to help these new BF16-related functions in OpenBLAS be better utilized by the community, by providing more familiar naming.

These days I am preparing more BF16 functions for OpenBLAS. If the decision is made to keep the current naming, I will follow this convention for my new functions.

martin-frbg commented 4 years ago

Haven't seen any messages from the mountain top since the categorical "OpenBLAS cannot start inventing names" but I guess we could always create aliases if there are alternate names already in use. (Would be nice if functions differ in name only though).

Be warned that everything besides SHGEMM itself may appear as if it is already implemented, but actually leads to the corresponding single-precision real function without any attempt at argument conversion right now.

conradsnicta commented 4 years ago

(As per suggestion from @martin-frbg, continuing a discussion started in https://github.com/xianyi/OpenBLAS/pull/2796)

> If you are quoting from #2767, you will have seen how the SH.. names came to be. Please comment there if you are not satisfied with Edelsohn's dictum.

OpenBLAS can certainly push back against any "dictums", especially recently made ones that were done without properly taking into account context and existing published standards. Just because something was seemingly rubber stamped by Jack Dongarra, it doesn't automatically mean it's the best decision.

I suspect the size of the userbase for OpenBLAS is far larger than "plain" BLAS these days. Given that, OpenBLAS users and contributors can at the very least provide actionable feedback on actual real world usage and implications of any proposed changes/extensions to BLAS. There is also no need to propagate questionable changes from BLAS into OpenBLAS; this type of action can be used as an alternative feedback mechanism, if nothing gets through "normal" channels.

edelsohn commented 4 years ago

I find all of the comments about "Edelsohn's dictums" offensive and unnecessary.

We, IBM, did not invent names nor force them on OpenBLAS. There was a question of how to name the new functions and we approached Jack Dongarra and the Innovative Computing Lab, who maintain NETLIB, about the recommended naming. The definition of BLAS is not a standard maintained by the ISO. The landscape of precisions has become more complicated than when ICL provided the guidance, so all of us need to adapt. Because of the rapid changes in hardware and use in ML/DL algorithms, this field is evolving rapidly.

BLAS is not just OpenBLAS: it's NETLIB, ATLAS, BLIS, IBM ESSL, Intel MKL, Eigen, and others. We, and other vendors adding low-precision features, were and are trying to ensure compatibility across the spectrum of packages.

Martin, Guobing, and Conrad are welcome to contact ICL, join the conversation and help to coordinate this space. Pointing fingers and recommending unilateral action is not productive.

martin-frbg commented 4 years ago

https://github.com/xianyi/OpenBLAS/issues/2767#issuecomment-671896840 is certainly open to (mis)interpretation. I had hoped that the naming would have been agreed on (and perhaps publicized in whatever the usual manner, if any) by "vendors" and other interested parties by the time the contribution got merged, and not have this devolve into a company grandstanding contest before the new functionality got even one user. Keeping the number of unresolved technical issues in check is hard enough as it is.

quickwritereader commented 4 years ago

Though I'm not entitled to give opinions on these matters, I think the choice of names and letters is not the important point here. The most important thing should be having standardized mangled entry names. Those who want friendly names could write generators to produce them. So I believe the OpenBLAS team should adopt the suggested names and concentrate on performance.

edelsohn commented 4 years ago

We thought that the names and semantics for reduced precision that IBM contributed to OpenBLAS had the agreement of ICL. The IBM team will work with OpenBLAS to update the APIs and implementation so that they are compatible with the definitions being adopted by ICL and other packages.

Guobing-Chen commented 4 years ago

I just re-read all the comments, though I am already working on new bfloat16 functions with the 'sh' prefix...

One question for @edelsohn: could you give more detail on where (in which place or document) ICL approved the use of the 'h' prefix? I ask because @mgates3 said in his comments that there is no standard, that a single letter like 'h' will not be standardized, and that the trend is toward the more comprehensive names in BLAS G2, which is a work in progress. As @mgates3 comes from ICL, your two comments seem to conflict a bit.

conradsnicta commented 4 years ago

@edelsohn We may need to clarify the timeline and what exactly has been proposed by who (and to who). I agree with @Guobing-Chen about the need for a specific document for the proposed naming, and how this doesn't conflict with both the comment by @mgates3: "... likely will never standardize on h for half ..." and the proposal given in http://goo.gl/D1UKnw

> The landscape of precisions has become more complicated than when ICL provided the guidance (...) Because of the rapid changes in hardware and use in ML/DL algorithms, this field is evolving rapidly.

I'm not sure I agree with assertion about all of this being "complicated" and "rapid". bfloat16 has been around for a while now, and IEEE half precision floating point has been around for even longer. These are clearly two well-known, valid and separate formats to represent floating point numbers stored in 16 bit. I don't see why all of a sudden there should be a mad rush to implement a half-baked proposal (no pun intended) that confuses between the two formats.

martin-frbg commented 4 years ago

I suspect the impetus comes from recent implementations of bfloat16 operations in CPU rather than GPU hardware that are assumed to become a strong selling point for certain high-end processors. And I must admit that I am not even aware who still uses IEEE half precision nowadays - I remember it from GPU shaders but never saw it in any of the scientific and engineering fields I came into contact with over the years. There will probably be important ones, and in any case creating confusion among such a mixed userbase as OpenBLAS is likely to have is certainly not a good start. (Which is why I am a bit unhappy with the current situation, though we may have arrived at it through a series of honest misunderstandings. For all the merits of the next-gen naming proposal, I suspect it will be years to its wide-spread adoption while what is needed is an acceptable solution for now, for both "AI" and "IEEE" camps, in the current F77 alphabet soup style.)

edelsohn commented 4 years ago

The U.S. is returning from a weekend holiday. I am asking my colleague who had the conversation with Jack to clarify the naming information.

joseemoreira commented 4 years ago

Dear Friend,

I have not been following this thread in detail, but I feel it would be appropriate for me to clarify how the name "shgemm" came to be proposed by the IBM team for this new mixed-precision routine we are discussing. I want to start by saying that I am solely responsible for choosing that name.

Future IBM processors will have support for reduced-precision 16-bit floating-point formats, including the format commonly known as bfloat16. As we prepare for those future processors, it became clear that it would be useful to have routines in BLAS (and OpenBLAS in particular) that support the format. Our first candidate routine was general matrix multiply (GEMM) that took two bfloat16 inputs A and B and produced a single-precision (IEEE binary32 format) result C. We developed and coded a reference implementation for that routine, as well as an implementation optimized for the future IBM POWER10 processor.

We had to choose a name for our new routine. I reached out to members of the Innovative Computing Laboratory at the University of Tennessee for guidance in naming mixed-precision routines. I received good guidance and reference materials. One of those materials was a pointer to the way MAGMA has named mixed-precision routines, as described in "https://icl.cs.utk.edu/projectsfiles/magma/doxygen/routines.html", with the warning that the naming approach could be improved. Inspired by the approach in MAGMA of using two-letter prefixes (first for result precision, second for input precision) I personally made the decision of calling the new routine "shgemm". That was my decision and my decision alone. After that decision, I did not reach out to anyone else, in academia or industry, to validate that name. From the discussion in this thread, I must conclude that I failed to communicate to my colleagues at IBM, who have done all the work and constantly interact with the broader community, that the decision was purely mine. My colleagues had no ill intentions when they proposed the new routine, with the name I gave it, to the OpenBLAS community. They thought that they were proposing a name with broader industry, academia and community support.

I understand that people want to revisit that name. I think it is a great idea for the community to work together and agree on a set of names for (at least) the two main half-precision formats: IEEE binary16 and the format commonly known as bfloat16. The work would have broad impact, as there are more machines coming up with hardware support for reduced-precision floating-point formats.

Sincerely,

Jose Moreira

martin-frbg commented 4 years ago

Thank you very much for the detailed explanation. So I guess we are the provisional brainfloaters working group now. 🙂

martin-frbg commented 4 years ago

So are there any objections against changing the prefix to "sb" (and the build option to something like BUILD_BFLOAT I guess) so far ? Anybody else we should invite to comment on this matter ? I am currently recovering from dental surgery (previous message was actually sent from the chair, waiting for the local anesthetic to kick in) but would like to see this resolved before the next release if possible.

joseemoreira commented 4 years ago

I like the "sb" prefix for "bfloat16" input and "single-precision" output. It leaves open "sh" for IEEE 16-bit input.

Martin: Hope your dental surgery recovery goes well. I had one of those several years ago. Not fun :-(

quickwritereader commented 4 years ago

@martin-frbg, have a speedy recovery!

martin-frbg commented 4 years ago

I just hope nobody has discovered the current "shgemm" for their project so that we can safely repurpose the name. And thanks a lot for your kind wishes, I do find that my childhood wish to live entirely on ice cream has not aged well :-/

Guobing-Chen commented 4 years ago

> I just hope nobody has discovered the current "shgemm" for their project so that we can safely repurpose the name. And thanks a lot for your kind wishes, I do find that my childhood wish to live entirely on ice cream has not aged well :-/

But this still will not stop kids from dreaming the same or similar in future, even when shown the pain :).

As for the renaming, my previous PR #2771 for the change can be a base for this, though I need to refactor it a bit to match the latest code base. Is there a way to reopen it? Or I can create a new one.

conradsnicta commented 4 years ago

> So are there any objections against changing the prefix to "sb" (and the build option to something like BUILD_BFLOAT I guess) so far ? Anybody else we should invite to comment on this matter ?

The prefix "sb" is okay, though it might be useful to see the wider implications/extensions of this.

By default, each member of the gemm family stays within its precision (eg. single, double), so it would make sense to also have a "bgemm" function where the calculations stay within bf16. Then there are also complex versions, where the single letter limitation for prefixes is problematic.

The extended gemm family would look like so:

where:

Not sure if there are clear use cases for bx16 and cx16 yet, though this shouldn't be discounted.

martin-frbg commented 4 years ago

I foresee no problems with bgemm (if there is a need for this in AI), and I guess we could adopt v for complex b (again if this is to play any role in AI computations) which might make w a natural choice for its IEEE counterpart ?

joseemoreira commented 4 years ago

AI (in particular deep learning) is still a fast moving field. There are some academic papers that promote the use of complex matrices in deep learning and the techniques could find a place in production code in the future. I am not sure I have seen any uses of half-precision complex, but it is a good idea to plan for that and at least "reserve" a letter.

And yes, there are also experiments with both input and output being bfloat16, although all practical uses I know have fp32 as output.

IEEE 754 defines a total of 8 formats (5 binary and 3 decimal), not all of them for computing. (For example, IEEE fp16 is not really defined for computing.) So that would consume 16 letters. Take 2 more for bfloat16 and we could still add 4 other formats.

conradsnicta commented 4 years ago

I can see a few potential use cases for bf16 and bgemm in machine learning workloads other than neural networks / deep learning. Though this is probably restricted by the level of CPU/GPU support for bf16 computations in the near-term (eg. support for computations other than bf16*bf16 -> fp32, which is the current plat du jour). I haven't looked into the plans (if any) by Intel/AMD/Nvidia/ARM/IBM/etc for expanded handling of bf16.

@joseemoreira Can you comment on what IBM is planning in the near and medium terms for the supported set of bf16 arithmetic operations?

> ... we could adopt v for complex b (again if this is to play any role in AI computations) which might make w a natural choice for its IEEE counterpart ?

That looks workable, but it is starting to resemble a random letter salad :)

martin-frbg commented 4 years ago

> That looks workable, but it is starting to resemble a random letter salad :)

I am open to alternatives, b/v does at least sound a bit similar (in english), so perhaps not worse than the established d/z pair. (b/h should make some sense to musicians but is obviously forbidden. We could look into using accented characters - should give the user a choice of acute or grave result ?)

Also intrigued by joseemoreira's comment that IEEE fp16 is "not really defined for computing", which would explain why I did not encounter it anywhere outside earlier graphics programming - from the complaints about the "sh" naming I would have thought there are actual use cases unbeknownst to me and not just semantics as a matter of principle ???

conradsnicta commented 4 years ago

Accented characters are probably a no go, as it would confuse too many people. I suspect a large proportion of users (if not the majority) wouldn't know how to type accented characters anyway. They certainly require more effort than "plain" Latin characters.

The sb prefix is a workable solution that stays within the current naming tradition. It's entirely possible that's one of the main use cases (bf16*bf16 -> fp32) for the foreseeable future, under the assumption of neural network applications.

However, if going beyond the scope of this, a more informative prefix (or suffix) would be helpful, along the lines of b16_b16_gemm, f16_f16_gemm, f32_b16_gemm, f32_f32_gemm, c32_c32_gemm, etc. This type of approach would be far less confusing than single/double letter prefixes to handle a large number of possible permutations.

> ... IEEE fp16 is "not really defined for computing", which would explain why I did not encounter it anywhere outside earlier graphics programming

I think the main reason for this perception is that there is (was?) no proper hardware support for fp16 on CPUs. I have used fp16, but was forced to use a GPU because the CPU required conversion of fp16 to fp32 before doing any calculations (thereby providing no speed advantage).

Related:

Guobing-Chen commented 4 years ago

> Accented characters are probably a no go, as it would confuse too many people. I suspect a large proportion of users (if not the majority) wouldn't know how to type accented characters anyway. They certainly require more effort than "plain" Latin characters.

> The sb prefix is a workable solution that stays within the current naming tradition. It's entirely possible that's one of the main use cases (bf16*bf16 -> fp32) for the foreseeable future, under the assumption of neural network applications.

+1 for that.

> However, if going beyond the scope of this, a more informative prefix (or suffix) would be helpful, along the lines of b16_b16_gemm, f16_f16_gemm, f32_b16_gemm, f32_f32_gemm, c32_c32_gemm, etc. This type of approach would be far less confusing than single/double letter prefixes to handle a large number of possible permutations.

The ICL guys are working on the next-gen BLAS API standard, which points exactly to non-single-letter prefixes; you can refer to @mgates3's comments in this thread.

joseemoreira commented 4 years ago

Regarding the question by Conrad on IBM support for bf16: IBM Power ISA 3.1 has been published. That is what POWER10 will support. The bf16 instructions include the new MMA (Matrix-Multiply Assist) instructions that do a 4x2x4 matrix multiply, using a 512-bit accumulator and two 128-bit vector inputs, and conversion instructions to/from IEEE fp32. We have similar instructions for IEEE fp16. (Therefore, both sbgemm and shgemm could be supported in POWER10.)
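The operand sizes quoted here fit together arithmetically, which the sketch below works through: a 128-bit vector holds eight bf16 values (a 4x2 or 2x4 tile) and a 512-bit accumulator holds sixteen fp32 values (a 4x4 tile). The function `rank2_update` is a hypothetical pure-Python model of a "4x2x4" multiply-accumulate; the tile layout is an assumption of this sketch and it is not meant to describe the exact semantics of any Power ISA instruction.

```python
VECTOR_BITS = 128       # size of each vector input
ACCUMULATOR_BITS = 512  # size of the accumulator

bf16_per_vector = VECTOR_BITS // 16            # 8 values, i.e. a 4x2 or 2x4 tile
fp32_per_accumulator = ACCUMULATOR_BITS // 32  # 16 values, i.e. a 4x4 tile

def rank2_update(acc, x, y):
    """Return acc + x @ y for a 4x4 acc, a 4x2 x and a 2x4 y (nested lists)."""
    return [
        [acc[i][j] + sum(x[i][p] * y[p][j] for p in range(2)) for j in range(4)]
        for i in range(4)
    ]

acc = [[0.0] * 4 for _ in range(4)]
x = [[1.0, 2.0]] * 4          # four identical rows (1, 2)
y = [[1.0] * 4, [1.0] * 4]    # two rows of ones
acc = rank2_update(acc, x, y)
print(bf16_per_vector, fp32_per_accumulator)  # 8 16
print(acc[0])                                 # [3.0, 3.0, 3.0, 3.0]
```

A full GEMM kernel would, under this model, loop such updates over the K dimension two columns at a time while the 4x4 tile of C stays in the accumulator.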

I am not as familiar with IBM z Architecture. Don't know what they have in that area.

It is really too early to know what we will do in processors beyond POWER10. People have asked me for more comprehensive support, but that is ongoing work.

joseemoreira commented 3 years ago

Hello. I have not seen much discussion here recently.

Do we have an acceptable solution to use "h" and "b" as the single-letter prefix for IEEE binary16 and bfloat16 data types, respectively? And that they would be used within the "sh" and "sb" two-letter prefix for the proposed mixed-precision operations in BLAS?

If those are OK, then I would like to bring up the issue of operations on integer matrices. Those have become popular in deep learning and are already supported in IBM and Intel processors. (More support in POWER10.) Given the variety of integer formats, and the additional option of signed and unsigned integers, this seems to me a good time to break with the traditional approach and move on to the BLAS G2 suffix-based approach. To get started, here are some suggested suffixes:

- s64 : signed 64-bit integer
- s32 : signed 32-bit integer
- s16 : signed 16-bit integer
- s8 : signed 8-bit integer

- u64 : unsigned 64-bit integer
- u32 : unsigned 32-bit integer
- u16 : unsigned 16-bit integer
- u8 : unsigned 8-bit integer
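For reference, the value range implied by each suggested suffix can be derived mechanically. The helper `int_range` below is hypothetical and only decodes the suffixes listed above, assuming two's-complement representation for the signed types.

```python
def int_range(suffix: str):
    """Return (min, max) for a suffix such as 's8' or 'u32'.

    Assumption: 's' means signed two's complement, 'u' means unsigned.
    """
    kind, bits = suffix[0], int(suffix[1:])
    if kind == "s":
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if kind == "u":
        return 0, (1 << bits) - 1
    raise ValueError(f"unknown suffix: {suffix}")

for suffix in ("s8", "u8", "s16", "u16", "s32", "u32", "s64", "u64"):
    print(suffix, int_range(suffix))
```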

Like I said, these are just some suggestions to get the discussion started. (But a subset of the above is already in use in some libraries.) I have seen other suffix proposals for integers, including from the ICL team.