API for a bitset data type

fortran-lang / stdlib

Fortran Standard Library

https://stdlib.fortran-lang.org

MIT License

1.1k stars 170 forks source link

API for a bitset data type #221

Closed wclodius2 closed 2 years ago

wclodius2 commented 4 years ago

Both C++ and Java define a bitset type, so I propose that the standard library also have a module implementing a bitset type. My first draft for a public API for that module is as follows.

The module name is STDLIB_BITSETS.

The module exports one constant, BITS_KIND, to be used as a KIND value in addressing bits in the bitset. This will initially be INT32.

The module exports one derived type, BITSETS, with a plural name so that users can name their scalar realizations as BITSET. The type will contain two private components: BITS a scalar integer of kind BITS_KIND giving the number of bits in a bitset entity, and a rank one allocatable integer array BLOCK of private kind BLOCK_KIND, that will be INT32, to be used in holding the bit values. The type has an ASSIGNMENT(=) operation defined.

BITSETS will have the following procedures: ALL, AND, AND_NOT, ANY_BIT, BIT_COUNT, BITS, CLEAR, EOR, EQUIV, EXTRACT, FLIP, INIT, INPUT, MAX_SET_BIT, NONE, OR, OUTPUT, PRINT_BITSET, READ_BITSET, SET, TEST, and VALUE. Some of these procedures will be overloaded:

interface clear
    module procedure clear_bit, clear_range
end interface clear

interface flip
    module procedure flip_bit, flip_range
end interface flip

interface init
    module procedure init_copy, init_zero
end interface init

interface set
    module procedure set_bit, set_range
end interface set

interface print_bitset
    module procedure print_bitset_string, print_bitset_unit
end interface print_bitset

interface read_bitset
    module procedure read_bitset_string, read_bitset_unit
end interface read_bitset

The API for the individual procedures is listed below. The following aspects of the API may be controversial:

How the range is defined for CLEAR_RANGE, FLIP_RANGE, and SET_RANGE. I have roughly followed Java's bitset here.
That the first argument is modified in AND, AND_NOT, EOR, and OR. That is how it is done in the Java bitset, but I can see having a function that returns the result rather than a subroutine that modifies the first argument.
That the I/O procedures INPUT, OUTPUT, PRINT_BITSET, and READ_BITSET have STATUS flags.
That the procedures often ignore user specification of positions outside the range 0 to BITS-1. An alternative is to make the procedures impure and invoking ERROR STOP. Another option is to add a STATUS argument, but I am reluctant to do that for a simple user error.
The names of some of the procedures could be more similar to the names of Fortran's bit intrinsics.

subroutine init_copy(set, aset, bits) : Creates the bitset SET, of size BITS if present, otherwise of the size of ASET. All bits in the range of ASET are copied from ASET. If BITS is present and larger than the size of ASET then all additional bits are zero.

subroutine init_zero(set, bits) : Creates the bitset, SET, of size BITS, with all bits initialized to zero. BITS must be non-negative.

elemental function all_bits( set ) : Returns .TRUE. if all bits in SET are 1, .FALSE. otherwise.

elemental subroutine and(set1, set2) : Sets the bits in SET1 to the bitwise AND of the original bits in SET1 and SET2. If SET1 has fewer bits than SET2 then the additional bits in SET2 are ignored. If SET1 has more bits than SET2, then the absent SET2 bits are treated as if present with zero value.

elemental subroutine and_not(set1, set2) : Sets the bits in SET1 to the bitwise and of the original bits in SET1 with the bitwise negation of SET2. If SET1 has fewer bits than SET2 then the additional bits in SET2 are ignored. If SET1 has more bits, then the absent SET2 bits are treated as if present with zero value.

elemental function any_bit(set) : Returns .TRUE. if any bit in SET is 1, .FALSE. otherwise

elemental function bit_count(set) : Returns the number of non-zero bits in SET.

elemental function bits(set) : Returns the number of bit positions in SET.

elemental subroutine clear_bit(set, pos) : Sets to zero the POS position in SET. If POS is less than zero or greater than BITS(SET)-1 it is ignored.

pure subroutine clear_range(set, start_pos, stop_pos) : Sets to zero all bits from the START_POS to STOP_POS positions in SET. If STOP_POS < START_POS then no bits are modified. Positions outside the range 0 to BITS(SET)-1 are ignored.

elemental subroutine eor(set1, set2) : Sets the bits in SET1 to the bitwise EOR of the original bits in SET1 and SET2. If SET1 has fewer bits than SET2 then the additional bits in SET2 are ignored. If SET1 has more bits than SET2, then the absent SET2 bits are treated as if present with zero value.

elemental function equiv(set1, set2) : Returns .TRUE. if all bits in SET1 and SET2 have the same value, .FALSE. otherwise. If the sets differ in size a value true will be returned if and only if the sets are equivalent in the overlapping range, and all bits outside the overlapping range are zero.

pure function extract(set, start_pos, stop_pos) : Creates a new bitset from a range, START_POS to STOP_POS, in bitset SET.

elemental subroutine flip_bit(set, pos) : Flips the value at the POS position in SET, provided the position is valid. If POS is less than 0 or greater than BITS(SET)-1, then no value is changed.

pure subroutine flip_range(set, start_pos, stop_pos) : Flips all valid bits from the START_POS to STOP_POS positions in SET. If STOP_POS < START_POS no bits are flipped. Positions less than 0 or greater than BITS(SET)-1 are ignored.

subroutine input(unit, set, status) : Reads the components of the bitset, SET, from the logical unit, UNIT, assuming that the components were written using OUTPUT.

elemental function max_set_bit( set ) : Returns the maximum position with a set bit. If no bit is set returns -1.

elemental function none(set) : Returns .TRUE. if none of the bits in SET have the value 1.

elemental subroutine or(set1, set2) : Sets the bits in SET1 to the bitwise OR of the original bits in SET1 and SET2. If SET1 has fewer bits than SET2 then the additional bits in SET2 are ignored. If SET1 has more bits than SET2, then the absent SET2 bits are treated as if present with zero value.

subroutine output(unit, set, status) : Writes the components of the bitset, SET, to the logical unit, UNIT, in a unformatted sequence compatible with INPUT.

subroutine print_bitset_string(string, set) : Writes a BITSETS literal to the allocatable default character STRING, representing the individual bit values in the bitsets, SET.

subroutine print_bitset_unit(unit, set, status, advance) : Writes a bitsets literal to the logical unit, UNIT, representing the individual bit values in the bitsets, SET. If STATUS is not present and an error occurs then processing stops with an error message. If STATUS is present then it has the error code SUCCESS if no error occurs, has the value ALLOC_FAULT if failure is due to the allocation of a temporary and, has the value WRITE_FAULT if an error occurs in the write to the unit. By default or if ADVANCE is present with the value 'YES', advancing output is used. If ADVANCE is present with the value 'NO', then the current record is not advanced by the write.

subroutine read_bitset_string(string, set, status) : Uses the bitsets literal in the default character STRING, to define the bitset, SET. The literal may be preceded by an an arbitrary sequence of blank characters. If STATUS is not present then an error results in the sending an error message to ERROR_UNIT and the termination of the program. If STATUS is present then it has the error code SUCCESS if no error occurs, the value INVALID_STRING if the sequence of characters after an initial sequence of blanks is not a BITSETS literal, the value INVALID_ARRAY_SIZE if the literal's bit size is too large to be represented by the bit size integer kind, the value ALLOC_FAULT if allocation of SET failed for the specified BITSIZE, or INVALID_INTEGER if the HEX literal constant is too large to be represented by a bit size binary integer. If STATUS is present with the value SUCCESS then SET is defined, otherwise it is not defined.

subroutine read_bitset_unit(unit, set, status) : Uses the bitsets literal at the current position in the formatted file with logical unit, UNIT, to define the bitset, SET. The literal may be preceded by an an arbitrary sequence of blank characters. If STATUS is not present then an error results in the sending an error message to ERROR_UNIT and the termination of the program. If STATUS is present then it has the error code SUCCESS if no error occurs, the value INVALID_STRING if the sequence of characters after an initial sequence of blanks is not a BITSETS literal, the value INVALID_ARRAY_SIZE if the literal's bitsize is too large to be represented by the bitsize integer kind, the value ALLOC_FAULT if allocation of SET failed for the specified bitsize, or INVALID_INTEGER if the HEX literal constant is too large to be represented by a bitsize binary integer. If STATUS is present with the value SUCCESS then SET is defined, otherwise it is not defined.

elemental subroutine set_bit(set, pos) : Sets the value at the POS position in SET, provided the position is valid. If the position is less than 0 or greater than BITS(SET)-1 then SET is unchanged.

pure subroutine set_range(set, start_pos, stop_pos) : Sets all valid bits to 1 from the START_POS to the STOP_POS positions in SET. If STOP_POS < START_POS no bits are changed. Positions outside the range 0 to BITS(SET)-1 are ignored.

elemental function test(set, pos) : Returns .TRUE. if the POS position is set, .FALSE. otherwise. If POS is negative or greater than BITS(SET) - 1 the result is .FALSE..

elemental function value(set, pos) : Returns 1 if the POS position is set, 0 otherwise. If POS is negative or greater than BITS(SET) - 1 the result is 0.

jvdp1 commented 4 years ago

Thank you. Interesting proposition. I work quite often at the bit-level to spare memory (I must store large tables with values only equal to 0,1,2, or 3). Therefore, I wrote something similar but not as complete.

Could it be that some procedures are available outside the DT? E.g. The olderFortran Standard does not provide a pop_count intrinsic function. Therefore, such a function outside the DT could be useful.

The module exports one constant, BITS_KIND, to be used as a KIND value in addressing bits in the bitset. This will initially be INT32.

I guess that it could be easily extended to other kinds with fypp.

wclodius2 commented 4 years ago

Maybe I am misunderstanding what you are saying, but I think you are misunderstanding what I am saying. I think that you think that BITS_KIND is the kind of integer used to store the bits when it is the kink of integer used to address the bits. You would want to change BITS_KIND only in two different circumstances:

You will want to address more than about 2**31 bits in one entity, in which case you will want fo make it an INT64.
You will only want to address about 32 bits, but want to address that number in a large number entities so the memory taken up by the BITS member is significant. However the memory taken up by the array descriptor for the BLOCK member of the type is even more significant. (Probably several INT64s.) In that case you want to go for a different type structure, with the BLOCK array a small fixed size array, with at least as many bits as the largest number of bits you want to address. Probably an array of INT8, with BITS also an INT8 to further minimize the memory footprint.

On Jul 18, 2020, at 1:42 AM, Jeremie Vandenplas notifications@github.com wrote:

Thank you. Interesting proposition. I work quite often at the bit-level to spare memory (I must store large tables with values only equal to 0,1,2, or 3). Therefore, I wrote something similar but not as complete.

Could it be that some procedures are available outside the DT? E.g. The Fortran Standard does not provide a pop_count intrinsic function. Therefore, such a function outside the DT could be useful.

The module exports one constant, BITS_KIND, to be used as a KIND value in addressing bits in the bitset. This will initially be INT32.

I guess that it could be easily extended to other kinds with fypp.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-660444561, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOSW4F4XIBWR76NOBETR4FG47ANCNFSM4O7IG3CQ.

jvdp1 commented 4 years ago

Maybe I am misunderstanding what you are saying, but I think you are misunderstanding what I am saying.

Indeed, I misundertood you. Sorry and thank you for the explanations.

On overall I am fine with the API.

wclodius2 commented 4 years ago

Thanks! Now if only more people showed an interest in the topic. I can’t see making this part of the standard library with this minimal show of interest.

On Jul 21, 2020, at 2:03 PM, Jeremie Vandenplas notifications@github.com wrote:

Maybe I am misunderstanding what you are saying, but I think you are misunderstanding what I am saying.

Indeed, I misundertood you. Sorry and thank you for the explanations.

On overall I am fine with the API.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-662077459, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOVXFEYVD2YI5LQJE5TR4XX73ANCNFSM4O7IG3CQ.

Romendakil commented 4 years ago

Personally I never saw the need for a bitmaps type, however, it was an ever and ever recurring topic on c.l.f., so I am pretty sure it would be used if available. So you clearly have my endorsement. But I am pretty new here and just started to read the messages since 2 weeks or so. Your proposal for the API seems very good to me.

milancurcic commented 4 years ago

I'm not the target audience for this, but I don't object to it being part of stdlib. I think it's in scope and the API is clean and clear.

@certik @aradi @arjenmarkus @MarDiehl do you mind offering your feedback?

arjenmarkus commented 4 years ago

Sure, I will have a look :).

Op do 23 jul. 2020 om 20:26 schreef Milan Curcic notifications@github.com:

I'm not the target audience for this, but I don't object to it being part of stdlib. I think it's in scope and the API is clean and clear.

@certik https://github.com/certik @aradi https://github.com/aradi @arjenmarkus https://github.com/arjenmarkus @MarDiehl https://github.com/MarDiehl do you mind offering your feedback?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-663162341, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR7SSTTR5IV3BBUFSYLR5B6HDANCNFSM4O7IG3CQ .

arjenmarkus commented 4 years ago

I have looked at the proposal and while I don't use bitsets myself, I can see that there are applications for them in Fortran. The specifications seem complete, but perhaps a bit overcomplete. Here are my detailed remarks:

In the introduction you mention ALL, but later it is called ALL_BITS
I think a name like ANY_BITS is more consistent
Is a function like NONE really needed? You can get the same effect with .NOT. ANY
Exclusive OR appears as .XOR., not as .EOR. - perhaps use "XOR"?
Equivalent appears as .EQV. - I therefore suggest EQV
There is a max_set_bit function, but not a first_set_bit function - does that not have any use?
The description of the formatted input/output routines does not include the actual format for the bitsets. Only a vage mention is made of HEX. Is it really useful to only support hexadecimal input/output? Why not a sequence of 0 's and 1's? It would be much easier to specify in my opinion.
The elemental routines AND, AND_NOT etc. require some further explanation: if SET2 is smaller than SET1, then some convention must be used to deal with the extra bits in SET1. Am I correct in assuming that the design is such that SET2 is (at least conceptually) extended with 0 bits? An alternative could be to extend it (again conceptually) in such a way that the extra bits in SET1 are NOT changed.

Just a few remarks and suggestions :).

wclodius2 commented 4 years ago

FWIW I suspect I should follow the names for Fortran’s bit intrinsics, but I don’t like the leading I. Fortran has ALL and ANY but they are not bit intrinsics. So maybe ALL_BITS, and ANY_BITS are better. I agree NONE is not needed. Fortran has IEOR for the original bit intrinsics, but the atomic ones are XOR, so maybe XOR is best. I’ll go along with EQV. I am wondering if MAX_SET_BIT has any use. The problem with using just 0s and 1s is that the resulting string is an order of magnitude larger. FWIW I thought of the syntax for the bitset literal content as bitset-literal-constant is bitsize-literal-constant hex-literal-constant where bitsize-literal-constant is “S'” digit [digit…] and hex-literal-constant is “Z’” hex-digit [hex-digit…]

but a binary literal constant is much easier to implement and more legible for small bit set sizes.

For different sized sets, there are two cases of procedures. For the comparison functions, e.g., EQV, the shorter bitset is treated as if padded with zeros. For thee routines that modify the first argument, if the second argument is larger the trailing bits are ignored, while if it is shorter it is treated as if padded with zeros.

On Jul 23, 2020, at 2:21 PM, Arjen Markus notifications@github.com wrote:

I have looked at the proposal and while I don't use bitsets myself, I can see that there are applications for them in Fortran. The specifications seem complete, but perhaps a bit overcomplete. Here are my detailed remarks:

In the introduction you mention ALL, but later it is called ALL_BITS I think a name like ANY_BITS is more consistent Is a function like NONE really needed? You can get the same effect with .NOT. ANY Exclusive OR appears as .XOR., not as .EOR. - perhaps use "XOR"? Equivalent appears as .EQV. - I therefore suggest EQV There is a max_set_bit function, but not a first_set_bit function - does that not have any use? The description of the formatted input/output routines does not include the actual format for the bitsets. Only a vage mention is made of HEX. Is it really useful to only support hexadecimal input/output? Why not a sequence of 0 's and 1's? It would be much easier to specify in my opinion. The elemental routines AND, AND_NOT etc. require some further explanation: if SET2 is smaller than SET1, then some convention must be used to deal with the extra bits in SET1. Am I correct in assuming that the design is such that SET2 is (at least conceptually) extended with 0 bits? An alternative could be to extend it (again conceptually) in such a way that the extra bits in SET1 are NOT changed. Just a few remarks and suggestions :).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-663212935, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOV5ACEEUU4HEACYLIDR5CLULANCNFSM4O7IG3CQ.

arjenmarkus commented 4 years ago

I think it is a good idea to stay close to the names of the Fortran intrinsics, indeed.

The reason I asked about the hexadecimal format is that I would find it much easier to specify or interpret a particular bit pattern using zeroes and ones than using hexadecimal notation. But I agree that hexadecimal is much shorter :). Supporting both seems to me the best way.

For the routines that deal with two sets, having a consistent philosophy - and documenting it :) - is important. It will be much easier to reason about. It reminds me of an explanation I found in a book on (mainly) C by Kernighan and Pike (IIRC, I did not try and look it up): they designed functions like memset() with assignments in mind - the first argument would get changed, as if you read an assignment from left to right. They complained though that many people did not figure this out.

Regards,

Arjen

Op vr 24 jul. 2020 om 01:29 schreef William B. Clodius < notifications@github.com>:

FWIW I suspect I should follow the names for Fortran’s bit intrinsics, but I don’t like the leading I. Fortran has ALL and ANY but they are not bit intrinsics. So maybe ALL_BITS, and ANY_BITS are better. I agree NONE is not needed. Fortran has IEOR for the original bit intrinsics, but the atomic ones are XOR, so maybe XOR is best. I’ll go along with EQV. I am wondering if MAX_SET_BIT has any use. The problem with using just 0s and 1s is that the resulting string is an order of magnitude larger. FWIW I thought of the syntax for the bitset literal content as bitset-literal-constant is bitsize-literal-constant hex-literal-constant where bitsize-literal-constant is “S'” digit [digit…] and hex-literal-constant is “Z’” hex-digit [hex-digit…]

but a binary literal constant is much easier to implement and more legible for small bit set sizes.

For different sized sets, there are two cases of procedures. For the comparison functions, e.g., EQV, the shorter bitset is treated as if padded with zeros. For thee routines that modify the first argument, if the second argument is larger the trailing bits are ignored, while if it is shorter it is treated as if padded with zeros.

On Jul 23, 2020, at 2:21 PM, Arjen Markus notifications@github.com wrote:

I have looked at the proposal and while I don't use bitsets myself, I can see that there are applications for them in Fortran. The specifications seem complete, but perhaps a bit overcomplete. Here are my detailed remarks:

In the introduction you mention ALL, but later it is called ALL_BITS I think a name like ANY_BITS is more consistent Is a function like NONE really needed? You can get the same effect with .NOT. ANY Exclusive OR appears as .XOR., not as .EOR. - perhaps use "XOR"? Equivalent appears as .EQV. - I therefore suggest EQV There is a max_set_bit function, but not a first_set_bit function - does that not have any use? The description of the formatted input/output routines does not include the actual format for the bitsets. Only a vage mention is made of HEX. Is it really useful to only support hexadecimal input/output? Why not a sequence of 0 's and 1's? It would be much easier to specify in my opinion. The elemental routines AND, AND_NOT etc. require some further explanation: if SET2 is smaller than SET1, then some convention must be used to deal with the extra bits in SET1. Am I correct in assuming that the design is such that SET2 is (at least conceptually) extended with 0 bits? An alternative could be to extend it (again conceptually) in such a way that the extra bits in SET1 are NOT changed. Just a few remarks and suggestions :).

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub < https://github.com/fortran-lang/stdlib/issues/221#issuecomment-663212935>, or unsubscribe < https://github.com/notifications/unsubscribe-auth/APTQDOV5ACEEUU4HEACYLIDR5CLULANCNFSM4O7IG3CQ .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-663281342, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR4OQPFBQJDAGGVQ7ZDR5DBVLANCNFSM4O7IG3CQ .

wclodius2 commented 4 years ago

I now think it is a bad idea to have two formats for the BITSETS literal representation:

It will be difficult to come up with two unique terms to describe two closely related concepts.
Even with distinguishing names users will sometimes confuse them and feed the output of one write format to the input of the other read format
I don’t want to write and support four extra subroutines.

I am really inclined to make the BITSETS literal a string of zeros and ones. Its intuitive and if people want to save memory they should use the binary representation.

On Jul 24, 2020, at 4:20 AM, Arjen Markus notifications@github.com wrote:

I think it is a good idea to stay close to the names of the Fortran intrinsics, indeed.

The reason I asked about the hexadecimal format is that I would find it much easier to specify or interpret a particular bit pattern using zeroes and ones than using hexadecimal notation. But I agree that hexadecimal is much shorter :). Supporting both seems to me the best way.

For the routines that deal with two sets, having a consistent philosophy - and documenting it :) - is important. It will be much easier to reason about. It reminds me of an explanation I found in a book on (mainly) C by Kernighan and Pike (IIRC, I did not try and look it up): they designed functions like memset() with assignments in mind - the first argument would get changed, as if you read an assignment from left to right. They complained though that many people did not figure this out.

Regards,

Arjen
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub , or unsubscribe .

MarDiehl commented 4 years ago

I am really inclined to make the BITSETS literal a string of zeros and ones. Its intuitive and if people want to save memory they should use the binary representation.

I have never used BITSETS, but using a string of zeros and ones seems odd to me. Why not use an array (of variable size) of logicals/booleans?

wclodius2 commented 4 years ago

Note I was writing about a BITSETS literal, a character string representation of the underlying values. A bitset is normally implemented as an array of large integers, with bit intrinsics used to read and manipulate them. The bit operations are used to save memory and reduce memory traffic, so the operations may be faster than say operations on byte sized logicals.

On Jul 24, 2020, at 12:45 PM, Martin Diehl notifications@github.com wrote:

I am really inclined to make the BITSETS literal a string of zeros and ones. Its intuitive and if people want to save memory they should use the binary representation. I have never used BITSETS, but using a string of zeros and ones seems odd to me. Why not use an array (of variable size) of logicals/booleans?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-663680391, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOW32I67ON7HFNUVB33R5HJFHANCNFSM4O7IG3CQ.

ivan-pi commented 4 years ago

Speaking of BITSET literals, I don't see a reason why arrays of integers could not be used:

use stdlib_bitsets, only: bitsets, assignment(=)
type(bitsets) :: b

b = [1,0,1,0,0,0,0,0] ! (overloaded assignment for assumed size arrays)
b = "10100000"        ! (overloaded assignment for character literals)

But personally, I find using a string more attractive. Using logical literals .true./.false. is too verbose.

In the potential use case, where you would need to convert an array of zeros and ones (or a logical mask) to a string, you could probably do so using something along the lines of:

character(len=:), allocatable :: bstr
integer, allocatable :: ib(:), dc(:)
integer :: i
dc = [(i,i=1,20)]
allocate(ib,mold=dc)
where (mod(dc,2)==0)
  ib = 1
else where
  ib = 0
end where

allocate(character(len=size(ib)) :: bstr)
write(bstr,'(*(I1))') ib

write(*,'(*(I1))') dc
write(*,'(*(I1))') ib
write(*,'(A)'), bstr

which produces the output:

12345678
01010101
01010101

I haven't looked up the details on boz constants, so I don't know if this could be perhaps also allowed:

b = b'10100000'
b = z'A0'

wclodius2 commented 4 years ago

Ivan:

Maybe I am misunderstanding what you wrote, but I think you are over interpreting what I wrote. Bitset literals are primarily for writing out the values for human readers, and not fro representing bit sets internally. For writing out for human readers arrays of integers are not useful. An array of zeros and ones are certainly memory wasteful for an internal representation

On Jul 25, 2020, at 5:47 AM, Ivan notifications@github.com wrote:

Speaking of BITSET literals, I don't see a reason why arrays of integers could not be used:

use stdlib_bitsets, only: bitsets, assignment(=) type(bitsets) :: b

b = [1,0,1,0,0,0,0,0] ! (overloaded assignment for assumed size arrays) b = "10100000" ! (overloaded assignment for character literals) But personally, I find using a string more attractive. Using logical literals .true./.false. is too verbose.

In the potential use case, where you would need to convert an array of zeros and ones (or a logical mask) to a string, you could probably do so using something along the lines of:

character(len=:), allocatable :: bstr integer, allocatable :: ib(:), dc(:) integer :: i dc = [(i,i=1,20)] allocate(ib,mold=dc) where (mod(dc,2)==0) ib = 1 else where ib = 0 end where

allocate(character(len=size(ib)) :: bstr) write(bstr,'(*(I1))') ib

write(,'((I1))') dc write(,'((I1))') ib write(*,'(A)'), bstr which produces the output:

12345678 01010101 01010101 I haven't looked up the details on boz constants, so I don't know if this could be perhaps also allowed:

b = b'10100000' b = z'A0' — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-663846268, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOS3G7SFKFJPZFR2G3LR5LA4BANCNFSM4O7IG3CQ.

ivan-pi commented 4 years ago

My response was partially referring to the comment by @MarDiehl. I understand that the internal representation is different.

My understanding of literals though was:

A literal is a source code representation of a fixed value. They are represented directly in the code without any computation.

With the currently proposed API, how do I initialize a bitset? Like this?

type(bitsets) :: bitset
integer :: stat
call read_bitset_string("1111111",bitset,stat)

wclodius2 commented 4 years ago

There would be several ways to initialize a bitset. I was thinking a bitset literal would have the syntax ‘S7B1111111’, i.e., having the size as a prefix, as otherwise reading a literal out of a file could be ambiguous. I was going to avoid STATUS arguments. What would a user do if told their bitset literal was invalid, other than stop the code and rewrite it? I figure it is best to stop the code immediately with an informative message rather than continue.

As to initialization there would the following options

call read_bistset(“S7B1111111”, bitset) ! to read from a string call read_bitset(lun, bitset) ! to read from a formatted file call input(sun, bitset) ! to read from an unformatted file call init(bitset, bits) ! to initialize as size bits with all zeros call init(bitset, abitset, bits) ! to initialize as size bits copying the overlapping portions of a bitset and of course bitset = abitset ! to make bitset initially a duplicate of a bitset

On Jul 26, 2020, at 4:17 AM, Ivan notifications@github.com wrote:

My response was partially referring to the comment by @MarDiehl https://github.com/MarDiehl. I understand that the internal representation is different.

My understanding of literals though was:

A literal is a source code representation of a fixed value. They are represented directly in the code without any computation.

With the currently proposed API, how do I initialize a bitset? Like this?

type(bitsets) :: bitset integer :: stat call read_bitset_string("1111111",bitset,stat) — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-663969563, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOT2F5IGVGVLZKSPCZLR5P7CLANCNFSM4O7IG3CQ.

MarDiehl commented 4 years ago

@wclodius2: thanks for the clarification

I think with overloading = the following operations would be user friendly

type(bitset) :: b
b = '101'
b = [.True.,.False.,.True.]
b = [1,0,1]
b = 101

The skeleton code would then look like

module bitset_m                                                                                     

 implicit none                                                                                      

 type, public :: bitset                                                                             
 end type bitset                                                                                    

 interface assignment (=)                                                                           
   module procedure assign_str                                                                      
   module procedure assign_int                                                                      
   module procedure assign_int_array                                                                
   module procedure assign_bool_array                                                               
 end interface assignment (=)                                                                       

contains                                                                                            

pure subroutine assign_str(self,str)                                                                
  type(bitset), intent(out)    :: self                                                              
  character(len=*), intent(in) :: str                                                               
end subroutine                                                                                      

pure subroutine assign_int(self,i)                                                                  
  type(bitset), intent(out) :: self                                                                 
  integer,      intent(in)  :: i                                                                    
end subroutine                                                                                      

pure subroutine assign_int_array(self,i)                                                            
  type(bitset),         intent(out) :: self                                                         
  integer, dimension(:),intent(in)  :: i                                                            
end subroutine                                                                                      

pure subroutine assign_bool_array(self,b)                                                           
  type(bitset),         intent(out) :: self                                                         
  logical, dimension(:),intent(in)  :: b                                                            
end subroutine                                                                                      

end module bitset_m

MarDiehl commented 4 years ago

I personally would prefer object oriented code:


type(bitset) :: b
call b%init(size=7) ! length of 7 bits
call b%from_formatted('file.txt') ! filename
call b%from_formatted(fh) ! filehandle

wclodius2 commented 4 years ago

The array of logicals as input seems reasonable to me, though I would have to do it for every logical kind, The others less so as I view them as error prone. What does b=101 mean?.Do I treat it as a 3 bit bitset of 1,0, and 1 or a 32 bit bitset whose bits are those of the underlying INTEGER(INT32). For a long character string how do I ensure that the user hasn’t accidentally dropped or added a character. That is why I am considering having the syntax for a bit string be S#B# here the first # is the length of the bitset, and the second # is a string of ‘0’s and ‘1’s. What do I do if the user enters O instead of 0? What if I am passed b=[1,0,2] or b=[1,0,-1]?

On Jul 26, 2020, at 1:08 PM, Martin Diehl notifications@github.com wrote:

@wclodius2 https://github.com/wclodius2: thanks for the clarification

I think with overloading = the following operations would be user friendly

type(bitset) :: b b = '101' b = [.True.,.False.,.True.] b = [1,0,1] b = 101 The skeleton code would then look like

module bitset_m

implicit none

type, public :: bitset
end type bitset

interface assignment (=)
module procedure assign_str
module procedure assign_int
module procedure assign_int_array
module procedure assign_bool_array
end interface assignment (=)

contains

pure subroutine assign_str(self,str)
type(bitset), intent(out) :: self
character(len=*), intent(in) :: str
end subroutine

pure subroutine assign_int(self,i)
type(bitset), intent(out) :: self
integer, intent(in) :: i
end subroutine

pure subroutine assign_int_array(self,i)
type(bitset), intent(out) :: self
integer, dimension(:),intent(in) :: i
end subroutine

pure subroutine assign_bool_array(self,b)
type(bitset), intent(out) :: self
logical, dimension(:),intent(in) :: b
end subroutine

end module bitset_m — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-664028046, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOVIWM5J6YO6442LB4TR5R5KPANCNFSM4O7IG3CQ.

wclodius2 commented 4 years ago

It is easy to provide both.

On Jul 26, 2020, at 1:11 PM, Martin Diehl notifications@github.com wrote:

I personally would prefer object oriented code:

type(bitset) :: b call b%init(size=7) ! length of 7 bits call b%from_formatted('file.txt') ! filename call b%from_formatted(fh) ! filehandle — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-664028373, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOUQ4GPTGWDBV5SRKTLR5R5VXANCNFSM4O7IG3CQ.

ivan-pi commented 4 years ago

What does b=101 mean?.Do I treat it as a 3 bit bitset of 1,0, and 1 or a 32 bit bitset whose bits are those of the underlying INTEGER(INT32). For a long character string how do I ensure that the user hasn’t accidentally dropped or added a character. That is why I am considering having the syntax for a bit string be S#B# here the first # is the length of the bitset, and the second # is a string of ‘0’s and ‘1’s. What do I do if the user enters O instead of 0? What if I am passed b=[1,0,2] or b=[1,0,-1]?

In case of b = 101 it would have to be the underlying integer(int32) representation. This would allow usage of binary, octal, and hexadecimal boz constants. Upon more thought, I would avoid initialization using integer arrays altogether.

I see many similarities in this discussion to the C++ std::bitset. They allow initialization from either integers or character strings (zeros and ones, or a custom pair of characters). An exception is raised for invalid arguments in the constructor. Interestingly, they do not overload the assignment operator (apart from creating copies).

Seeing that the C++ bitset uses a template for the size, perhaps a solution using parametrized derived types would be also of interest.

  integer, parameter :: bits_kind = int32
  integer, parameter :: block_kind = int32

  type :: bitsets(b)
    integer, len :: b
    integer(bits_kind) :: bits = b
    integer(block_kind) :: block(b/storage_size(block_kind)+1)
  end type

The downsides are that since block is not allocatable anymore, enlarging a bitset means declaring a second (larger) instance; and the compiler support might not be mature enough.

ivan-pi commented 4 years ago

I have found out the C++ ecosystem also has a dynamic bitset: https://www.boost.org/doc/libs/1_35_0/libs/dynamic_bitset/dynamic_bitset.html

ivan-pi commented 4 years ago

This would allow usage of binary, octal, and hexadecimal boz constants.

Upon reading a recent comp.lang.fortran discusssion, I came to realize that boz literals can only be used in data statements and intrinsic procedures, meaning these constants need to be passed through the conversion function int:

open(newunit=unit)
write(unit,*) int(b'10110101')
call init(set=bs,bits=6)
call read_bitset(unit=unit,set=bs) ! bs contains '110101'
close(unit)

wclodius2 commented 4 years ago

Right now I feel stymied on this issue for several reasons:

It is hard to write much of the code without a decision on how to handle and propagate errors. Issue #224 could use a wider participation so that a consensus on that issue could be reached.
People are suggesting changes in the API without commenting as to whether they would support the BITSET type with or without the changes.
Most of the suggested changes involve additional ways to initialize a BITSET object, that I feel should have low priority. Collectively they will add a burden to the API and testing that will provide only incremental functionality.
The level of participation in this issue is small, so that even if all the participants in this issue supported the API it is not clear that the BITSET type has sufficient support to be worth adding to the library.

wclodius2 commented 4 years ago

Some outside interest might help. One person that in this forum has expressed interest in a bitset type in the past, but has not commented on this issue in particular is @FortranFan.

On Jul 31, 2020, at 6:27 PM, septcolor notifications@github.com wrote:

@wclodius2 https://github.com/wclodius2 As for Point 4, shall I post a related question to comp.lang.fortran and ask opinions about BITSET? I remember Ron Shepard wished such bit string types, but not very sure about details...

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-667437217, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOSDA3Z65L5XG2UKOBTR6NOPNANCNFSM4O7IG3CQ.

certik commented 4 years ago

@wclodius2 the reason why there is relatively low traffic here (although 8 participants is not bad!) is that I think most people (myself included) haven't needed this in Fortran. So I am fine to include it into stdlib if there is interest, but I will let others who actually use this to lead this particular effort.

Romendakil commented 4 years ago

Is it much effort to include a base construction? If not, why not include it and see whether there is feedback in order to decide to extend it?

wclodius2 commented 4 years ago

What do you mean by base constructor? I see no need for an inheritance tree for this type. There is the default structure constructor for this type, but with private components it is only available in the source module, and it is difficult to use consistently. Other than that I had planned to have assignment of other entities of the bitset type, assignment of arrays of logical type, initialization as all zeros, initialization opting from an existing bitset entity, but not necessarily to the same size as the source entity, and initialization from a character string.

On Aug 1, 2020, at 3:10 AM, Jürgen Reuter notifications@github.com wrote:

Is it much effort to include a base construction? If not, why not include it and see whether there is feedback in order to decide to extend it?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-667500267, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOQU7QVZXTJTSYPHTFTR6PLXBANCNFSM4O7IG3CQ.

ivan-pi commented 4 years ago

Maybe I have misunderstood @Romendakil 's comment, but if by base construction he meant an abstract type, this would in principle allow us to have several different bitset implementations (fixed size, dynamic, fast...) all conforming to the same API.

I've attached a small example for the two functions clear and set_bit. It compiles with no warnings using the Intel Fortran compiler.

bitset_example.f90.txt

An example of using both a fixed size bitset and a dynamically allocated one:

program bitset_example

  use bitset_fixed
  use bitset_dynamic
  implicit none

  type(bitset_f(bits=64)) :: bsf ! fixed size bitset
  type(bitset_d) :: bsd          ! dynamic bitset

  integer :: i

  ! allocate 64 bits of storage, only temporary method
  allocate(bsd%blocks(2))

  do i = 0, 63
    call bsf%set_bit(i)
    call bsd%set_bit(i)
  end do

  do i = 0, 31, 2
    call bsf%clear(i)
    call bsd%clear(i)
  end do

  write(*,'(*(b32))') bsf%blocks(1), bsf%blocks(2)
  write(*,'(*(b32))') bsd%blocks(1), bsd%blocks(2)
end program

On my platform the output is:

1010101010101010101010101010101011111111111111111111111111111111
1010101010101010101010101010101011111111111111111111111111111111

longb commented 4 years ago

Just pointing out that a full-feature BITS intrinsic data type was proposed for F2008, but taken out at the last moment. The J3 working document 07-007r2.pdf has the BITS feature incorporated. Something like that would be significantly simpler, and better performing, than a library.

FortranFan commented 4 years ago

@wclodius2 wrote July 31, 2020 11:40 PM EDT:

Some outside interest might help. One person that in this forum has expressed interest in a bitset type in the past, but has not commented on this issue in particular is @FortranFan.

There are several reasons for this but none of them have to do with a lack of interest in the need for such facilities with Fortran whether it be as part of the base language itself or as part of a "standard library".

I personally think there is tremendous value in having a type to manage a compact array of bit values where the bits might be stored efficiently than a typical consumer can figure out and where the consumer doesn't have to do the bit-shifting on their own, rather convenient and expressive options are available. The last use I had was with library code that interfaces with multiphysics simulations of a complex process where the management, especially with the computation of certain state functions, their first and second derivatives as well as their quadrature in some instances, was a lot more convenient to achieve with such a type.

Too bad the Fortran language fell deficient on this count (and also others) and the code was implemented using modern C++, Microsoft's 'managed' C++, as well as C#, rather successfully I grudgingly admit,. The main point of this anecdote though is that the team's use case in that simulation library was met more than adequately using std:bitset in C++ as well as a similar `BitArray` class in C#. Thus a design toward an API for such a type in Fortran that looks to these C++ and C# options for guidance will be my recommendation. Besides, there is a certain minimalist approach along the lines of Pareto Principle or the so-called 80/20 rule one can see in the C++ stdlib and the C# Framework facilities that will be always to keep in mind for content in Fortran stdlib.

wclodius2 commented 4 years ago

After thinking about it I have decided to initially define an abstract base class with two realizations, a fixed size bitset that can hold up to 64 bits, and a variable sized bitset that can hold up to 2**31-1 bits. If there is demand afterwards I might also provide a fixed size that can hold up to 128 bits and one that can hold up to 256 bits. Do people have a preference as to whether the realizations should be in submodules of the module that contains the abstract base class, or in separate modules?

arjenmarkus commented 4 years ago

I would say that this is a good opportunity for one module and submodules. That will allow easy extension of the set of concrete types - having separate modules for each type feels clumsy, even though you can provide a single overall module that merely exposes all specific ones.

Op vr 7 aug. 2020 om 05:55 schreef William B. Clodius < notifications@github.com>:

After thinking about it I have decided to initially define an abstract base class with two realizations, a fixed size bitset that can hold up to 64 bits, and a variable sized bitset that can hold up to 2**31-1 bits. If there is demand afterwards I might also provide a fixed size that can hold up to 128 bits and one that can hold up to 256 bits. Do people have a preference as to whether the realizations should be in submodules of the module that contains the abstract base class, or in separate modules?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-670307425, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR337FZTESZT5H7LLX3R7N3KFANCNFSM4O7IG3CQ .

ivan-pi commented 4 years ago

I would also put the specific realizations in submodules.

Would the abstract type be public? I think this is necessary if users want to write procedures which can accept either fixed or variable-size bitsets (I don't know if there are any real use cases for this).

ivan-pi commented 4 years ago

Just pointing out that a full-feature BITS intrinsic data type was proposed for F2008, but taken out at the last moment. The J3 working document 07-007r2.pdf has the BITS feature incorporated. Something like that would be significantly simpler, and better performing, than a library.

Thanks @longb, I didn't know about this. Are there any other documents which summarize the features and reasoning of the proposed BITS intrinsic type on a simple level? I found the standard document a bit inconvenient for this purpose.

jvdp1 commented 4 years ago

I am also in favor of one module and multiple submodules.

Le ven. 7 août 2020 à 08:34, Arjen Markus notifications@github.com a écrit :

I would say that this is a good opportunity for one module and submodules. That will allow easy extension of the set of concrete types - having separate modules for each type feels clumsy, even though you can provide a single overall module that merely exposes all specific ones.

Op vr 7 aug. 2020 om 05:55 schreef William B. Clodius < notifications@github.com>:

After thinking about it I have decided to initially define an abstract base class with two realizations, a fixed size bitset that can hold up to 64 bits, and a variable sized bitset that can hold up to 2**31-1 bits. If there is demand afterwards I might also provide a fixed size that can hold up to 128 bits and one that can hold up to 256 bits. Do people have a preference as to whether the realizations should be in submodules of the module that contains the abstract base class, or in separate modules?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/fortran-lang/stdlib/issues/221#issuecomment-670307425>, or unsubscribe < https://github.com/notifications/unsubscribe-auth/AAN6YR337FZTESZT5H7LLX3R7N3KFANCNFSM4O7IG3CQ

.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-670353012, or unsubscribe https://github.com/notifications/unsubscribe-auth/AD5RO7GAJ5R3AMFPXKCOAF3R7OOARANCNFSM4O7IG3CQ .

wclodius2 commented 4 years ago

I now have an implementation of bitsets that I will start testing. The main module file, stdlib_bitsets.f90, defines an abstract type, bitset_t, with two descendants, bitset_64 which can handle bitsets up to 64 bits, and bitset_large, which can handle bitsets up to huge(0_bits_kind) bits, where bits_kind is currently int32. The implementations of bitset_64 and bitset_large are in the submodule files, stdlib_bitset_64.f90 and stdlib_bitset_large.f90. at this point I have two questions for potential users:

Are the names of the descendant types and their submodule files acceptable?
There are a number of binary operations: and, and_not, eqv, or, and xor. Users I expect will normally want to use these with bitsets of the same size, but the current code allows the two arguments to differ in size effectively padding the smaller argument with zero bits where necessary. This results in significantly more complicated code for eqv, or, and xor, with a corresponding impact on the code robustness, runtime, and testing. Should I continue to allow arguments differing in size, or require the bitsets to be the same size and enforce it with if tests and error stops, or require the arguments to be the same size and simply leave the results undefined if the arguments differ in size?

arjenmarkus commented 4 years ago

Hi Bill,

to answer your questions: ad 1. These names look quite acceptable to me. While making the numerical size (64) part of the name is not something I would recommend in general, I think in this case, as we are talking of a very specific data type, it is acceptable. (My hesitation wrt such numerical components has to do with such ancient types as REAL*4 and the unfortunate "kind=4" practice ;)) ad 2. I would say that the two are meant for very different purposes. If they should be combined, why not do that explicitly via a conversion function? That way, should the need arise for a third type, say, bitset_128, you avoid the combinatorial explosion and you make it clearer that the data types are related but not merely different tastes of the same thing. If they should be combined in one operation, then that should be done explicitly. (You could also define a conversion from bitset_large to bitset_64, though that will necessarily truncate the bts beyond 64 or 63 ...)

Regards,

Arjen

Op do 3 sep. 2020 om 22:31 schreef William B. Clodius < notifications@github.com>:

I now have an implementation of bitsets that I will start testing. The main module file, stdlib_bitsets.f90, defines an abstract type, bitset_t, with two descendants, bitset_64 which can handle bitsets up to 64 bits, and bitset_large, which can handle bitsets up to huge(0_bits_kind) bits, where bits_kind is currently int32. The implementations of bitset_64 and bitset_large are in the submodule files, stdlib_bitset_64.f90 and stdlib_bitset_large.f90. at this point I have two questions for potential users:

1.

Are the names of the descendant types and their submodule files acceptable? 2.

There are a number of binary operations: and, and_not, eqv, or, and xor. Users I expect will normally want to use these with bitsets of the same size, but the current code allows the two arguments to differ in size effectively padding the smaller argument with zero bits where necessary. This results in significantly more complicated code for eqv, or, and xor, with a corresponding impact on the code robustness, runtime, and testing. Should I continue to allow arguments differing in size, or require the bitsets to be the same size and enforce it with if tests and error stops, or require the arguments to be the same size and simply leave the results undefined if the arguments differ in size?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-686748448, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YRZ4C5BMOF2UID7UMYTSD74KDANCNFSM4O7IG3CQ .

ivan-pi commented 4 years ago

Regarding the abstract bitset_t, I think it's suitable for declaring polymorphic bitset variables such as:

class(bitset_t), allocatable :: foo ! could be bitset_64 or bitset_large

In my own codes I have someties prepended abstract, as in abstract_bitset_t, but here this would be unnecessarily verbose. Good documentation should lead users to the right type for their application.

The second usage of the abstract type name will occur in subroutine declaration blocks:

subroutine some_bitset_operation(b)
  class(bitset_t), intent(in) :: b
  select type(b)
    type is (bitset_64)
      ! do something
    type is (bitset_large)
      ! do something else
  end select
end function

Concerning bitset_64 I wonder if dropping the underscore would be better? This would make it similar to the constants from iso_fortran_env, i.e int64, real64 (I realize this comparison is not totally correct as these are type constants and not the actual types). I understand the underscore is there for consistency with bitset_large, however bitset64 seems easier to use in practice.

From the linguistic point of view, putting the adjective first - large_bitset, is more logical than bitset_large. In Boost they also put the adjective first (see dynamic_bitset). Perhaps a pair of adjective-derived names such as small_bitset for the one with 64 bits, and large_bitset for the variable one, would communicate the purpose of the derived types. Users can always rename upon import to use whatever they want, i.e. use stdlib_bitset, bitset64 => small_bitset. A problem however arises with this kind of naming if other fixed-size bitsets are introduced (with 32 or 128 bits).

ivan-pi commented 4 years ago

I also agree with the answer to the second question by @arjenmarkus. Since the two sub-types will likely appear in different usage cases, I would initially go for an explicit conversion function. If I have understood correctly, the bitset_large can have variable size? For this type I would still expect the binary operators to work irrespective of the number of bits held internally. I would however (for now) avoid mixed binary operators between bitset_64 and bitset_large.

wclodius2 commented 4 years ago

Hi Arjen:

I may be misunderstanding you, but I think you misunderstood my second question. Entities of both the bitset_64 and bitset_large can behave as though they have different numbers of bits. For example one can declare one entity of type bitset_64 as having 16 bits and another entity as having 17 bits. For two different bit sizes of type bitset_64 I have to decide and describe how they interact for such binary ops as and, and_not, or, xor, ==, /=, >,>=, <, and <=. The simplest description and implementation is to simply forbid using entities of different bit sizes in those binary ops. I then have to decide whether to enforce this restriction by a test and branch to an error stop, or leave the use of two bit sizes as undefined, eliminating the overhead of the test and branch., which can be significant for a bitset_64. If I decide to allow different bit sizes then it has the following consequences:

The obvious description of how to treat different bit sizes (pad set2 with zeros if shorter than set1, and ignore the extra bits if longer than set1) for the logical operations and, and_not, or, and xor is different from the obvious description (pad the smaller bitset with zeros) of how to treat the comparison operations ==, /=, >,>=, <, and <=.
I have to have at least one branch in the individual procedures for bits(set1)>= bits(set2) vs. bits(set1)<bits(set2)
In the final word of the bitset I often have to resort to bit level operations, rather than the word level operations I can consistently use if they have the same number of bits. This can have a large performance hit.
The logic for each branch is tricky and easy to get wrong, particularly for the bitset_large.
I have to triple the testing for those ops, in effect testing separately for bits(set1)<bits(set2), bits(set1)==bits(set2), and bits(set1)>bits(set2).

In C++ and Java, the problem doesn’t exist as the number of bits determines the type and the binary ops are forbidden to mix types, resulting in all binary ops requiring the same number of bits for each operand.

Given the complexity of mixing bit sizes, and to my mind, the black of usefulness for mixing bit sizes, I think there will be zero demand for the ability to mix bit sizes on the binary ops.

That being said a conversion function is useful and I’ll see about implementing one.

On Sep 7, 2020, at 2:44 AM, Arjen Markus notifications@github.com wrote:

Hi Bill,

to answer your questions: ad 1. These names look quite acceptable to me. While making the numerical size (64) part of the name is not something I would recommend in general, I think in this case, as we are talking of a very specific data type, it is acceptable. (My hesitation wrt such numerical components has to do with such ancient types as REAL*4 and the unfortunate "kind=4" practice ;)) ad 2. I would say that the two are meant for very different purposes. If they should be combined, why not do that explicitly via a conversion function? That way, should the need arise for a third type, say, bitset_128, you avoid the combinatorial explosion and you make it clearer that the data types are related but not merely different tastes of the same thing. If they should be combined in one operation, then that should be done explicitly. (You could also define a conversion from bitset_large to bitset_64, though that will necessarily truncate the bts beyond 64 or 63 ...)

Regards,

Arjen

Op do 3 sep. 2020 om 22:31 schreef William B. Clodius < notifications@github.com>:

I now have an implementation of bitsets that I will start testing. The main module file, stdlib_bitsets.f90, defines an abstract type, bitset_t, with two descendants, bitset_64 which can handle bitsets up to 64 bits, and bitset_large, which can handle bitsets up to huge(0_bits_kind) bits, where bits_kind is currently int32. The implementations of bitset_64 and bitset_large are in the submodule files, stdlib_bitset_64.f90 and stdlib_bitset_large.f90. at this point I have two questions for potential users:

1.

Are the names of the descendant types and their submodule files acceptable? 2.

There are a number of binary operations: and, and_not, eqv, or, and xor. Users I expect will normally want to use these with bitsets of the same size, but the current code allows the two arguments to differ in size effectively padding the smaller argument with zero bits where necessary. This results in significantly more complicated code for eqv, or, and xor, with a corresponding impact on the code robustness, runtime, and testing. Should I continue to allow arguments differing in size, or require the bitsets to be the same size and enforce it with if tests and error stops, or require the arguments to be the same size and simply leave the results undefined if the arguments differ in size?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-686748448, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YRZ4C5BMOF2UID7UMYTSD74KDANCNFSM4O7IG3CQ .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-688169151, or unsubscribe https://github.com/notifications/unsubscribe-auth/APTQDOQ36Y6JDMZGVIKWFJLSESMNZANCNFSM4O7IG3CQ.

arjenmarkus commented 4 years ago

Indeed, I misunderstood your question! If we could rely on parameterised derived types, would that be a solution? I have never used them, I must admit, given that support is rather precarious. I can imagine that a type with 11 bits (the number of bits being a len parameter of the type) would be considered differently than one with 12 bits. The len parameter does not need to be used for an actual length. Then something along these lines might work to distinguish the two types:

logical function equal( a, b ) type(bitset_64(nbits=*)) :: a type(bitset_64(nbits=a%nbits)) :: b end function equal

I have never tried anything of the sort, mind you.

Regards,

Arjen

Op ma 7 sep. 2020 om 19:46 schreef William B. Clodius < notifications@github.com>:

Hi Arjen:

I may be misunderstanding you, but I think you misunderstood my second question. Entities of both the bitset_64 and bitset_large can behave as though they have different numbers of bits. For example one can declare one entity of type bitset_64 as having 16 bits and another entity as having 17 bits. For two different bit sizes of type bitset_64 I have to decide and describe how they interact for such binary ops as and, and_not, or, xor, ==, /=, >,>=, <, and <=. The simplest description and implementation is to simply forbid using entities of different bit sizes in those binary ops. I then have to decide whether to enforce this restriction by a test and branch to an error stop, or leave the use of two bit sizes as undefined, eliminating the overhead of the test and branch., which can be significant for a bitset_64. If I decide to allow different bit sizes then it has the following consequences:

The obvious description of how to treat different bit sizes (pad set2 with zeros if shorter than set1, and ignore the extra bits if longer than set1) for the logical operations and, and_not, or, and xor is different from the obvious description (pad the smaller bitset with zeros) of how to treat the comparison operations ==, /=, >,>=, <, and <=.

I have to have at least one branch in the individual procedures for bits(set1)>= bits(set2) vs. bits(set1)<bits(set2)

In the final word of the bitset I often have to resort to bit level operations, rather than the word level operations I can consistently use if they have the same number of bits. This can have a large performance hit.

The logic for each branch is tricky and easy to get wrong, particularly for the bitset_large.

I have to triple the testing for those ops, in effect testing separately for bits(set1)<bits(set2), bits(set1)==bits(set2), and bits(set1)>bits(set2).

In C++ and Java, the problem doesn’t exist as the number of bits determines the type and the binary ops are forbidden to mix types, resulting in all binary ops requiring the same number of bits for each operand.

Given the complexity of mixing bit sizes, and to my mind, the black of usefulness for mixing bit sizes, I think there will be zero demand for the ability to mix bit sizes on the binary ops.

That being said a conversion function is useful and I’ll see about implementing one.

On Sep 7, 2020, at 2:44 AM, Arjen Markus notifications@github.com wrote:

Hi Bill,

to answer your questions: ad 1. These names look quite acceptable to me. While making the numerical size (64) part of the name is not something I would recommend in general, I think in this case, as we are talking of a very specific data type, it is acceptable. (My hesitation wrt such numerical components has to do with such ancient types as REAL*4 and the unfortunate "kind=4" practice ;)) ad 2. I would say that the two are meant for very different purposes. If they should be combined, why not do that explicitly via a conversion function? That way, should the need arise for a third type, say, bitset_128, you avoid the combinatorial explosion and you make it clearer that the data types are related but not merely different tastes of the same thing. If they should be combined in one operation, then that should be done explicitly. (You could also define a conversion from bitset_large to bitset_64, though that will necessarily truncate the bts beyond 64 or 63 ...)

Regards,

Arjen

Op do 3 sep. 2020 om 22:31 schreef William B. Clodius < notifications@github.com>:

I now have an implementation of bitsets that I will start testing. The main module file, stdlib_bitsets.f90, defines an abstract type, bitset_t, with two descendants, bitset_64 which can handle bitsets up to 64 bits, and bitset_large, which can handle bitsets up to huge(0_bits_kind) bits, where bits_kind is currently int32. The implementations of bitset_64 and bitset_large are in the submodule files, stdlib_bitset_64.f90 and stdlib_bitset_large.f90. at this point I have two questions for potential users:

1.

Are the names of the descendant types and their submodule files acceptable? 2.

There are a number of binary operations: and, and_not, eqv, or, and xor. Users I expect will normally want to use these with bitsets of the same size, but the current code allows the two arguments to differ in size effectively padding the smaller argument with zero bits where necessary. This results in significantly more complicated code for eqv, or, and xor, with a corresponding impact on the code robustness, runtime, and testing. Should I continue to allow arguments differing in size, or require the bitsets to be the same size and enforce it with if tests and error stops, or require the arguments to be the same size and simply leave the results undefined if the arguments differ in size?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/fortran-lang/stdlib/issues/221#issuecomment-686748448>, or unsubscribe < https://github.com/notifications/unsubscribe-auth/AAN6YRZ4C5BMOF2UID7UMYTSD74KDANCNFSM4O7IG3CQ

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/fortran-lang/stdlib/issues/221#issuecomment-688169151>, or unsubscribe < https://github.com/notifications/unsubscribe-auth/APTQDOQ36Y6JDMZGVIKWFJLSESMNZANCNFSM4O7IG3CQ .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-688454340, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR45OKFL43NYTPMC6FDSEUL77ANCNFSM4O7IG3CQ .

arjenmarkus commented 4 years ago

I just tried: both gfortran and Intel Fortran ignore the condition I tried to impose. So that is not an option!

Op di 8 sep. 2020 om 08:34 schreef Arjen Markus arjen.markus895@gmail.com:

Indeed, I misunderstood your question! If we could rely on parameterised derived types, would that be a solution? I have never used them, I must admit, given that support is rather precarious. I can imagine that a type with 11 bits (the number of bits being a len parameter of the type) would be considered differently than one with 12 bits. The len parameter does not need to be used for an actual length. Then something along these lines might work to distinguish the two types:

logical function equal( a, b ) type(bitset_64(nbits=*)) :: a type(bitset_64(nbits=a%nbits)) :: b end function equal

I have never tried anything of the sort, mind you.

Regards,

Arjen

Op ma 7 sep. 2020 om 19:46 schreef William B. Clodius < notifications@github.com>:

Hi Arjen:

I may be misunderstanding you, but I think you misunderstood my second question. Entities of both the bitset_64 and bitset_large can behave as though they have different numbers of bits. For example one can declare one entity of type bitset_64 as having 16 bits and another entity as having 17 bits. For two different bit sizes of type bitset_64 I have to decide and describe how they interact for such binary ops as and, and_not, or, xor, ==, /=, >,>=, <, and <=. The simplest description and implementation is to simply forbid using entities of different bit sizes in those binary ops. I then have to decide whether to enforce this restriction by a test and branch to an error stop, or leave the use of two bit sizes as undefined, eliminating the overhead of the test and branch., which can be significant for a bitset_64. If I decide to allow different bit sizes then it has the following consequences:

The obvious description of how to treat different bit sizes (pad set2 with zeros if shorter than set1, and ignore the extra bits if longer than set1) for the logical operations and, and_not, or, and xor is different from the obvious description (pad the smaller bitset with zeros) of how to treat the comparison operations ==, /=, >,>=, <, and <=.

I have to have at least one branch in the individual procedures for bits(set1)>= bits(set2) vs. bits(set1)<bits(set2)

In the final word of the bitset I often have to resort to bit level operations, rather than the word level operations I can consistently use if they have the same number of bits. This can have a large performance hit.

The logic for each branch is tricky and easy to get wrong, particularly for the bitset_large.

I have to triple the testing for those ops, in effect testing separately for bits(set1)<bits(set2), bits(set1)==bits(set2), and bits(set1)>bits(set2).

In C++ and Java, the problem doesn’t exist as the number of bits determines the type and the binary ops are forbidden to mix types, resulting in all binary ops requiring the same number of bits for each operand.

Given the complexity of mixing bit sizes, and to my mind, the black of usefulness for mixing bit sizes, I think there will be zero demand for the ability to mix bit sizes on the binary ops.

That being said a conversion function is useful and I’ll see about implementing one.

On Sep 7, 2020, at 2:44 AM, Arjen Markus notifications@github.com wrote:

Hi Bill,

to answer your questions: ad 1. These names look quite acceptable to me. While making the numerical size (64) part of the name is not something I would recommend in general, I think in this case, as we are talking of a very specific data type, it is acceptable. (My hesitation wrt such numerical components has to do with such ancient types as REAL*4 and the unfortunate "kind=4" practice ;)) ad 2. I would say that the two are meant for very different purposes. If they should be combined, why not do that explicitly via a conversion function? That way, should the need arise for a third type, say, bitset_128, you avoid the combinatorial explosion and you make it clearer that the data types are related but not merely different tastes of the same thing. If they should be combined in one operation, then that should be done explicitly. (You could also define a conversion from bitset_large to bitset_64, though that will necessarily truncate the bts beyond 64 or 63 ...)

Regards,

Arjen

Op do 3 sep. 2020 om 22:31 schreef William B. Clodius < notifications@github.com>:

I now have an implementation of bitsets that I will start testing. The main module file, stdlib_bitsets.f90, defines an abstract type, bitset_t, with two descendants, bitset_64 which can handle bitsets up to 64 bits, and bitset_large, which can handle bitsets up to huge(0_bits_kind) bits, where bits_kind is currently int32. The implementations of bitset_64 and bitset_large are in the submodule files, stdlib_bitset_64.f90 and stdlib_bitset_large.f90. at this point I have two questions for potential users:

1.

Are the names of the descendant types and their submodule files acceptable? 2.

There are a number of binary operations: and, and_not, eqv, or, and xor. Users I expect will normally want to use these with bitsets of the same size, but the current code allows the two arguments to differ in size effectively padding the smaller argument with zero bits where necessary. This results in significantly more complicated code for eqv, or, and xor, with a corresponding impact on the code robustness, runtime, and testing. Should I continue to allow arguments differing in size, or require the bitsets to be the same size and enforce it with if tests and error stops, or require the arguments to be the same size and simply leave the results undefined if the arguments differ in size?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/fortran-lang/stdlib/issues/221#issuecomment-686748448 , or unsubscribe < https://github.com/notifications/unsubscribe-auth/AAN6YRZ4C5BMOF2UID7UMYTSD74KDANCNFSM4O7IG3CQ

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub < https://github.com/fortran-lang/stdlib/issues/221#issuecomment-688169151>, or unsubscribe < https://github.com/notifications/unsubscribe-auth/APTQDOQ36Y6JDMZGVIKWFJLSESMNZANCNFSM4O7IG3CQ .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/fortran-lang/stdlib/issues/221#issuecomment-688454340, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAN6YR45OKFL43NYTPMC6FDSEUL77ANCNFSM4O7IG3CQ .

wclodius2 commented 4 years ago

@arjenmarkus I have also never tried parameterized derived types. Support is more widespread than I expected, http://fortranwiki.org/fortran/show/Fortran+2003+status, but it has always sounded more quirky than I like.

wclodius2 commented 4 years ago

In thinking further a bitset type is about as close as you can come to a type suited for parameterized derived types, if there is one that is suited for PDTs. I am going to experiment a little with using PDTs for bitsets.

wclodius2 commented 4 years ago

After experimenting I have found a problem with gfortran 10.2's implementation of PDTs for which I cannot find a workaround. Also I had hoped to be able to declare bitsets with their bitesize initialized to zero and ifort's error messages seem to be indicate that that would be non-standard. So I have given up on using PDTs for bitsets.

ivan-pi commented 4 years ago

In thinking further a bitset type is about as close as you can come to a type suited for parameterized derived types, if there is one that is suited for PDTs. I am going to experiment a little with using PDTs for bitsets.

A PDT could bring bitsets very close to the C++ std:bitset. I've also done some testing, reaching the same conclusion - compiler support is not good enough.

With ifort 19.1 I am able to compile the following type hierarchy:

module stdlib_abstract_bitset

  implicit none
  private

  public :: bitset_t

  type, abstract :: bitset_t
  end type

end module

module stdlib_small_bitset

  use stdlib_abstract_bitset, only: bitset_t
  use iso_fortran_env, only: int32, int64
  implicit none

  integer, parameter :: block_kind = int32
  integer, parameter :: bits_per_block = storage_size(block_kind)

  type, extends(bitset_t) :: small_bitset(len)
    integer, len :: len = 0
    integer(block_kind) :: m_bits(ceiling(real(len)/real(bits_per_block)))
  end type

contains
...
end module

The part that started to bother me was in functions similar to the example from @arjenmarkus:

  function small_bitset_and(a,b) result(new)
    type(small_bitset(*)), intent(in) :: a
    type(small_bitset(a%len)), intent(in) :: b
    type(small_bitset(a%len)) :: new 
    integer :: i, nblocks

    nblocks = size(a%m_bits)
    do i = 1, nblocks
        new%m_bits(i) = iand(b%m_bits(i),a%m_bits(i))
    end do
  end function

Assuming I now bind this function to the operator .and. and use it as follows:

  type(small_bitset(64)) :: a, b, c
  type(small_bitset(32)) :: d

  c = a .and. b ! valid
  d = a .and. b ! expected it to be invalid, but compiler produces no error message

the value of d%len after the assignment becomes 64!

wclodius2 commented 4 years ago

I now have an implementation of bitset_t, bitset_64, and bitset_large that passes what I consider to be extensive testing. Before I consider doing a PR for it I would like to have opinions on the following questions:

Should I change bitset_t to bitset_type?
Should I add an _t or _type to bitset_64 or bitset_large?
Currently I assume without checking that for the binary ops and, and_not, or, xor, ==, /=, <, <=, >, and >= that the arguments have the same size. Should I enforce requiring the same size by adding a branch to error stop?

wclodius2 commented 4 years ago

@ivan-pi before I even consider using PDTs I need to be able to compile with both of the two most widely used compilers ifort and gfortran, and preferably versions of both compilers more than a couple of years old. Gfortran will not let me combine inheritance and PDTs, giving unrelated error messages. Ifort will accept combining inheritance and PDTs, but will not let me initialize at declaration the arrays holding the bits, i.e. will not accept code such as

module stdlib_abstract_bitset

  implicit none
  private

  public :: bitset_t

  type, abstract :: bitset_t
  end type

 end module

module stdlib_small_bitset

  use stdlib_abstract_bitset, only: bitset_t
  use iso_fortran_env, only: int32, int64
  implicit none

  integer, parameter :: block_kind = int32
  integer, parameter :: bits_per_block = storage_size(block_kind)

 type, extends(bitset_t) :: small_bitset(len)
    integer, len :: len = 0
    integer(block_kind) :: m_bits(ceiling(real(len)/real(bits_per_block)))=0 ! note the initialization
  end type

contains
...
end module

If I have to explicitly initialize the arrays I see no advantage to PDTs.