Closed daniel-mohr closed 3 years ago
Regarding the code It is only checked for signed integer. Are unsigned integers not possible?
If the input type is not specified, we could not expect a certain type.
Do we really want to return in any cases/platforms float32?
Changing this could break usage of cfunits.
Thanks for raising this, @daniel-mohr.
I believe now, that the use of x.dtype.char in the inplace conversion is platform dependent ...
Great investigation work and thanks for providing the clear and concise evidence to back your conclusion in your comment. I am satisfied from your comments that you have found the underlying problem.
Regarding the code It is only checked for signed integer. Are unsigned integers not possible?
@davidhassell wrote most, if not all, of cfunits, and may be able to clarify further about this, but he is on leave until early next week, so I am afraid I don't know the answer until David can potentially elaborate.
I would guess that unsigned integers were, unintentionally, not considered as a possibility, and in that case it is good that you have spotted that prospect.
Do we really want to return in any cases/platforms float32? Changing this could break usage of cfunits.
Since the vast majority of our users (unless there has been a seismic shift we are not aware of) use a Linux or Mac OS, our main concern is that cfunits works for those platforms. So, whatever the solution, it is most important that it does not break cfunits for LInux or Mac. In fact, if there isn't any platform-agnostic solution, we would have to prioritise Linux and Mac even if it means we can't run cfunits on Windows, but I am confident there is at least one way to allow cfunits to work on all platforms given the small number of barriers you have unearthed.
I will start reviewing your associated PR shortly, but please note that is the main criteria that I will need to check against.
Regarding float32 I have to clarify myself: Independent of the bit range (32-bit or 64-bit or ...) the referenced code part runs in case x
is an integer (x.dtype.char == "i"
). The used bit range may depend on hardware platform/architecture (x86 or x86-64 or amd64 or ...) or on implementations (Python, Cpython, PyPy, MicroPython, ...). In general the bit range should not depend on the OS.
As an example: Someone puts a numpy.int64
number in and gets back a numpy.float32
number (no inplace). If he/she/it does the same but uses inplace=True
the return type is numpy.float64
. And if he/she/it only choose numpy.int
as input type the number of input bits depend on the hardware platform/architecture/implementation (32-bit or 64-bit system).
I'm fine with float32 in all cases but I believe that this is not expected by all users. The other way round: If someone has an implementation already using cfunits, it may be expecting float32 and changing this could break this specific implementation. Therefore my hint 'could break usage'.
And yes of course focusing on Linux and/or Mac OS is perfect!
Let's wait for David.
Using the code
import numpy
print('[numpy.dtype](https://numpy.org/devdocs/reference/arrays.dtypes.html):')
print()
print('| dtype | dtype.char | dtype.kind | dtype.num |')
print('|------------------------------|------------|------------|-----------|')
for i in range(24):
a = numpy.array([], dtype=numpy.sctypeDict[i])
print('| {:<28} | {:>2} | {:>2} | {:>2} |'.format(
str(numpy.sctypeDict[i]), a.dtype.char, a.dtype.kind, a.dtype.num))
I generated quickly a list, which I could not find somewhere. The list shows all dtypes numpy provides and list them together with the char, kind and num:
dtype | dtype.char | dtype.kind | dtype.num |
---|---|---|---|
<class 'numpy.bool_'> | ? | b | 0 |
<class 'numpy.int8'> | b | i | 1 |
<class 'numpy.uint8'> | B | u | 2 |
<class 'numpy.int16'> | h | i | 3 |
<class 'numpy.uint16'> | H | u | 4 |
<class 'numpy.int32'> | i | i | 5 |
<class 'numpy.uint32'> | I | u | 6 |
<class 'numpy.int64'> | l | i | 7 |
<class 'numpy.uint64'> | L | u | 8 |
<class 'numpy.longlong'> | q | i | 9 |
<class 'numpy.ulonglong'> | Q | u | 10 |
<class 'numpy.float32'> | f | f | 11 |
<class 'numpy.float64'> | d | f | 12 |
<class 'numpy.float128'> | g | f | 13 |
<class 'numpy.complex64'> | F | c | 14 |
<class 'numpy.complex128'> | D | c | 15 |
<class 'numpy.complex256'> | G | c | 16 |
<class 'numpy.object_'> | O | O | 17 |
<class 'numpy.bytes_'> | S | S | 18 |
<class 'numpy.str_'> | U | U | 19 |
<class 'numpy.void'> | V | V | 20 |
<class 'numpy.datetime64'> | M | M | 21 |
<class 'numpy.timedelta64'> | m | m | 22 |
<class 'numpy.float16'> | e | f | 23 |
This shows us, that if x.dtype.kind == "i":
is fullfilled for:
dtype | dtype.char | dtype.kind | dtype.num |
---|---|---|---|
<class 'numpy.int8'> | b | i | 1 |
<class 'numpy.int16'> | h | i | 3 |
<class 'numpy.int32'> | i | i | 5 |
<class 'numpy.int64'> | l | i | 7 |
<class 'numpy.longlong'> | q | i | 9 |
Also the new code in PR #32 only handles the cases of numpy.int32
, numpy.int64
and numpy.int16
. Therefore numpy.int8
and numpy.longlong
are still missing and should be mapped/converted to numpy.float32
and numpy.float128
. Or is numpy.float64
sufficient for numpy.longlong
?
And on a Windows machine using conda (similar to the runner here on github) we get a similar result with small differences (only list the differences:
dtype (linux) | dtype (windows) | dtype.char | dtype.kind | dtype.num |
---|---|---|---|---|
<class 'numpy.bool_'> | ? | b | 0 | |
<class 'numpy.int8'> | b | i | 1 | |
<class 'numpy.uint8'> | B | u | 2 | |
<class 'numpy.int16'> | h | i | 3 | |
<class 'numpy.uint16'> | H | u | 4 | |
<class 'numpy.int32'> | <class 'numpy.intc'> | i | i | 5 |
<class 'numpy.uint32'> | <class 'numpy.uintc'> | I | u | 6 |
<class 'numpy.int64'> | <class 'numpy.int32'> | l | i | 7 |
<class 'numpy.uint64'> | <class 'numpy.uint32'> | L | u | 8 |
<class 'numpy.longlong'> | <class 'numpy.int64'> | q | i | 9 |
<class 'numpy.ulonglong'> | <class 'numpy.uint64'> | Q | u | 10 |
<class 'numpy.float32'> | f | f | 11 | |
<class 'numpy.float64'> | d | f | 12 | |
<class 'numpy.float128'> | <class 'numpy.longdouble'> | g | f | 13 |
<class 'numpy.complex64'> | F | c | 14 | |
<class 'numpy.complex128'> | D | c | 15 | |
<class 'numpy.complex256'> | <class 'numpy.clongdouble'> | G | c | 16 |
<class 'numpy.object_'> | O | O | 17 | |
<class 'numpy.bytes_'> | S | S | 18 | |
<class 'numpy.str_'> | U | U | 19 | |
<class 'numpy.void'> | V | V | 20 | |
<class 'numpy.datetime64'> | M | M | 21 | |
<class 'numpy.timedelta64'> | m | m | 22 | |
<class 'numpy.float16'> | e | f | 23 |
Sorry for so many information. I find them step by step. Looking at list_numpy_dtypes you can see more differences in the github actions. I will here only provide the relevant part for integers and floats:
dtype (ubuntu 18.04 + 20.04) | dtype (Mac OS X 10.15.7) | dtype (Windows Server 2016 + 2019) | dtype.char | dtype.kind | dtype.num | dtype.str | |
---|---|---|---|---|---|---|---|
<class 'numpy.int8'> | b | i | 1 | i1 | |||
<class 'numpy.int16'> | h | i | 3 | <i2 | |||
<class 'numpy.int32'> | <class 'numpy.int32'> | <class 'numpy.intc'> | i | i | 5 | <i4 | |
<class 'numpy.int64'> | <class 'numpy.int64'> | <class 'numpy.int32'> | l | i | 7 | <i8 | |
<class 'numpy.int64'> | <class 'numpy.longlong'> | <class 'numpy.int64'> | q | i | 9 | <i8 | |
<class 'numpy.float32'> | f | f | 11 | <f4 | |||
<class 'numpy.float64'> | d | f | 12 | <f8 | |||
<class 'numpy.float128'> | <class 'numpy.float128'> | <class 'numpy.longdouble'> | g | f | 13 | <f16 | |
<class 'numpy.float16'> | e | f | 23 | <f2 |
Creating an numpy array with numpy.intc
on such a Windows produces an numpy.int32
. Therefore I do not really know, why there is a numpy.int32
. Looking at the last column, I expect a similar result for Mac OS and numpy.longlong
. But anyway it seems we could trust dtype.str
.
I will try to provide a better and more general solution in PR #32: Just replacing 'i' in a dtype.str
belonging to an integer by 'f' should be OK.
list_numpy_dtypes does now combine the results automatic.
The combination can be found at: https://daniel-mohr.github.io/list_numpy_dtypes/
I now integrated 32 bit platforms in list_numpy_dtypes . This shows/proofs now, that it is not only a problem with windows. Also on Ubuntu 18.04 i386 and Debian i386 the mismatch occurred.
The pull request #32 solves this issue partly. For me the open items are still:
numpy.float64
in all cases. Further for an integer with more than 4 bytes a float32
is not sufficient. What do you think?Hi @daniel-mohr, thanks for the excellent summary of open aspects of this issue. There's quite a lot of important decisions to make based on your comments, it seems.
Can I suggest, if you are happy to, in order to keep the discussions separate and to reflect that your PR #32 made the test suite pass in all cases, to move all items except the third to new, separate issues, then close this one? By that I mean to just lift and shift your comments above separately to the new issues, nothing that would take more than a minute.
It's helpful that you've outlined your proposed approach for each item - I'll arrange a chat with @davidhassell to briefly work out what we think should be done in each case and get back to you about it. If you did want to move to separate issues, we can comment separately on each.
As for the comment against the third bullet:
I do not think that combing this would give a benefit.
So, as I understand it, you don't think it would be worthwhile to check the test suite with Linux OS default packages and Windows against different Python versions? I would have thought it could be somewhat useful just to be rigorous in case of some strange facet of a specific Python version on a given OS that might cause a niche issue now and again. Also, the cfunits test suite is really quick relative to the Actions limits minutes we have to use (just on the free GitHub plan), so there is no real harm in running several more quick jobs, as it would require to combine the workflows so that we run both on the 2D matrix of both OS and Python version.
That said, overall if you think it is too much effort for too little benefit to combine them, that's fine and we can just leave them separate. It's no hugely important. (Alternatively we could perhaps, of the three, combine two and leave one separate?) I'll let you decide!
OK, I created 3 new issues and will think about combining things -- maybe I will provide a pull request or I will integrate it in a pull request regarding one of the new issues #34, #35 or #36.
@sadielbartholomew And thanks for all the nice words! ;-)
run-test-suite-jobs (windows-2016) and run-test-suite-jobs (windows-2019) show an error:
In #29 this test was integrated and it was speculated that the reason is the new numpy version.
I believe now, that the use of
x.dtype.char
in the inplace conversion is platform dependent. The 21 different built-in types are the 21 fundamental data types of ctypes, whitch are also platform dependent. Array types and conversions between types does not show how many Bytes/Bits are used for a type. Further the following code leads to different results on windows (64-bit?) and linux (Ubuntu 18.04 (64-bit)):On Windows:
On Linux:
Used character codes (
x.dtype.char
) can be found: integer types