NCAS-CMS / cfunits

A Python interface to UNIDATA’s UDUNITS-2 library with CF extensions:
http://ncas-cms.github.io/cfunits
MIT License
11 stars 8 forks source link

test suite fails #31

Closed daniel-mohr closed 3 years ago

daniel-mohr commented 3 years ago

run-test-suite-jobs (windows-2016) and run-test-suite-jobs (windows-2019) show an error:

ERROR: test_Units_conform (test_Units.UnitsTest)
Tests the `conform` class method on `Units`.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "D:\a\cfunits\cfunits\cfunits\test\test_Units.py", line 151, in test_Units_conform
    x = Units.conform([360], Units("second"), Units("minute"))
  File "d:\a\cfunits\cfunits\cfunits\units.py", line 2243, in conform
    y = x.view(dtype=float)
ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

In #29 this test was integrated and it was speculated that the reason is the new numpy version.

I believe now, that the use of x.dtype.char in the inplace conversion is platform dependent. The 21 different built-in types are the 21 fundamental data types of ctypes, whitch are also platform dependent. Array types and conversions between types does not show how many Bytes/Bits are used for a type. Further the following code leads to different results on windows (64-bit?) and linux (Ubuntu 18.04 (64-bit)):

On Windows:

>>> import numpy
>>> x = numpy.array([360], dtype=numpy.int32)
>>> x.dtype.char
'l'
>>> x = numpy.array([360], dtype=numpy.int64)
>>> x.dtype.char
'q'

On Linux:

>>> import numpy
>>> x = numpy.array([360], dtype=numpy.int32)
>>> x.dtype.char
'i'
>>> x = numpy.array([360], dtype=numpy.int64)
>>> x.dtype.char
'l'

Used character codes (x.dtype.char) can be found: integer types

daniel-mohr commented 3 years ago

Regarding the code It is only checked for signed integer. Are unsigned integers not possible?

daniel-mohr commented 3 years ago

If the input type is not specified, we could not expect a certain type.

daniel-mohr commented 3 years ago

Do we really want to return in any cases/platforms float32?

Changing this could break usage of cfunits.

daniel-mohr commented 3 years ago

32 solves this issue partly. For me the open items are:

sadielbartholomew commented 3 years ago

Thanks for raising this, @daniel-mohr.

I believe now, that the use of x.dtype.char in the inplace conversion is platform dependent ...

Great investigation work and thanks for providing the clear and concise evidence to back your conclusion in your comment. I am satisfied from your comments that you have found the underlying problem.

Regarding the code It is only checked for signed integer. Are unsigned integers not possible?

@davidhassell wrote most, if not all, of cfunits, and may be able to clarify further about this, but he is on leave until early next week, so I am afraid I don't know the answer until David can potentially elaborate.

I would guess that unsigned integers were, unintentionally, not considered as a possibility, and in that case it is good that you have spotted that prospect.

Do we really want to return in any cases/platforms float32? Changing this could break usage of cfunits.

Since the vast majority of our users (unless there has been a seismic shift we are not aware of) use a Linux or Mac OS, our main concern is that cfunits works for those platforms. So, whatever the solution, it is most important that it does not break cfunits for LInux or Mac. In fact, if there isn't any platform-agnostic solution, we would have to prioritise Linux and Mac even if it means we can't run cfunits on Windows, but I am confident there is at least one way to allow cfunits to work on all platforms given the small number of barriers you have unearthed.

I will start reviewing your associated PR shortly, but please note that is the main criteria that I will need to check against.

daniel-mohr commented 3 years ago

Regarding float32 I have to clarify myself: Independent of the bit range (32-bit or 64-bit or ...) the referenced code part runs in case x is an integer (x.dtype.char == "i"). The used bit range may depend on hardware platform/architecture (x86 or x86-64 or amd64 or ...) or on implementations (Python, Cpython, PyPy, MicroPython, ...). In general the bit range should not depend on the OS.

As an example: Someone puts a numpy.int64 number in and gets back a numpy.float32 number (no inplace). If he/she/it does the same but uses inplace=True the return type is numpy.float64. And if he/she/it only choose numpy.int as input type the number of input bits depend on the hardware platform/architecture/implementation (32-bit or 64-bit system).

I'm fine with float32 in all cases but I believe that this is not expected by all users. The other way round: If someone has an implementation already using cfunits, it may be expecting float32 and changing this could break this specific implementation. Therefore my hint 'could break usage'.

And yes of course focusing on Linux and/or Mac OS is perfect!

Let's wait for David.

daniel-mohr commented 3 years ago

Using the code

import numpy
print('[numpy.dtype](https://numpy.org/devdocs/reference/arrays.dtypes.html):')
print()
print('|            dtype             | dtype.char | dtype.kind | dtype.num |')
print('|------------------------------|------------|------------|-----------|')
for i in range(24):
    a = numpy.array([], dtype=numpy.sctypeDict[i])
    print('| {:<28} |     {:>2}     |    {:>2}      |    {:>2}     |'.format(
        str(numpy.sctypeDict[i]), a.dtype.char, a.dtype.kind, a.dtype.num))

I generated quickly a list, which I could not find somewhere. The list shows all dtypes numpy provides and list them together with the char, kind and num:

numpy.dtype:

dtype dtype.char dtype.kind dtype.num
<class 'numpy.bool_'> ? b 0
<class 'numpy.int8'> b i 1
<class 'numpy.uint8'> B u 2
<class 'numpy.int16'> h i 3
<class 'numpy.uint16'> H u 4
<class 'numpy.int32'> i i 5
<class 'numpy.uint32'> I u 6
<class 'numpy.int64'> l i 7
<class 'numpy.uint64'> L u 8
<class 'numpy.longlong'> q i 9
<class 'numpy.ulonglong'> Q u 10
<class 'numpy.float32'> f f 11
<class 'numpy.float64'> d f 12
<class 'numpy.float128'> g f 13
<class 'numpy.complex64'> F c 14
<class 'numpy.complex128'> D c 15
<class 'numpy.complex256'> G c 16
<class 'numpy.object_'> O O 17
<class 'numpy.bytes_'> S S 18
<class 'numpy.str_'> U U 19
<class 'numpy.void'> V V 20
<class 'numpy.datetime64'> M M 21
<class 'numpy.timedelta64'> m m 22
<class 'numpy.float16'> e f 23

This shows us, that if x.dtype.kind == "i": is fullfilled for:

dtype dtype.char dtype.kind dtype.num
<class 'numpy.int8'> b i 1
<class 'numpy.int16'> h i 3
<class 'numpy.int32'> i i 5
<class 'numpy.int64'> l i 7
<class 'numpy.longlong'> q i 9

Also the new code in PR #32 only handles the cases of numpy.int32, numpy.int64 and numpy.int16. Therefore numpy.int8 and numpy.longlong are still missing and should be mapped/converted to numpy.float32 and numpy.float128. Or is numpy.float64 sufficient for numpy.longlong?

daniel-mohr commented 3 years ago

And on a Windows machine using conda (similar to the runner here on github) we get a similar result with small differences (only list the differences:

dtype (linux) dtype (windows) dtype.char dtype.kind dtype.num
<class 'numpy.bool_'> ? b 0
<class 'numpy.int8'> b i 1
<class 'numpy.uint8'> B u 2
<class 'numpy.int16'> h i 3
<class 'numpy.uint16'> H u 4
<class 'numpy.int32'> <class 'numpy.intc'> i i 5
<class 'numpy.uint32'> <class 'numpy.uintc'> I u 6
<class 'numpy.int64'> <class 'numpy.int32'> l i 7
<class 'numpy.uint64'> <class 'numpy.uint32'> L u 8
<class 'numpy.longlong'> <class 'numpy.int64'> q i 9
<class 'numpy.ulonglong'> <class 'numpy.uint64'> Q u 10
<class 'numpy.float32'> f f 11
<class 'numpy.float64'> d f 12
<class 'numpy.float128'> <class 'numpy.longdouble'> g f 13
<class 'numpy.complex64'> F c 14
<class 'numpy.complex128'> D c 15
<class 'numpy.complex256'> <class 'numpy.clongdouble'> G c 16
<class 'numpy.object_'> O O 17
<class 'numpy.bytes_'> S S 18
<class 'numpy.str_'> U U 19
<class 'numpy.void'> V V 20
<class 'numpy.datetime64'> M M 21
<class 'numpy.timedelta64'> m m 22
<class 'numpy.float16'> e f 23
daniel-mohr commented 3 years ago

Sorry for so many information. I find them step by step. Looking at list_numpy_dtypes you can see more differences in the github actions. I will here only provide the relevant part for integers and floats:

dtype (ubuntu 18.04 + 20.04) dtype (Mac OS X 10.15.7) dtype (Windows Server 2016 + 2019) dtype.char dtype.kind dtype.num dtype.str
<class 'numpy.int8'> b i 1 i1
<class 'numpy.int16'> h i 3 <i2
<class 'numpy.int32'> <class 'numpy.int32'> <class 'numpy.intc'> i i 5 <i4
<class 'numpy.int64'> <class 'numpy.int64'> <class 'numpy.int32'> l i 7 <i8
<class 'numpy.int64'> <class 'numpy.longlong'> <class 'numpy.int64'> q i 9 <i8
<class 'numpy.float32'> f f 11 <f4
<class 'numpy.float64'> d f 12 <f8
<class 'numpy.float128'> <class 'numpy.float128'> <class 'numpy.longdouble'> g f 13 <f16
<class 'numpy.float16'> e f 23 <f2

Creating an numpy array with numpy.intc on such a Windows produces an numpy.int32. Therefore I do not really know, why there is a numpy.int32. Looking at the last column, I expect a similar result for Mac OS and numpy.longlong. But anyway it seems we could trust dtype.str.

I will try to provide a better and more general solution in PR #32: Just replacing 'i' in a dtype.str belonging to an integer by 'f' should be OK.

daniel-mohr commented 3 years ago

list_numpy_dtypes does now combine the results automatic.

The combination can be found at: https://daniel-mohr.github.io/list_numpy_dtypes/

daniel-mohr commented 3 years ago

I now integrated 32 bit platforms in list_numpy_dtypes . This shows/proofs now, that it is not only a problem with windows. Also on Ubuntu 18.04 i386 and Debian i386 the mismatch occurred.

daniel-mohr commented 3 years ago

The pull request #32 solves this issue partly. For me the open items are still:

sadielbartholomew commented 3 years ago

Hi @daniel-mohr, thanks for the excellent summary of open aspects of this issue. There's quite a lot of important decisions to make based on your comments, it seems.

Can I suggest, if you are happy to, in order to keep the discussions separate and to reflect that your PR #32 made the test suite pass in all cases, to move all items except the third to new, separate issues, then close this one? By that I mean to just lift and shift your comments above separately to the new issues, nothing that would take more than a minute.

It's helpful that you've outlined your proposed approach for each item - I'll arrange a chat with @davidhassell to briefly work out what we think should be done in each case and get back to you about it. If you did want to move to separate issues, we can comment separately on each.

As for the comment against the third bullet:

I do not think that combing this would give a benefit.

So, as I understand it, you don't think it would be worthwhile to check the test suite with Linux OS default packages and Windows against different Python versions? I would have thought it could be somewhat useful just to be rigorous in case of some strange facet of a specific Python version on a given OS that might cause a niche issue now and again. Also, the cfunits test suite is really quick relative to the Actions limits minutes we have to use (just on the free GitHub plan), so there is no real harm in running several more quick jobs, as it would require to combine the workflows so that we run both on the 2D matrix of both OS and Python version.

That said, overall if you think it is too much effort for too little benefit to combine them, that's fine and we can just leave them separate. It's no hugely important. (Alternatively we could perhaps, of the three, combine two and leave one separate?) I'll let you decide!

daniel-mohr commented 3 years ago

OK, I created 3 new issues and will think about combining things -- maybe I will provide a pull request or I will integrate it in a pull request regarding one of the new issues #34, #35 or #36.

daniel-mohr commented 3 years ago

@sadielbartholomew And thanks for all the nice words! ;-)