apache / arrow

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
https://arrow.apache.org/
Apache License 2.0

[C++] unsupported cast from halffloat to utf8 #32802

Open asfimport opened 2 years ago

asfimport commented 2 years ago

Related to https://issues.apache.org/jira/browse/ARROW-17464 but for CSV.

When writing a table that contains halffloats to CSV:


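import numpy as np
import pyarrow as pa
import pyarrow.csv as csv

# Single-column table with a float16 (halffloat) column.
t = pa.table({"a": [np.float16(1.0), np.float16(2.0)]},
             schema=pa.schema([pa.field("a", pa.float16())]))
# Raises ArrowNotImplementedError: no halffloat -> utf8 cast.
csv.write_csv(t, "out.csv")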

Output:

pyarrow.lib.ArrowNotImplementedError: Unsupported cast from halffloat to utf8 using function cast_string

Reporter: Joost Hoozemans / @joosthooz Watchers: Rok Mihevc / @rok

Note: This issue was originally created as ARROW-17549. Please see the migration documentation for further details.

asfimport commented 2 years ago

Antoine Pitrou / @pitrou: Can you add "[C++]" to the beginning of the issue title to reflect the set component? See JIRA tips at https://arrow.apache.org/docs/developers/bug_reports.html#tips-for-using-jira

asfimport commented 2 years ago

Rok Mihevc / @rok: This is probably a partial duplicate of ARROW-3802. The main issue (if I understand correctly) is that we don't have the machinery to interpret float16s and we'd need to add it for this functionality (e.g. https://sourceforge.net/projects/half/), see ARROW-6436.
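
For illustration only, the kind of "machinery" meant here can be sketched in pure Python: decode the 16-bit IEEE 754 binary16 pattern into an ordinary float, then format that float as text. This is a minimal sketch and not Arrow's implementation. The names `half_bits_to_float` and `half_bits_to_str` are hypothetical, and the use of `repr` for formatting is an assumption rather than the output format Arrow's CSV writer would choose.

```python
import struct


def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE 754 binary16 bit pattern into a Python float.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    sign = -1.0 if (bits >> 15) & 0x1 else 1.0
    exponent = (bits >> 10) & 0x1F  # 5 exponent bits, bias 15
    mantissa = bits & 0x3FF         # 10 fraction bits

    if exponent == 0:
        # Zero or subnormal: no implicit leading 1, fixed scale of 2**-24.
        return sign * mantissa * 2.0 ** -24
    if exponent == 0x1F:
        # All exponent bits set: infinity (zero fraction) or NaN.
        return sign * float("inf") if mantissa == 0 else float("nan")
    # Normal number: implicit leading 1 plus the fraction.
    return sign * (1.0 + mantissa / 1024.0) * 2.0 ** (exponent - 15)


def half_bits_to_str(bits: int) -> str:
    """Format a binary16 bit pattern as text (formatting choice is an assumption)."""
    return repr(half_bits_to_float(bits))


def reference_decode(bits: int) -> float:
    """Cross-check using the standard library's half-precision 'e' format."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

For example, `half_bits_to_float(0x3C00)` and `half_bits_to_float(0x4000)` decode the two values from the reproducer above (1.0 and 2.0), and `reference_decode` can be used to check the manual decoding against the standard library's `struct` module.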

asfimport commented 2 years ago

Antoine Pitrou / @pitrou: @rok Exactly. Nothing should be particularly difficult here; it just needs someone motivated with enough time on their hands to steer it forward.

felipecrv commented 4 months ago

I have implemented this in a local branch. I will open a PR once #43018 is merged, as that makes the generic tests pass without much special-casing.