If you read an int column where some of the values use scientific notation, the column is interpreted as a float column. If you pass an integer type hint, you get incorrect results: in the example below, 2.2E+11 is read as 2, so everything from the decimal point onward, including the exponent, is dropped.
Using the following as /tmp/test.csv:
id,name
204472098,foo
2.2E+11,bar
In [1]: import turicreate as tc
In [2]: tc.SFrame.read_csv('/tmp/test.csv')
Finished parsing file /tmp/test.csv
Parsing completed. Parsed 2 lines in 0.029181 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as
column_type_hints=[float,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file /tmp/test.csv
Parsing completed. Parsed 2 lines in 0.006446 secs.
Out[2]:
Columns:
id float
name str
Rows: 2
Data:
+----------------+------+
|       id       | name |
+----------------+------+
|  204472098.0   | foo  |
| 220000000000.0 | bar  |
+----------------+------+
[2 rows x 2 columns]
In [3]: tc.SFrame.read_csv('/tmp/test.csv', column_type_hints=[int,str])
Finished parsing file /tmp/test.csv
Parsing completed. Parsed 2 lines in 0.005768 secs.
Out[3]:
Columns:
id int
name str
Rows: 2
Data:
+-----------+------+
|     id    | name |
+-----------+------+
| 204472098 | foo  |
|     2     | bar  |
+-----------+------+
[2 rows x 2 columns]
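Until this is fixed, one possible workaround is to read the column as `str` and convert each value yourself. The following is a minimal sketch. It assumes the values are whole numbers that may be written in scientific notation. The helper `parse_int_field` is hypothetical and is not part of turicreate. It uses Python's `decimal` module instead of `float`, so ids larger than 2**53 are not rounded.

```python
from decimal import Decimal, InvalidOperation


def parse_int_field(text: str) -> int:
    """Parse an integer field that may use scientific notation, e.g. "2.2E+11".

    Hypothetical helper, not a turicreate API. Decimal keeps every digit,
    unlike float, which silently rounds integers above 2**53.
    """
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}") from None
    # Reject NaN and Infinity, which have no integer value.
    if not value.is_finite():
        raise ValueError(f"not a finite number: {text!r}")
    # Reject values like "2.5" instead of truncating them.
    if value != value.to_integral_value():
        raise ValueError(f"not a whole number: {text!r}")
    return int(value)


if __name__ == "__main__":
    for raw in ["204472098", "2.2E+11"]:
        print(raw, "->", parse_int_field(raw))
```

With turicreate, this could be applied after reading the column as a string, for example `sf = tc.SFrame.read_csv('/tmp/test.csv', column_type_hints=[str, str])` followed by `sf['id'] = sf['id'].apply(parse_int_field, dtype=int)`. That combination has not been tested here, so treat it as a suggestion.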