Closed kryaksy closed 5 months ago
Your first schema isn't going to match your second schema, the number you picked exceeds int32
Schemas are the same:
const csv = `field\n2147483648`;
const tableFromCsv = await worker.table(csv);
console.log(await tableFromCsv.schema());
// {field: 'integer'}
As you see, even the value exceeds int32, schema response is integer.
You're in JS and int is int32. CSV is not JavaScript, so it can use e.g. int64.
I don't see any issue here, csv or arrow or python will allow for large integers whereas JS does not. So constructing something from a csv or arrow or python vs from JS will have minor differences as a result of the language and format differences.
Perspective currently only officially supports 32bit signed integer for "integer"
type columns, because JavaScript only had native support for this type (at least, when Perspective was originally published). If you want numbers larger than that, you'll need to use "float"
typed column (64bit float).
While I'd prefer this to be consistent behavior between platforms, and Perspective may soon support BigInt types as they have wide platform support today, it won't change the fact that "integer"
type will still be 32bit. Ergo in your case, the fix is not to accept this value in JavaScript, but rather to reject it in Python.
@texodus Thanks for the explanation. Based on the what you said, it would be correct to expect this field to be recognized as a 'float' when I create a table with a csv containing a 64-bit value, right?
// Csv containing 64-bit value
const csv = `field\n2147483648`;
const tableFromCsv = await worker.table(csv);
console.log(await tableFromCsv.schema());
// Expected {field: 'float'}
// Received {field: 'integer'}
This is the issue from my perspective. Am I missing something?
No, you're mixing CSV and javascript. If you're ingesting as csv, the internal type will be int64. But this doesn't exist in javascript, so when you query FROM JAVASCRIPT it gets truncated to int32 (both types represented in schema as "integer").
If you predefine your schema as float and then ingest as csv, then from javascript the stuff will come out as float.
Perspective's inference support is a best-guess meant for user convenience. We implemented it in a way that has minimal/negligible impact on performance, which would not be the case if we validated every cell (and in general it is not possible to always infer user intent for ambiguous columns).
I'm not against improving this support to handle more cases, but inference support is not meant to be complete. In this case (and others inevitably), you will need to provide a schema.
I clearly grasp the limitations and considerations you've detailed. Thank you both for the thorough explanation. As referenced in this discussion, we're developing a feature necessitating CSV validation by Perspective. Following that, I implemented your advice to use the table creation phase for validation. Yet, based on these discussions, it appears an additional validation step outside of Perspective might be required.
Bug Report
When updating a table created with a schema using a CSV file, an error is thrown, but the same error is not thrown when creating the table with the same CSV.
Steps to Reproduce:
Expected Result:
Consistent behavior between updating a schema-defined table and creating a table with the same CSV.
Actual Result:
While one of them succeeds, the other one throws error.
Environment:
perspective@2.7.2