Auto analysis assigns each column/feature a data type (dtype in the parlance of NumPy and pandas), e.g. categorical, numeric, real, or integer. These types must be automatically inferred from the dataset.
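As a rough illustration of what inferring a type from raw values could look like, here is a minimal sketch in plain Python. It is not how pandas or Trinket does it; the function names, the type labels, and the `max_categories` threshold are all assumptions made up for this example.

```python
from collections import Counter


def infer_value_type(value):
    """Classify one raw cell as 'integer', 'real', or 'string' (None if empty)."""
    text = value.strip()
    if not text:
        return None  # empty cells carry no type information
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        float(text)
        return "real"
    except ValueError:
        return "string"


def infer_column_type(values, max_categories=10):
    """Guess a column type from an iterable of raw string cells.

    max_categories is a hypothetical cutoff: non-numeric columns with at
    most this many distinct values are labeled 'categorical'.
    """
    values = list(values)
    seen = Counter(t for t in map(infer_value_type, values) if t is not None)
    if not seen:
        return "unknown"  # every cell was empty
    if set(seen) == {"integer"}:
        return "integer"
    if set(seen) <= {"integer", "real"}:
        return "real"  # mixed ints and floats widen to real
    distinct = {v.strip() for v in values if v.strip()}
    return "categorical" if len(distinct) <= max_categories else "string"
```

For example, `infer_column_type(["1", "2.5", ""])` widens to `"real"`, while a column of a few repeated labels comes back as `"categorical"`. Passing only a sample of the rows keeps this lightweight, at the cost of possibly missing a rare value that would change the answer.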
Questions to answer:
How does pandas do this?
What does column-major mean for Trinket?
What types are we looking for?
How lightweight/heavyweight must this be?
Is there a certain density of data required to make a decision?
Do you have to go through the whole dataset to make a decision?
Can we use a sample approach to reading the data?
How do we detect if there is a header row or not?
Can we automatically detect delimiters and quote characters? (e.g. ; vs ,)
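On the last three questions, Python's standard library already offers a heuristic starting point: `csv.Sniffer` can guess the delimiter and quote character from a sample of the text and can guess whether the first row is a header. The sketch below wraps it; the `sniff_csv` name, the 4096-character sample size, and the candidate delimiter set are assumptions for illustration, and the results are guesses rather than guarantees.

```python
import csv


def sniff_csv(text, sample_size=4096, delimiters=",;\t|"):
    """Guess the CSV dialect and header presence from the start of the text.

    Only the first sample_size characters are inspected, so the whole
    dataset never has to be read to make a decision.
    """
    sample = text[:sample_size]
    sniffer = csv.Sniffer()
    # Raises csv.Error if no delimiter can be determined from the sample.
    dialect = sniffer.sniff(sample, delimiters=delimiters)
    has_header = sniffer.has_header(sample)
    return dialect, has_header


data = "name;age\nalice;30\nbob;25\ncarol;41\n"
dialect, has_header = sniff_csv(data)
print(repr(dialect.delimiter), repr(dialect.quotechar), has_header)
```

The returned dialect can be passed straight to `csv.reader`. The header check works by comparing the first row against the apparent types of the rows below it, so it is unreliable for files where every column is text.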
Interesting stuff/libraries in: Data Type Recognition/Guessing of CSV data in python