JuliaData / DataTables.jl

(DEPRECATED) A rewrite of DataFrames.jl based on Nullable

Throw error for type mismatch using join? #74

sdmcallister closed this issue 7 years ago

sdmcallister commented 7 years ago

Just wondering if it would make sense to throw an error when the types of the key columns in the two DataTables don't match?

Currently, join() lets the join go through, but the columns coming from the second table are all null.

ararslan commented 7 years ago

Can you post an example?

sdmcallister commented 7 years ago

Here is a simple example:

```julia
using DataTables
# The key column :a holds Ints in dt1 but Strings in dt2
dt1 = DataTable(a=[1,2,3,4], b=["1","2","3","4"])
dt2 = DataTable(b=[1,2,3,4], a=["1","2","3","4"])
join(dt1, dt2, kind = :left, on = :a)
```

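
To see why every value from dt2 comes back null, here is a minimal sketch that uses plain Julia vectors in place of the DataTable columns (an assumption made so the snippet runs without the deprecated package). An `Int` key never compares equal to a `String` key, so no row of dt1 finds a partner in dt2.

```julia
# Plain vectors stand in for the :a columns (assumption for illustration)
left_keys  = [1, 2, 3, 4]          # dt1[:a] holds Ints
right_keys = ["1", "2", "3", "4"]  # dt2[:a] holds Strings

# Keep only the left keys that have an equal key on the right
matches = [k for k in left_keys if any(isequal(k, r) for r in right_keys)]

println(isequal(1, "1"))   # an Int and a String are never equal
println(isempty(matches))  # no left row is matched, so a left join
                           # leaves the right table's columns null
```
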
nalimilan commented 7 years ago

We can't really do that, because in some cases elements are considered equal even when their types don't match:

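```julia
using DataTables
# Int keys in dt1 and Float64 keys in dt2 compare equal (1 == 1.0), so rows still match
dt1 = DataTable(a=[1,2,3,4], b=["1","2","3","4"])
dt2 = DataTable(a=[1.0,2.0,3.0,4.0], b=["a","b","c","d"])
join(dt1, dt2, kind = :left, on = :a)
```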

Is this behavior a problem for you in practice?
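
A minimal sketch of why the `Int` and `Float64` keys in the second example still line up, again with plain vectors instead of DataTable columns. The `Dict` lookup is only a simplified stand-in for key matching and is an assumption; DataTables' own join implementation may work differently. The point is that Julia's `isequal` and `hash` treat `1` and `1.0` as the same key even though their types differ.

```julia
# Key columns from the second example, as plain vectors (assumption for illustration)
left_keys  = [1, 2, 3, 4]          # Int
right_keys = [1.0, 2.0, 3.0, 4.0]  # Float64
right_vals = ["a", "b", "c", "d"]

# The element types differ...
@show eltype(left_keys) == eltype(right_keys)  # false

# ...but the values are equal and hash identically
@show isequal(1, 1.0)       # true
@show hash(1) == hash(1.0)  # true

# Simplified stand-in for key matching: a hash-based lookup by left key
lookup  = Dict(zip(right_keys, right_vals))
matched = [get(lookup, k, nothing) for k in left_keys]
@show matched  # every left key found its partner
```
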

sdmcallister commented 7 years ago

@nalimilan In my case, I was joining 5 CSV files, and the final output had 42 variables. The behavior was unexpected, but I can see how it could be useful in other scenarios.

nalimilan commented 7 years ago

And how/why did the variables you wanted to join on end up with different types?

It would still be possible to throw an error when one key column holds numbers and the other holds strings, though adding a special case like that generally isn't great.
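
The narrower check suggested here could look something like the sketch below. `check_join_keys` is a hypothetical helper, not a DataTables.jl function. It rejects only the numbers-vs-strings combination and lets compatible numeric keys such as `Int` vs `Float64` through. It works on plain vectors; since DataTables columns wrap their values in `Nullable`, a real check would need to inspect the wrapped type rather than `eltype` directly.

```julia
# Hypothetical helper (not part of DataTables.jl): reject only number vs string keys
function check_join_keys(left::AbstractVector, right::AbstractVector)
    Tl, Tr = eltype(left), eltype(right)
    if (Tl <: Number && Tr <: AbstractString) || (Tl <: AbstractString && Tr <: Number)
        throw(ArgumentError("cannot join a column of $Tl with a column of $Tr"))
    end
    return nothing
end

check_join_keys([1, 2, 3, 4], [1.0, 2.0, 3.0, 4.0])      # passes: Int vs Float64
try
    check_join_keys([1, 2, 3, 4], ["1", "2", "3", "4"])  # numbers vs strings
catch err
    println(err)  # the ArgumentError raised above
end
```
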