Open info-rchitect opened 6 years ago
I believe it is a bug, but can you please provide a self-containing example for easier reproduction?
@zverok sorry for the delay, will get to this today.
OK here we go:
[1] pry(#<Nemawashi::Analytics::Transforms::Update>)> @source.data
=> #<Daru::DataFrame(4x7)>
unit testA_ft1 testA_ws1 testB_ft1 testB_ws1 testC_ft1 testC_ws1
1 1 nil 0.0 nil 0.0 nil 1.0
2 2 0.0 nil 1.0 nil 0.0 nil
3 3 nil 0.0 nil 0.0 nil 1.0
4 4 0.0 nil 1.0 nil 0.0 nil
[2] pry(#<Nemawashi::Analytics::Transforms::Update>)> dataobj
=> #<Daru::DataFrame(4x4)>
unit testA_ws1 testD_ws1 testE_ws1
1 1 1.0 0.0 1.0
2 2 0.0 1.0 0.0
3 3 1.0 1.0 0.0
4 4 0.0 1.0 0.0
[3] pry(#<Nemawashi::Analytics::Transforms::Update>)> @source.data.index
=> #<Daru::MultiIndex(4x1)>
1
2
3
4
[4] pry(#<Nemawashi::Analytics::Transforms::Update>)> dataobj.index
=> #<Daru::MultiIndex(4x1)>
1
2
3
4
[5] pry(#<Nemawashi::Analytics::Transforms::Update>)> updated_dataframe = @source.data.join(dataobj, how: :left, on: match_columns)
=> #<Daru::DataFrame(4x10)>
testA_ft1 testA_ws1_ testB_ft1 testB_ws1 testC_ft1 testC_ws1 unit testA_ws1_ testD_ws1 testE_ws1
0 nil 0.0 nil 0.0 nil 1.0 1 1.0 0.0 1.0
1 0.0 nil 1.0 nil 0.0 nil 2 0.0 1.0 0.0
2 nil 0.0 nil 0.0 nil 1.0 3 1.0 1.0 0.0
3 0.0 nil 1.0 nil 0.0 nil 4 0.0 1.0 0.0
[6] pry(#<Nemawashi::Analytics::Transforms::Update>)> updated_dataframe.index
=> #<Daru::Index(4): {0, 1, 2, 3}>
In reality, it seems like any join between a multiindex DF and any type of indexed DF results in a Daru::Index DF. I believe this is related to the merge issue.
Please check if it works on current master
should be solved by #431
@zverok I am still seeing the join method turn 2 DFs with multiindex into a single index but my example perhaps is not the best due to the fact that it looks like daru is actually 'optimizing' away the multi-index.
[1] pry(#<Nemawashi::Analytics::Transforms::Update>)> @source.data.index.to_a
=> [[1], [2], [3], [4]]
[2] pry(#<Nemawashi::Analytics::Transforms::Update>)> dataobj.index.to_a
=> [[1], [2], [3], [4]]
[3] pry(#<Nemawashi::Analytics::Transforms::Update>)>
60: binding.pry
61: updated_dataframe = @source.data.join(dataobj, how: :left, on: match_columns)
=> 62: binding.pry
[2] pry(#<Nemawashi::Analytics::Transforms::Update>)> updated_dataframe.index.to_a
=> [0, 1, 2, 3]
Is this intended behavior? I will try to find a better example where the pre-join index cannot be flattened to be the same as the post-join index.
Well, in fact, currently:
I have a suspicion that the most reasonable behavior would be "make join columns an index", but not sure it should/would/can work (see "duplicate index discussion" here ;)).
Anyways, I am pretty sure your initial request (preserving index of both dataframes on join) can't be performed. Why do you believe otherwise?
@zverok if two dataframes have the same index shouldn't the index be preserved?
But why it should? Logically, "joining" is merging of different data sources: people (indexed by name) join departments (indexed by id), books (indexed by isbn) join authors (indexed by name)... What is the real-life use case for two dataframes that should be joined, but have the same index?
@zverok so let's say my multi-index is [:work_week, :factory] and i pull yield data from one database and I pull volume forecast data from another database that uses the same multi-index of [:work_week, :factory] and want them to be joined. why should I lose the matching multi-index on the subsequently created dataframe?
By which column you join them, then?
those 2 columns.
So, you join them on index?
Hi,
I am seeing a multi-index DF lose its multi-index (goes to 0,1,2,3...) when doing a left join with a DF without a multi-index. Is that the expected behavior?
thx