Closed (asram6 closed this 7 years ago)
The issue is that a tree structure doesn't map cleanly onto a dataframe. You can do it, but how would you represent the members of a node? Either you store arrays inside individual cells, or you accept redundant data, because you have to add a new row for each member of each node.
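To make those two options concrete, here is a minimal sketch in plain Python. The `Node` class, its fields (`node_id`, `parent`, `members`) and the sample counts are all hypothetical stand-ins for illustration, not the CHAID library's own node class or real output.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree node, not the CHAID library's class."""
    node_id: int
    parent: Optional[int]      # None for the root
    members: Dict[str, int]    # observation count per outcome class


def nested_rows(nodes: List[Node]) -> List[dict]:
    """One row per node; the whole members dict sits inside a single cell."""
    return [
        {"node_id": n.node_id, "parent": n.parent, "members": n.members}
        for n in nodes
    ]


def long_rows(nodes: List[Node]) -> List[dict]:
    """One row per (node, outcome) pair; node_id and parent are repeated."""
    return [
        {"node_id": n.node_id, "parent": n.parent, "outcome": k, "count": v}
        for n in nodes
        for k, v in n.members.items()
    ]


# Made-up example tree: a root with two children.
nodes = [
    Node(0, None, {"yes": 6, "no": 4}),
    Node(1, 0, {"yes": 5, "no": 1}),
    Node(2, 0, {"yes": 1, "no": 3}),
]

nested = nested_rows(nodes)  # 3 rows, with a dict inside the "members" cell
long = long_rows(nodes)      # 6 rows, only scalar values in each cell
```

Either list of dicts can be handed to `pd.DataFrame(...)`. The nested form keeps one row per node but leaves a dict in a cell, while the long form stays flat at the cost of repeating `node_id` and `parent` on every row.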
What do you want to achieve by putting it into a dataframe?
So I am trying to use Azure Data Lake Analytics to assign CHAID analysis jobs. This is done with a U-SQL script, and it seems like the only way to integrate Python in that script is to have a main function that takes in a dataframe and returns a dataframe. That's why I am trying to think of the best way to represent the tree in a dataframe, even though I realize it is not ideal.
Try:
pd.DataFrame(data=tree.tree_store)
When I do that it seems to give me an empty dataframe, even though there are nodes in the tree.
Actually, it seems like if I don't print the tree and try to print tree.tree_store, it is None. But if I print the tree first, it works. So, is the only way to work with tree_store to first print the tree?
You'll have to run
tree.build_tree()
since the tree has not been built yet. That is changing with the new release, though, and you'll be able to access the tree store without having to build it first.
Closing due to inactivity.
Is there any way I can output the Tree as a pandas DataFrame? Just wondering if there is a function to do this, or if I will need to write my own code to do that.