Closed dsanalytics closed 6 years ago
With a recent version, this should mostly work:
from Orange.data.pandas_compat import table_from_frame
table = table_from_frame(df)
@kernc Where does one set continuous/discrete and feature/meta/class? Also, what about index column that may be e.g. datetime or guid? How is that converted in out_data? If your code indeed suffices, would the last line be out_data = table?
You prepare the columns beforehand on the frame. pd.Categorical or string columns (df.Column.astype(str)) are interpreted as discrete. String columns that aren't interpreted as discretes (if force_nominal=False) are put into metas automatically. Datetime columns are interpreted as TimeVariable. Any index besides a simple range index is converted to a column.
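A minimal sketch of the "prepare the columns beforehand" step described above. The helper names prepare_frame and frame_to_table and the sample columns are made up for illustration; only table_from_frame and its import path come from this thread. The Orange import is kept inside the function so the pandas part can be tried on its own.

```python
import pandas as pd


def prepare_frame(df, discrete_cols=(), datetime_cols=()):
    """Return a copy of df whose dtypes signal the intended variable types."""
    out = df.copy()
    for col in discrete_cols:
        # Categorical columns are the ones described above as becoming discrete.
        out[col] = out[col].astype("category")
    for col in datetime_cols:
        # Datetime columns are the ones described above as becoming TimeVariable.
        out[col] = pd.to_datetime(out[col])
    return out


def frame_to_table(df):
    """Convert a prepared frame using the function named in this thread."""
    # Imported lazily: requires a recent Orange installation.
    from Orange.data.pandas_compat import table_from_frame
    return table_from_frame(df)


# Hypothetical sample data, only for illustration.
raw = pd.DataFrame({
    "species": ["a", "b", "a"],
    "length": [1.0, 2.5, 3.0],
    "measured": ["2020-01-01", "2020-01-02", "2020-01-03"],
})
prepared = prepare_frame(raw, discrete_cols=["species"], datetime_cols=["measured"])

# In a Python Script widget the last line would then be:
# out_data = frame_to_table(prepared)
```

Checking prepared.dtypes before converting is a cheap way to see what each column will likely turn into.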
@kernc And that's exactly why I asked for a full working example from the community, instead of incomplete two-liners with comments like, you need to do X first, followed by Y, and perhaps Z.
P.S. Why don't you change your problematic avatar - imagine one browsing this post at work and a manager passing by. How's your avatar beneficial to Orange?
Beg your pardon? table = table_from_frame(df) is a full, working example. "It just works!" The function is used in one widget in Prototypes and in Timeseries. Unfortunately, no docs other than the docstring exist for the moment. Always welcome to contrib a more helpful example. :smile:
@kernc, I cannot find the "table_from_frame" function in the help documentation, https://docs.orange.biolab.si/3/data-mining-library/reference/data.html. I only found it here.
It should be imported as:
from Orange.data.pandas_compat import table_from_frame
For me, this works. @dsanalytics If you think something else should be added, please provide a detailed description.
table_from_frame does work, but when the output is connected to a Data Table widget, it does not have the column names. How do I include these? I am using in_data to create something different with different column names (feature names). How can I include these new feature names? Thanks for the help.
Same question as pchristian4481. How do I retain column names when I link this downstream? My code:
colnames = [i.name for i in in_data.domain]
df = Y_df.set_axis(colnames, axis=1, inplace=False)
table = table_from_frame(Y_df)
out_data = table
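One possible cause, judging only from the snippet above: the renamed frame is assigned to df, but Y_df (still carrying the old labels) is what gets passed to table_from_frame. A sketch of the rename step, where with_column_names and the sample values are hypothetical; it assigns to .columns on a copy instead of using set_axis(..., inplace=False), since the inplace argument was removed in pandas 2.0.

```python
import pandas as pd


def with_column_names(frame, names):
    """Return a copy of frame whose columns are renamed to names."""
    names = list(names)
    if len(names) != frame.shape[1]:
        raise ValueError(
            f"expected {frame.shape[1]} names, got {len(names)}"
        )
    renamed = frame.copy()
    renamed.columns = names
    return renamed


# Hypothetical stand-ins for Y_df and the names taken from in_data.domain.
Y_df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
colnames = ["sepal length", "sepal width"]

df = with_column_names(Y_df, colnames)

# Pass the renamed frame (df), not the original (Y_df):
# from Orange.data.pandas_compat import table_from_frame
# out_data = table_from_frame(df)
```

The length check is there because a mismatch between the number of names and the number of columns is an easy mistake when the names come from a different table.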
This works. Might help others
import random
from Orange.data import Domain, Table
import numpy as np
import pandas as pd
from Orange.data.pandas_compat import table_from_frame

colnames = [i.name for i in in_data.domain]
arr_in_data = np.array(in_data)
col_to_search = "Date"
col_index = colnames.index(col_to_search)
df = pd.DataFrame(in_data.X)
table = table_from_frame(df)

from Orange.data import Domain, Table
domain = Domain([attr for attr in in_data.domain.attributes if attr.is_continuous or len(attr.values) <= 5], in_data.domain.class_vars)

out_data = Orange.data.Table(domain, df, in_data.Y)
out_data = Table(domain, out_data)
Is there a code example converting a pandas DataFrame to out_data? What I'm trying to accomplish is to get data from MySQL using the Python Script widget and then pass it to e.g. Data Sampler, Impute, etc.
Thank you.
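For the database-to-out_data flow asked about here, a rough end-to-end sketch. It uses an in-memory SQLite database from the standard library as a stand-in so it runs anywhere; with MySQL you would open a connection to your own server instead (pandas documents SQLAlchemy connectables for that). The table name, columns, and the load_frame helper are all assumptions for illustration; the final conversion is the table_from_frame call from this thread.

```python
import sqlite3

import pandas as pd


def load_frame(connection, query):
    """Run query on an open DB connection and return the result as a DataFrame."""
    return pd.read_sql_query(query, connection)


# Stand-in database: an in-memory SQLite table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (species TEXT, length REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [("a", 1.0), ("b", 2.5), ("a", 3.0)],
)
conn.commit()

df = load_frame(conn, "SELECT species, length FROM measurements")
conn.close()

# Mark the text column as categorical so it can be read as a discrete variable.
df["species"] = df["species"].astype("category")

# Inside the Python Script widget, the conversion would then be:
# from Orange.data.pandas_compat import table_from_frame
# out_data = table_from_frame(df)
```

From there out_data can be wired to Data Sampler, Impute, and so on like any other table.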