Q: Re Python Script widget - pandas dataframe to out_data

biolab / orange3

🍊 Orange: Interactive data analysis

https://orangedatamining.com

Q: Re Python Script widget - pandas dataframe to out_data #2932

Closed dsanalytics closed 6 years ago

dsanalytics commented 6 years ago

Is there a code example for converting a pandas DataFrame to out_data? What I'm trying to accomplish is to get data from MySQL using the Python Script widget and then pass it to, e.g., Data Sampler, Impute, etc.

Thank you.

kernc commented 6 years ago

With a recent version, this should mostly work:

from Orange.data.pandas_compat import table_from_frame

table = table_from_frame(df)
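For the scenario in the question (database query, then DataFrame, then out_data), a minimal sketch follows. It assumes an in-memory SQLite database as a stand-in for MySQL and a hypothetical table named measurements; with MySQL, the connection passed to pandas would typically be a SQLAlchemy engine instead. The helper names load_frame and frame_to_table are illustrative, not part of Orange.

```python
import sqlite3

import pandas as pd


def load_frame(connection, query: str) -> pd.DataFrame:
    """Run a SQL query and return the result as a DataFrame."""
    return pd.read_sql_query(query, connection)


def frame_to_table(df: pd.DataFrame):
    """Convert a DataFrame to an Orange Table."""
    # Imported here so the loading step can be tried without Orange installed.
    from Orange.data.pandas_compat import table_from_frame
    return table_from_frame(df)


# Assumption: SQLite stands in for MySQL; "measurements" is a made-up table.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE measurements (width REAL, height REAL)")
connection.executemany(
    "INSERT INTO measurements VALUES (?, ?)",
    [(1.0, 2.0), (3.5, 4.5)],
)
df = load_frame(connection, "SELECT width, height FROM measurements")
connection.close()

# Inside the Python Script widget, the last line would be:
# out_data = frame_to_table(df)
```

In the widget, whatever is assigned to out_data is what gets passed to downstream widgets such as Data Sampler or Impute.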

dsanalytics commented 6 years ago

@kernc Where does one set continuous/discrete and feature/meta/class? Also, what about an index column that may be, e.g., a datetime or GUID? How is that converted in out_data? If your code indeed suffices, would the last line be out_data = table?

kernc commented 6 years ago

You prepare the columns beforehand on the frame. pd.Categorical or string columns (df.Column.astype(str)) are interpreted as discrete. String columns that aren't interpreted as discrete (if force_nominal=False) are put into metas automatically. Datetime columns are interpreted as TimeVariable. Any index besides a simple range index is converted to a column.
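The column preparation described above can be tried on a small frame. This is a minimal sketch, assuming hypothetical column names (sepal_length, species, measured) and made-up values. Which Orange variable type each column ends up as follows the description in this thread and may differ between Orange versions.

```python
import pandas as pd


def prepare_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Set column dtypes on a copy so the conversion can infer variable types."""
    df = df.copy()
    # Categorical dtype: described in this thread as becoming a discrete variable.
    df["species"] = df["species"].astype("category")
    # Datetime dtype: described in this thread as becoming a TimeVariable.
    df["measured"] = pd.to_datetime(df["measured"])
    # Numeric columns such as sepal_length are left untouched.
    return df


# Hypothetical sample data standing in for a real query result.
raw = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "species": ["setosa", "setosa", "virginica"],
    "measured": ["2018-03-01", "2018-03-02", "2018-03-03"],
})
prepared = prepare_frame(raw)

# Inside the Python Script widget, the conversion would then be:
# from Orange.data.pandas_compat import table_from_frame
# out_data = table_from_frame(prepared)
```

Inspecting out_data.domain in the widget is a quick way to check how each column was actually interpreted.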

dsanalytics commented 6 years ago

@kernc And that's exactly why I asked for a full working example from the community, instead of incomplete two-liners with comments like, you need to do X first, followed by Y, and perhaps Z.

P.S. Why don't you change your problematic avatar - imagine one browsing this post at work and a manager passing by. How's your avatar beneficial to Orange?

kernc commented 6 years ago

Beg your pardon? table = table_from_frame(df) is a full, working example. "It just works!" The function is used in one widget in Prototypes and in Timeseries. Unfortunately, no docs other than the docstring exist at the moment. You are always welcome to contribute a more helpful example. 😄

duohappy commented 6 years ago

@kernc, I could not find the table_from_frame function in the help documentation, https://docs.orange.biolab.si/3/data-mining-library/reference/data.html. I found it here instead.

ajdapretnar commented 6 years ago

It should be imported as: from Orange.data.pandas_compat import table_from_frame

ajdapretnar commented 6 years ago

For me, this works. @dsanalytics If you think something else should be added, please provide a detailed description.

pchristian4481 commented 4 years ago

table_from_frame does work, but the output, when connected to a Data Table widget, does not have the column names. How can I include them? I am using in_data to create something different, with different column names (feature names). How can I include these new feature names? Thanks for the help.

pmirla commented 4 years ago

Same question as pchristian4481. How do I retain column names when I link this downstream? My code:

colnames = [i.name for i in in_data.domain]
df = Y_df.set_axis(colnames, axis=1, inplace=False)
table = table_from_frame(Y_df)
out_data = table
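One likely reason the names are lost in the snippet above is that the renamed frame df is never used: table_from_frame receives the original Y_df. A minimal sketch of the renaming step follows, assuming a hypothetical helper with_column_names and made-up column names. It assigns the columns on a copy rather than calling set_axis, because the inplace argument of set_axis is not accepted by newer pandas versions.

```python
import pandas as pd


def with_column_names(frame: pd.DataFrame, names) -> pd.DataFrame:
    """Return a copy of `frame` with its columns renamed to `names`."""
    names = list(names)
    if len(names) != frame.shape[1]:
        raise ValueError(
            f"expected {frame.shape[1]} column names, got {len(names)}"
        )
    renamed = frame.copy()
    renamed.columns = names
    return renamed


# Hypothetical stand-in for Y_df: a frame with default integer column labels.
Y_df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]])
named = with_column_names(Y_df, ["width", "height"])

# Inside the Python Script widget, pass the renamed frame, not the original:
# from Orange.data.pandas_compat import table_from_frame
# colnames = [var.name for var in in_data.domain]
# out_data = table_from_frame(with_column_names(Y_df, colnames))
```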

pmirla commented 4 years ago

This works. It might help others.

import random
from Orange.data import Domain, Table
import numpy as np
import pandas as pd
from Orange.data.pandas_compat import table_from_frame

colnames = [i.name for i in in_data.domain]
arr_in_data = np.array(in_data)
col_to_search = "Date"
col_index = colnames.index(col_to_search)
df = pd.DataFrame(in_data.X)
table = table_from_frame(df)

from Orange.data import Domain, Table
domain = Domain(
    [attr for attr in in_data.domain.attributes
     if attr.is_continuous or len(attr.values) <= 5],
    in_data.domain.class_vars)

out_data = Orange.data.Table(domain, df, in_data.Y)

out_data = Table(domain, out_data)