Cocoon-Data-Transformation / cocoon

MIT License

Cocoon in Fabric notebooks #10

Closed Palkers76 closed 1 month ago

Palkers76 commented 2 months ago

I'm testing cocoon in MS Fabric notebooks.

Everything installed fine. For now I loaded a CSV as a pandas DataFrame:

import pandas as pd
# Load data into pandas DataFrame from "/lakehouse/default/" + "Files/Sales target.csv"
Sales_target = pd.read_csv("/lakehouse/default/" + "Files/Sales target.csv")
Sales_target.head()

I initialized DuckDB and cocoon, but the table was not created:

# @title 4. Load data to database (In mem DuckDB for demo) { vertical-output: true }
table_name = "Sales_target"
# For the demo case, we set up an in memory database
Sales_target.columns = [clean_column_name(col) for col in Sales_target.columns]
con = duckdb.connect(database=':memory:')
con.register(table_name, Sales_target)

query_widget, cocoon_workflow = create_cocoon_workflow(con)


ValueError                                Traceback (most recent call last)
~/cluster-env/trident_env/lib/python3.10/site-packages/ipywidgets/widgets/widget.py in ?(self, *args, **kwargs)
    191                 ip = get_ipython()
    192                 if ip is None:
    193                     self.log.warning("Exception in callback %s: %s", callback, e, exc_info=True)
    194                 else:
--> 195                     ip.showtraceback()
    196             else:
    197                 value = local_value if local_value is not None else value
    198         return value

~/cluster-env/trident_env/lib/python3.10/site-packages/cocoon_data/__init__.py in ?(b)
  30042         if not database or not schema:
  30043             output.value = "Both database and schema must be specified."
  30044             return
  30045         else:
  30046             success, result = create_schema_and_objects(con, database, schema)
  30047             output.value = result
  30048             if success:
  30049                 self.para["database"] = database

zachary62 commented 2 months ago

Very sorry! A new function added for Snowflake yesterday broke the code. Could you upgrade cocoon to a newer version:

! pip install cocoon_data==0.1.123

Palkers76 commented 2 months ago

It no longer fails after the upgrade, but I do not see any tables in any schema. What could be the reason?


zachary62 commented 2 months ago

Could you try:

con.execute(f"CREATE OR REPLACE TABLE {table_name} AS SELECT * FROM Sales_target")

con.register creates a view, which may be slow and expensive to query, so we only allow tables.

Palkers76 commented 2 months ago

I have created it, but:

  1. The query widget still shows no tables in any schema; however, I can run select * from TABLE from the widget.

  2. Success: the workflow now shows the table.

Palkers76 commented 2 months ago

However, when I pressed Explore, the query seems to be incorrect,

and the relax option returned an error after some time:

Automatically proceeding to the next node: Decide Unique Columns

~/cluster-env/trident_env/lib/python3.10/site-packages/cocoon_data/__init__.py in ?(self, node_name, document)
  11980 
  11981         elif len(children) == 1:
  11982             next_node = children[0]
  11983             print(f"Automatically proceeding to the next node: {next_node}")
> 11984             self.execute_node(next_node)
  11985 
  11986         else:
  11987             raise ValueError(f"Node {node_name} has multiple possible next nodes. 'next_node' must be specified in the document.")

~/cluster-env/trident_env/lib/python3.10/site-packages/cocoon_data/__init__.py in ?(self, node_name)
  11945             print(f"Node {node_name} already executed. Skipping.")
  11946             self.callback(node_name, document)
  11947             return
  11948 
> 11949         extract_output = node.extract(self.item)
  11950         run_output = node.run_and_retry(extract_output)
  11951 
  11952         node.postprocess(run_output, lambda document: self.callback(node_name, document), viewer=self.viewer, extract_output=extract_output)

~/cluster-env/trident_env/lib/python3.10/site-packages/cocoon_data/__init__.py in ?(self, item)
  17825     def extract(self, item):
  17826         clear_output(wait=True)
  17827 
  17828         table_pipeline = self.para["table_pipeline"]
> 17829         table_object = self.para["table_object"]
  17830         table_name = table_pipeline.__repr__(full=False)
  17831         with_context = table_pipeline.get_codes(mode="WITH")
  17832 

KeyError: 'table_object'

zachary62 commented 2 months ago

Got it! Yeah, "Profile" has issues and I will fix it. Could you try Format first? It is similar to Profile and more thorough.

zachary62 commented 2 months ago

The Profile issue should be fixed in:

! pip install cocoon_data==0.1.125

Note that, to support the latest version of openai, the way to specify the API key has changed. See https://github.com/Cocoon-Data-Transformation/cocoon/issues/9#issuecomment-2263948573

Palkers76 commented 2 months ago

Format works, I think; however, I wonder where the dbt files were saved on Fabric ;)

zachary62 commented 2 months ago

In your notebook's file directory, you should see a folder called "dbt_directory". The results (SQL + YAML files) are under /dbt_directory/models/stage. A quick Python snippet to check that the files are there:

import os

directory = "./dbt_directory/models/stage/"

# Get the list of all files in the directory
files = os.listdir(directory)

# Print the list of files
print(f"Files in {directory}:")
for file in files:
    print(file)
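
If the folder is not where you expect, os.listdir raises FileNotFoundError. A slightly more defensive variant, as a sketch only: the helper name list_dbt_outputs is hypothetical, and the file extensions it filters on are an assumption based on the "sql + ymls" description above. It walks the whole dbt_directory tree and prints the current working directory when the folder is missing, which can help locate it on Fabric.

```python
from pathlib import Path


def list_dbt_outputs(root="./dbt_directory"):
    """Return sorted relative paths of SQL/YAML files under root, or [] if root is missing."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Report where we looked so the folder is easier to find.
        print(f"{root_path.resolve()} not found; current working directory is {Path.cwd()}")
        return []
    found = sorted(
        p.relative_to(root_path).as_posix()
        for p in root_path.rglob("*")
        if p.is_file() and p.suffix in {".sql", ".yml", ".yaml"}
    )
    for rel in found:
        print(rel)
    return found


files = list_dbt_outputs()
```

Pass a different root if the notebook's working directory is not where the folder was created.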

Palkers76 commented 1 month ago

Solved