VizierDB / vizier-scala

The Vizier kernel-free notebook programming environment

Use python type annotations to determine function schemas #219

Closed okennedy closed 1 year ago

okennedy commented 2 years ago

Vizier allows users to export python code for use outside of the current cell with the vizierdb.export_module method. A key feature of exported modules is that they're accessible in other modalities. For example, exported functions are available in SQL cells.

Unfortunately, for them to be useful with SQL cells, we need type information for the method's arguments and return value. Right now, we're following PySpark's lead in using PySpark type annotations (see the pyspark_types module). However, this is one of the major reasons that we have a dependency on pyspark (#220), which duplicates most of the imports of scala spark.

The goal of this project is to use standard python type annotations (e.g., PEP 484) instead of pyspark type annotations, for example:

def foo(x: str) -> str: 
    ...
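
As a minimal sketch of how such annotations can be read back at runtime, the standard library's `typing.get_type_hints` returns a function's annotated argument and return types. The helper name `function_schema` is hypothetical and is not part of Vizier's API; the body of `foo` is filled in only so the example runs.

```python
from typing import get_type_hints


def function_schema(func):
    """Return (argument types, return type) read from PEP 484 annotations.

    Hypothetical helper for illustration; not part of vizierdb.
    """
    hints = get_type_hints(func)
    # The return annotation is stored under the special key "return".
    return_type = hints.pop("return", None)
    return hints, return_type


def foo(x: str) -> str:
    return x.upper()


arg_types, return_type = function_schema(foo)
print(arg_types)
print(return_type)
```

Here `arg_types` maps each parameter name to its annotated type, which is the information a SQL cell would need to type-check a call to the exported function.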
okennedy commented 1 year ago

It looks like we can't get rid of the pyspark dependency entirely :( However, we can still try to infer the corresponding spark type from the return type annotation on the python function.
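
One way to approach that inference is a lookup from python types to Spark SQL type names. The sketch below is an assumption about how this could work, not Vizier's actual implementation: the table `PYTHON_TO_SPARK`, the function `infer_spark_return_type`, and the fallback to `string` are all hypothetical. It returns Spark DDL type names as plain strings so that the example itself runs without pyspark installed.

```python
from typing import get_type_hints

# Hypothetical mapping from Python types to Spark SQL DDL type names.
PYTHON_TO_SPARK = {
    str: "string",
    int: "bigint",
    float: "double",
    bool: "boolean",
    bytes: "binary",
}


def infer_spark_return_type(func, default="string"):
    """Guess a Spark type name from a function's return annotation.

    Falls back to `default` when the annotation is missing or unmapped.
    """
    return_type = get_type_hints(func).get("return")
    return PYTHON_TO_SPARK.get(return_type, default)


def foo(x: str) -> str:
    return x.upper()


def add_one(n: int) -> int:
    return n + 1


def unannotated(x):
    return x


print(infer_spark_return_type(foo))
print(infer_spark_return_type(add_one))
print(infer_spark_return_type(unannotated))
```

A name such as `bigint` could then be handed to Spark wherever it accepts a DDL-formatted type string. Whether an unannotated function should fall back to a default or be rejected is a design choice this sketch does not settle.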