Unable to obtain matrices from the formula using PySpark.
from csdid.att_gt import ATTgt
import pandas as pd, patsy
import pyspark.pandas as ps
from pyspark.sql import SparkSession
data = pd.read_csv("https://raw.githubusercontent.com/d2cml-ai/csdid/function-aggte/data/mpdta.csv")
psdata = ps.DataFrame(data)
patsy.dmatrices('lemp~1', data = psdata)
---------------------------------------------------------------------------
PandasNotImplementedError Traceback (most recent call last)
<ipython-input-13-1b3e5ec1da9c> in <cell line: 2>()
1 import patsy
----> 2 patsy.dmatrices('lemp~1', data = psdata, return_type='matrix')
7 frames
/usr/local/lib/python3.10/dist-packages/pyspark/pandas/missing/__init__.py in unsupported_function(*args, **kwargs)
21 def unsupported_function(class_name, method_name, deprecated=False, reason=""):
22 def unsupported_function(*args, **kwargs):
---> 23 raise PandasNotImplementedError(
24 class_name=class_name, method_name=method_name, reason=reason
25 )
PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.
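A possible workaround, following the hint in the error message: collect the needed columns with `to_numpy()` and hand patsy a plain dict instead of the pandas-on-Spark DataFrame. This is only a sketch. The helper names `columns_as_numpy` and `design_matrices` are hypothetical and not part of csdid, patsy, or PySpark, and it assumes the selected columns fit in driver memory, since `to_numpy()` collects them locally.

```python
def columns_as_numpy(frame, columns):
    """Collect the named columns into a plain dict of arrays.

    Works with any frame whose columns expose ``to_numpy()``, which
    includes pandas and pandas-on-Spark DataFrames. On Spark this pulls
    the data onto the driver, so it only suits data that fits in memory.
    """
    return {name: frame[name].to_numpy() for name in columns}


def design_matrices(formula, frame, columns):
    """Build patsy design matrices from the collected columns.

    Hypothetical helper: `columns` must list every variable that the
    formula refers to.
    """
    import patsy  # imported lazily so columns_as_numpy has no dependencies

    # patsy accepts a dict-like object as `data`, so the dict of arrays
    # avoids iterating over a pandas-on-Spark Series.
    return patsy.dmatrices(formula, data=columns_as_numpy(frame, columns))
```

With the data above, the call would look like `y, X = design_matrices('lemp~1', psdata, ['lemp'])`. Converting the whole frame with `psdata.to_pandas()` before calling `patsy.dmatrices` should behave the same way, with the same memory caveat.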