dvgodoy / handyspark

HandySpark - bringing pandas-like capabilities to Spark dataframes
MIT License
188 stars 24 forks source link

tukey outliers fails #27

Open DrYSG opened 3 years ago

DrYSG commented 3 years ago

I tried:

hdf.outliers(method='tukey', k=3.)

and I got this error with pySpark 3.01

HANDY EXCEPTION SUMMARY

Location: "<string>"
Line    : 3
Function: raise_from
Error   :    +- Relation[dayOfWeek#73,AIRLINE#74,FLIGHT_NUMBER#75,ORIGIN_AIRPORT#76,DESTINATION_AIRPORT#77,DISTANCE#78,SCHEDULED_TIME#79,plannedDepartTime#80,label#81] parquet
---------------------------------------------------------------------------
HandyException: cannot resolve 'approx_percentile(`FLIGHT_NUMBER`, CAST(0.25BD AS DOUBLE), 100.0BD)' due to data type mismatch: argument 3 requires integral type, however, '100.0BD' is of decimal(4,1) type.; line 1 pos 0;

might be related to #26