crypto-lake / lake-api

Python API for accessing Lake high frequency tick trades & order book data
https://crypto-lake.com/
Apache License 2.0

Error when downloading funding data - "cannot convert input with unit 's'" #11

Status: Open. ewald-florian opened this issue 8 months ago.

ewald-florian commented 8 months ago

Description

An error occurs when trying to download PERP funding data: the API tries to convert the column "next_funding_time" to a pandas datetime, which fails because the data is not given in Unix time format.

Reproduce Error

from datetime import datetime

import lakeapi

# Request one year of Binance perpetual funding data
table = "funding"
exchange = "BINANCE_FUTURES"
trading_pair = "BTC-USDT-PERP"

start_date = datetime(2023, 1, 1, 0, 0)
end_date = datetime(2023, 12, 31, 0, 0)

df = lakeapi.load_data(
    table=table,
    start=start_date,
    end=end_date,
    symbols=[trading_pair],
    exchanges=[exchange],
    drop_partition_cols=True,
)

Error Message

cannot convert input with unit 's'

Cause of Trouble

lake-api/main.py line 216

if "next_funding_time" in df.columns:
    df["next_funding_time"] = pd.to_datetime(df["next_funding_time"], unit="s", cache=True)
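To make the unit mismatch concrete, here is a minimal sketch in plain pandas (no Lake API call), assuming, as a later comment in this thread finds, that some rows hold epoch nanoseconds rather than epoch seconds. The constant names are illustrative; 1_672_531_200 is 2023-01-01 00:00:00 UTC in Unix seconds. The exact exception pandas raises for a wrong unit varies between versions, so the sketch compares magnitudes instead of triggering the error.

```python
import pandas as pd

# 2023-01-01 00:00:00 UTC expressed as Unix time at two different scales.
SECONDS = 1_672_531_200
NANOS = SECONDS * 1_000_000_000

# Largest instant a nanosecond-resolution pandas Timestamp can hold,
# in whole seconds since the epoch (about 9.2e9, i.e. the year 2262).
MAX_EPOCH_SECONDS = pd.Timestamp.max.value // 1_000_000_000

# Read with the matching unit, both values give the same instant.
from_seconds = pd.to_datetime(pd.Series([SECONDS]), unit="s")
from_nanos = pd.to_datetime(pd.Series([NANOS]), unit="ns")

# Read with unit="s", the nanosecond value would mean a date tens of
# billions of years in the future, far outside that range.
nanos_fit_as_seconds = NANOS <= MAX_EPOCH_SECONDS
```

With unit="s" hard-coded, any row stored in nanoseconds lands in that out-of-range case, which is consistent with the conversion failing during the download.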

Problem

The content of the column "next_funding_time" is presumably not Unix time but rather the absolute number of nanoseconds until the next funding time, which would make it a time difference rather than a timestamp. I have not read the Binance API documentation; this is just the first explanation that came to my mind.

Potential Solution

Just leave "next_funding_time" in its plain format or optionally rename it to something like "ns_to_next_funding_time".

if "next_funding_time" in df.columns:
    df.rename(columns={"next_funding_time": "ns_to_next_funding_time"}, inplace=True)

Alternatively, "next_funding_time" could be added to origin_time to get a timestamp column. However, at least in the limited samples I have checked, next_funding_time does not really match the actual time difference to the next funding data point anyway, so I don't think this would add useful information.

ewald-florian commented 8 months ago

I have now experimented a bit more with the data and figured out that "next_funding_time" is actually in Unix time, and I can convert it afterwards using the original syntax: pd.to_datetime(df["next_funding_time"], unit="s", cache=True). It only breaks during the download process. Hence, I closed my pull request: it solved the problem for me but is obviously not an adequate general fix.

leftys commented 8 months ago

That's weird. I just tried recent Binance futures funding rates and they seem to work well, including the to_datetime conversion. Maybe some older data cause the conversion to break; I will investigate further.

leftys commented 8 months ago

It seems older Binance futures funding data store the timestamp in nanosecond format, so the unit has to be set to 'ns'. I introduced this bug a few versions back by automatically converting the timestamp to a pandas datetime.

I released a fix in lakeapi 0.13.0; check it out!
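The thread does not show how 0.13.0 handles the two formats. As a hypothetical illustration only (not lakeapi's actual code), one way to convert a column that mixes epoch seconds and epoch nanoseconds is to guess each value's unit from its magnitude. The helper name epoch_to_datetime and both thresholds are assumptions, and the sketch assumes the column has no missing values.

```python
import pandas as pd

# Hypothetical thresholds (assumptions, not taken from lakeapi):
# epoch seconds for present-day dates are ~1.7e9, epoch nanoseconds ~1.7e18.
SECONDS_UPPER = 10**11   # below this, treat a value as seconds
NANOS_LOWER = 10**17     # at or above this, treat a value as nanoseconds


def epoch_to_datetime(values: pd.Series) -> pd.Series:
    """Convert epoch numbers that may mix seconds and nanoseconds.

    Hypothetical helper: each value's unit is guessed from its magnitude.
    Values between the two thresholds (e.g. milliseconds or microseconds)
    are rejected instead of guessed. Assumes there are no missing values.
    """
    as_int = pd.to_numeric(values).astype("int64")
    magnitude = as_int.abs()
    is_seconds = magnitude < SECONDS_UPPER
    is_nanos = magnitude >= NANOS_LOWER
    if not (is_seconds | is_nanos).all():
        raise ValueError("some values look like milliseconds or microseconds")
    # Scale only the second-based values up to nanoseconds, so every value
    # shares one unit and no nanosecond value is multiplied (and overflowed).
    nanos = as_int.copy()
    nanos.loc[is_seconds] = as_int.loc[is_seconds] * 1_000_000_000
    return pd.to_datetime(nanos, unit="ns")
```

Applied to a raw integer column, this would read as df["next_funding_time"] = epoch_to_datetime(df["next_funding_time"]). Rejecting the ambiguous middle range is a deliberate choice: silently guessing milliseconds or microseconds would reproduce the same class of bug this issue describes.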

ewald-florian commented 8 months ago

Thanks for fixing this so quickly! I just tested the exact same request with version 0.13.0 and it works without errors now.