
Question about qqq_options_1m_dte_0_credit_spread.py #3

Closed: huynhphuong10284 closed this issue 6 months ago

huynhphuong10284 commented 6 months ago

In your code for "How I Achieved Almost 85% Win Rate Using 0DTE Options and the Credit Spread Strategy on QQQ", I see you're using data that has columns such as expire_date, kind, strike, open_underlying, and close_underlying. Could you please explain how to obtain these columns?

crapher commented 6 months ago

Hi! Usually, I get all that information from polygon.io when I download the data.

In this article, I detail all the steps to download the options data from polygon.io: https://medium.com/@diegodegese/e20d46cbef3

In this article, I detail how to download the stock data from polygon.io: https://medium.com/@diegodegese/fdab4bb6bfb1

When I download the options data, the dataset already includes the expire_date, kind, and strike columns. When I download the stock data, the dataset includes the open and close prices of the underlying asset.
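For the stock side, a minimal sketch of the download looks like the snippet below. It hits polygon.io's aggregates endpoint with the requests library; the date range and the API key placeholder are just illustrative assumptions, and the linked articles cover the full process (pagination, the option contracts themselves, etc.).

import pandas as pd
import requests

API_KEY = 'YOUR_POLYGON_API_KEY'  # assumption: replace with your own key
TICKER = 'QQQ'

# Polygon aggregates endpoint: 1-minute bars for a single (illustrative) day
url = (f'https://api.polygon.io/v2/aggs/ticker/{TICKER}/range/1/minute/'
       f'2023-01-03/2023-01-03?adjusted=true&sort=asc&limit=50000&apiKey={API_KEY}')

results = requests.get(url).json().get('results', [])

# 't' is a Unix timestamp in milliseconds; o/h/l/c/v are the bar fields
df = pd.DataFrame(results)
df['date'] = pd.to_datetime(df['t'], unit='ms')
df = df.rename(columns={'o': 'open', 'h': 'high', 'l': 'low', 'c': 'close', 'v': 'volume'})
df = df[['date', 'open', 'high', 'low', 'close', 'volume']]

df.to_csv(f'/users/diego/finance/stocks/1t/{TICKER}.csv.gz', index=False)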

After that, I merge both datasets on the timestamp so each option row carries the matching underlying prices. Here is the code I use to do it:

import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd

UNDERLYING_TICKER = 'QQQ'
STOCKS_FILE = f'/users/diego/finance/stocks/1t/{UNDERLYING_TICKER}.csv.gz'
OPTIONS_FILE = f'/users/diego/finance/options/1t/{UNDERLYING_TICKER}.csv.gz'

# Read Files
df_stocks = pd.read_csv(STOCKS_FILE, header=0)
df_options = pd.read_csv(OPTIONS_FILE, header=0)

# Prepare datasets
df_stocks = df_stocks[['date','open','close']]
df_stocks['date'] = pd.to_datetime(df_stocks['date'])
df_stocks = df_stocks.reset_index(drop=True)

df_options['date'] = pd.to_datetime(df_options['date'])
df_options = df_options[df_options['dte'] == 0]  # Keep only 0DTE contracts
df_options = df_options[['date','expire_date','kind','strike','dte','open','close']]

# Add stock information to options dataset
df_options = df_options.merge(df_stocks, how='left', on='date', suffixes=('', '_underlying'))
df_options = df_options.dropna().reset_index(drop=True)

# Remove fields that are no longer needed
df_options = df_options.drop(['dte'], axis=1)

# Localize time to US Eastern Time
df_options['date'] = df_options['date'].dt.tz_localize('UTC').dt.tz_convert('US/Eastern').dt.strftime('%Y-%m-%d %H:%M:%S')
print(df_options)

# Save Filtered data
df_options.to_csv(f'/tmp/{UNDERLYING_TICKER.lower()}_dte_0.csv.gz', index=False)

The result of this script is the file I use in the article.
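As a quick sanity check (just an illustration, reusing the save path from the script above), you can reload the generated file and confirm the columns you asked about are there:

import pandas as pd

# Reload the filtered file and verify the columns used in the article;
# open_underlying / close_underlying come from the merge suffixes above.
df = pd.read_csv('/tmp/qqq_dte_0.csv.gz')

expected = ['date', 'expire_date', 'kind', 'strike', 'open', 'close',
            'open_underlying', 'close_underlying']
print(df.columns.tolist())
assert all(col in df.columns for col in expected)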