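Sorry, there are two rows missing from https://github.com/datalev001/tm_lifetime/edit/main/code/repurchase_prophet_tm.py. See the code below; the two added rows (shown in red in the original message) are the cats_top.columns assignment and the tran_df_sel['trans_date'] assignment: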

Load the dataset with proper encoding

tran_df = pd.read_csv('online_retail_II.csv', encoding= "latin1")

This step filters out rows that contain missing or invalid values in the key columns.

c1 = (tran_df['Invoice'].isnull() == False)
c2 = (tran_df['Quantity']>0)
c3 = (tran_df['Customer ID'].isnull() == False)
c4 = (tran_df['StockCode'].isnull() == False)
c5 = (tran_df['Description'].isnull() == False)
tran_df = tran_df[c1 & c2 & c3 & c4 & c5]

This step involves further cleaning and filtering

grp = ['Invoice', 'StockCode','Description', 'Quantity', 'InvoiceDate']

Duplicate transactions are removed

tran_df = tran_df.drop_duplicates(grp)

Converted into a standardized datetime format

tran_df['InvoiceDate'] = pd.to_datetime(tran_df['InvoiceDate'])
tran_df['transaction_date'] = tran_df['InvoiceDate'].dt.date

Choose products with a high number of transactions

cats_top = tran_df.Description.value_counts().reset_index()
cats_top.columns = ['Description', 'count']
cats_top_df = cats_top[cats_top['count']>1000]

Filtered to keep only the high-frequency items:

pro_lst = list(set(cats_top_df['Description']))
tran_df_sel = tran_df[tran_df['Description'].isin(pro_lst)]
tran_df_sel['trans_date'] = pd.to_datetime(tran_df_sel['transaction_date'], format = '%Y-%m-%d')
cols = ['Customer ID', 'Description', 'trans_date', 'Quantity']

data to be used

tran_df_bs = tran_df_sel[cols]
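
For anyone who wants to check the two added rows without downloading the full dataset, here is a minimal sketch on invented toy data. It is not the code from the repository: the sample values and the threshold of 2 (instead of 1000) are made up for illustration, the .copy() call is an addition to avoid pandas' SettingWithCopyWarning, and rename_axis plus reset_index(name=...) is used as an alternative to assigning cats_top.columns, since reset_index() on a value_counts() result labels its columns differently before and after pandas 2.0.

```python
import pandas as pd

# Toy stand-in for the cleaned tran_df (all values invented for illustration).
tran_df = pd.DataFrame({
    "Customer ID": [1, 1, 2, 2, 3],
    "Description": ["MUG", "MUG", "MUG", "BAG", "BAG"],
    "Quantity": [2, 1, 4, 1, 3],
    "InvoiceDate": pd.to_datetime([
        "2010-01-04 10:15", "2010-02-01 09:00", "2010-01-04 12:30",
        "2010-03-10 16:45", "2010-03-11 08:20",
    ]),
})
tran_df["transaction_date"] = tran_df["InvoiceDate"].dt.date

# First missing row: give the value_counts result explicit column names,
# so the later cats_top['count'] lookup works on any pandas version.
cats_top = (
    tran_df["Description"]
    .value_counts()
    .rename_axis("Description")
    .reset_index(name="count")
)
cats_top_df = cats_top[cats_top["count"] > 2]  # toy threshold; the thread uses 1000

pro_lst = list(set(cats_top_df["Description"]))
# .copy() is an addition here so the new column is set on an independent frame.
tran_df_sel = tran_df[tran_df["Description"].isin(pro_lst)].copy()

# Second missing row: create trans_date before it is selected in cols.
tran_df_sel["trans_date"] = pd.to_datetime(tran_df_sel["transaction_date"])

cols = ["Customer ID", "Description", "trans_date", "Quantity"]
tran_df_bs = tran_df_sel[cols]
print(tran_df_bs)
```

With trans_date created before the selection, tran_df_sel[cols] no longer fails, and the column is a proper datetime64 column rather than Python date objects.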

On Mon, Sep 30, 2024 at 1:53 AM Trevor Miles @.***> wrote:

This is your code:

`Converted into a standardized datetime format
tran_df['InvoiceDate'] = pd.to_datetime(tran_df['InvoiceDate'])
tran_df['transaction_date'] = tran_df['InvoiceDate'].dt.date

cats_top = tran_df.Description.value_counts().reset_index()
cats_top_df = cats_top[cats_top['count']>1000]

Filtered to keep only the high-frequency items:
pro_lst = list(set(cats_top_df['Description']))
tran_df_sel = tran_df[tran_df['Description'].isin(pro_lst)]
cols = ['Customer ID', 'Description', 'trans_date', 'Quantity']

data to be used
tran_df_bs = tran_df_sel[cols]`

I do not see any reference to trans_date until the last few rows. My code stops with an error at tran_df_bs = tran_df_sel[cols] because there is no column called trans_date.

Is this an error?
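
A quick way to confirm this kind of problem is to compare the requested column names against the frame before selecting. The sketch below is only an illustration on invented toy data; select_columns is a hypothetical helper, not something from the repository.

```python
import pandas as pd


def select_columns(df, cols):
    """Return df[cols], raising a readable KeyError if any column is absent.

    Hypothetical helper for illustration; not part of the original script.
    """
    missing = [c for c in cols if c not in df.columns]
    if missing:
        raise KeyError(
            f"missing columns: {missing}; available: {list(df.columns)}"
        )
    return df[cols]


# Toy frame that has transaction_date but no trans_date, as in the report.
tran_df_sel = pd.DataFrame({
    "Customer ID": [1, 2],
    "Description": ["MUG", "BAG"],
    "Quantity": [2, 1],
    "transaction_date": ["2010-01-04", "2010-03-10"],
})
cols = ["Customer ID", "Description", "trans_date", "Quantity"]

try:
    select_columns(tran_df_sel, cols)
except KeyError as err:
    message = str(err)
    print(message)
```

The message names trans_date as the absent column and lists transaction_date among the available ones, which makes the name mismatch easy to spot.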


datalev001 / tm_lifetime

trans_date <> transaction_date #2
