Open MilesAheadToo opened 1 month ago
Sorry, two lines were missing from https://github.com/datalev001/tm_lifetime/edit/main/code/repurchase_prophet_tm.py. The corrected code is below; the two added lines are the ones that set cats_top.columns and create tran_df_sel['trans_date']:
    tran_df = pd.read_csv('online_retail_II.csv', encoding="latin1")
    # key columns
    c1 = (tran_df['Invoice'].isnull() == False)
    c2 = (tran_df['Quantity'] > 0)
    c3 = (tran_df['Customer ID'].isnull() == False)
    c4 = (tran_df['StockCode'].isnull() == False)
    c5 = (tran_df['Description'].isnull() == False)
    tran_df = tran_df[c1 & c2 & c3 & c4 & c5]
    grp = ['Invoice', 'StockCode', 'Description', 'Quantity', 'InvoiceDate']
    tran_df = tran_df.drop_duplicates(grp)
    tran_df['InvoiceDate'] = pd.to_datetime(tran_df['InvoiceDate'])
    tran_df['transaction_date'] = tran_df['InvoiceDate'].dt.date
    cats_top = tran_df.Description.value_counts().reset_index()
    cats_top.columns = ['Description', 'count']
    cats_top_df = cats_top[cats_top['count'] > 1000]
    pro_lst = list(set(cats_top_df['Description']))
    tran_df_sel = tran_df[tran_df['Description'].isin(pro_lst)]
    tran_df_sel['trans_date'] = pd.to_datetime(tran_df_sel['transaction_date'], format='%Y-%m-%d')
    cols = ['Customer ID', 'Description', 'trans_date', 'Quantity']
    tran_df_bs = tran_df_sel[cols]
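To show why the selection failed without the second added line, here is a minimal sketch on a made-up three-row table. The sample values are assumptions for illustration only and are not taken from online_retail_II.csv; the column names and the added line follow the code above.

```python
import pandas as pd

# Toy stand-in for tran_df_sel (hypothetical values, not from the real dataset).
tran_df_sel = pd.DataFrame({
    'Customer ID': [1, 1, 2],
    'Description': ['MUG', 'MUG', 'BAG'],
    'Quantity': [2, 1, 5],
    'InvoiceDate': ['2010-12-01 08:26:00', '2010-12-03 10:00:00', '2010-12-01 09:15:00'],
})
tran_df_sel['InvoiceDate'] = pd.to_datetime(tran_df_sel['InvoiceDate'])
tran_df_sel['transaction_date'] = tran_df_sel['InvoiceDate'].dt.date

cols = ['Customer ID', 'Description', 'trans_date', 'Quantity']

# Without the missing line there is no 'trans_date' column yet,
# so selecting `cols` raises a KeyError.
try:
    tran_df_sel[cols]
    selection_failed = False
except KeyError:
    selection_failed = True

# The added line creates 'trans_date'; after that the selection works.
tran_df_sel['trans_date'] = pd.to_datetime(tran_df_sel['transaction_date'], format='%Y-%m-%d')
tran_df_bs = tran_df_sel[cols]
```

On the real data, where tran_df_sel is a filtered slice of tran_df, the assignment may also emit a SettingWithCopyWarning; calling .copy() on the filtered frame before adding the column is one way to avoid that.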
On Mon, Sep 30, 2024 at 1:53 AM Trevor Miles wrote:
This is your code:
`# Converted into a standardized datetime format
tran_df['InvoiceDate'] = pd.to_datetime(tran_df['InvoiceDate'])
tran_df['transaction_date'] = tran_df['InvoiceDate'].dt.date
cats_top = tran_df.Description.value_counts().reset_index()
cats_top_df = cats_top[cats_top['count']>1000]
# Filtered to keep only the high-frequency items:
pro_lst = list(set(cats_top_df['Description']))
tran_df_sel = tran_df[tran_df['Description'].isin(pro_lst)]
cols = ['Customer ID', 'Description', 'trans_date', 'Quantity']
# data to be used
tran_df_bs = tran_df_sel[cols]`
I do not see any reference to trans_date until the last few lines. My code stops with an error at tran_df_bs = tran_df_sel[cols] because there is no column called trans_date.
Is this an error?