cxcxxin / urap_tech_sp16

filter_srp_by_reputation: srp_filter processing by ind_filter and reputation to calculate Accessory Price #40

Open JiazhenChen opened 8 years ago

JiazhenChen commented 8 years ago

Folder name: filter_srp_by_reputation

  1. Left-merge the two files lst_700D and 700D_with_rank on item_id. Keep lst_700D as the base and add the "rank" column from 700D_with_rank.

    Input: 1) input/lst_700D_Yi_Feb222016_giftindex_price_corrected_juexiao_0306 2) output/700D_with_rank.csv

    Output: matched_rep.csv, with columns: item_id, dict_index, rank (seller_reputation)
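
A minimal pandas sketch of step 1, assuming both inputs load as DataFrames with an item_id column. The function name `match_reputation`, the toy data, and the de-duplication of ranks per item_id are assumptions for illustration, not part of the original instructions.

```python
import pandas as pd

def match_reputation(lst: pd.DataFrame, ranked: pd.DataFrame) -> pd.DataFrame:
    """Left-merge lst_700D with 700D_with_rank on item_id, keeping lst_700D as the base."""
    # Assumption: at most one rank per item_id; duplicates would multiply rows in the merge.
    rank_by_item = ranked[["item_id", "rank"]].drop_duplicates("item_id")
    merged = lst.merge(rank_by_item, on="item_id", how="left")
    return merged[["item_id", "dict_index", "rank"]]

# Toy stand-ins for the two input files (hypothetical values).
lst = pd.DataFrame({"item_id": [1, 2, 3], "dict_index": ["a", "b", "c"]})
ranked = pd.DataFrame({"item_id": [1, 3], "rank": [5, 12]})

matched_rep = match_reputation(lst, ranked)
# matched_rep.to_csv("matched_rep.csv", index=False) would write the step 1 output.
```

Items without a match in 700D_with_rank keep their row and get a missing rank, which is what keeping lst_700D as the base implies.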

  2. For each row in matched_rep.csv, use "dict_index" to find the file with the same name in the input/srp_0229_page5_imptonly directory. Keep all lines with ratesum > rank and ind_filter = 1. From the "item_price" column of these lines, calculate price_min, price_max, price_q1, price_mean, price_q3 (according to position, sorted from small to large).

    Input: 1) input/matched_rep.csv 2) All the files in "input/srp_0229_page5_imptonly" Directory

    Output: columns: item_id, dict_index, rank (please rename this to reputation_rank)
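
A rough sketch of the step 2 filter and summary for one srp file, assuming it loads as a DataFrame with ratesum, ind_filter and item_price columns. The function name `price_summary` and the toy values are hypothetical, and the quartiles use pandas' default linear interpolation, which may differ from the position-based rule intended above.

```python
import pandas as pd

def price_summary(srp: pd.DataFrame, rank: float) -> dict:
    """Keep rows with ratesum > rank and ind_filter == 1, then summarize item_price."""
    kept = srp[(srp["ratesum"] > rank) & (srp["ind_filter"] == 1)]
    prices = kept["item_price"]
    return {
        "price_min": prices.min(),
        "price_q1": prices.quantile(0.25),   # assumption: linear interpolation
        "price_mean": prices.mean(),
        "price_q3": prices.quantile(0.75),   # assumption: linear interpolation
        "price_max": prices.max(),
    }

# Toy srp file (hypothetical values); rank would come from matched_rep.csv.
srp = pd.DataFrame({
    "ratesum": [3, 8, 9, 10, 12],
    "ind_filter": [1, 1, 0, 1, 1],
    "item_price": [5.0, 10.0, 99.0, 20.0, 30.0],
})
summary = price_summary(srp, rank=5)
```

If no line survives the filter, every value comes back as NaN, so the output row can still be written for that item.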

@geyao0619 @JiazhenChen @suyanglu please see the updates below.

@JiazhenChen please also add the mode for each output. @geyao0619 has probably already informed you about computing the difference in the outputs under the different classifications of tmall store_ratings.

The following acc street price measures:

  - acc_price_min, acc_price_max, acc_price_avg, acc_price_q1, acc_price_q3, acc_price_median
  - acc_price_mode
  - acc_price_wt_qtypaid: sum(price * qty_paid) / sum(qty_paid)
  - acc_price_wt_num_reviews: sum(price * review_cnt) / sum(review_cnt)
  - acc_price_second_lowest: the second lowest price
  - acc_price_overall_second_lowest: the lowest of all the acc street price measures, excluding acc_price_min and acc_price_q1. We name it second lowest because, more likely than not, acc_price_overall_second_lowest > acc_price_min.
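
The price measures above could be computed roughly as follows. This is a sketch under assumptions: the column names item_price, qty_paid and review_cnt, the function name `acc_price_measures`, taking the smallest value when the mode is tied, and counting ties when picking the second lowest price are all choices made here for illustration.

```python
import pandas as pd

def acc_price_measures(df: pd.DataFrame) -> dict:
    """Compute the acc street price measures for one filtered set of listings."""
    p = df["item_price"].sort_values().reset_index(drop=True)
    m = {
        "acc_price_min": p.iloc[0],
        "acc_price_max": p.iloc[-1],
        "acc_price_avg": p.mean(),
        "acc_price_q1": p.quantile(0.25),
        "acc_price_q3": p.quantile(0.75),
        "acc_price_median": p.median(),
        # Assumption: if several prices tie for the mode, take the smallest.
        "acc_price_mode": p.mode().iloc[0],
        "acc_price_wt_qtypaid": (df["item_price"] * df["qty_paid"]).sum() / df["qty_paid"].sum(),
        "acc_price_wt_num_reviews": (df["item_price"] * df["review_cnt"]).sum() / df["review_cnt"].sum(),
        # Assumption: second position after sorting, so a tied minimum counts twice.
        "acc_price_second_lowest": p.iloc[1] if len(p) > 1 else float("nan"),
    }
    excluded = {"acc_price_min", "acc_price_q1"}
    others = pd.Series({k: v for k, v in m.items() if k not in excluded})
    m["acc_price_overall_second_lowest"] = others.min()  # NaN entries are skipped
    return m

# Toy listings (hypothetical values).
listings = pd.DataFrame({
    "item_price": [10.0, 12.0, 12.0, 38.0],
    "qty_paid": [1, 3, 0, 1],
    "review_cnt": [2, 2, 4, 2],
})
measures = acc_price_measures(listings)
```

The weighted averages divide by the summed weights, so a file where every qty_paid or review_cnt is zero yields NaN rather than a price.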

The following price dispersion measures:

  - acc_price_disp_sd
  - acc_price_disp_variance
  - acc_price_disp_range = acc_price_max - acc_price_min
  - acc_price_gap_2lowest = acc_price_second_lowest - acc_price_min
  - acc_price_gap_avg_min = acc_price_avg - acc_price_min
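
A short sketch of the dispersion measures, assuming they are computed from the same filtered item_price values. The function name `acc_price_dispersion` is hypothetical, and the sd and variance use pandas' default sample formulas (ddof=1); whether the sample or population version is wanted is not stated in the thread.

```python
import pandas as pd

def acc_price_dispersion(prices: pd.Series) -> dict:
    """Compute the price dispersion measures from a Series of item prices."""
    p = prices.sort_values().reset_index(drop=True)
    lowest = p.iloc[0]
    # Assumption: second position after sorting, so a tied minimum gives a gap of 0.
    second_lowest = p.iloc[1] if len(p) > 1 else float("nan")
    return {
        "acc_price_disp_sd": p.std(),        # sample sd (ddof=1), an assumption
        "acc_price_disp_variance": p.var(),  # sample variance (ddof=1), an assumption
        "acc_price_disp_range": p.iloc[-1] - lowest,
        "acc_price_gap_2lowest": second_lowest - lowest,
        "acc_price_gap_avg_min": p.mean() - lowest,
    }

# Toy prices (hypothetical values).
dispersion = acc_price_dispersion(pd.Series([10.0, 12.0, 12.0, 38.0]))
```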

Output files: filtered by reputation: compared_dict_filtered.csv; not filtered: compared_dict_unfiltered.csv

rank = the seller's reputation number

JiazhenChen commented 8 years ago

@Quinn126

cxcxxin commented 8 years ago

@suyanglu let's discuss this today. Updates to the instructions are needed: match back to the original search results order, and also get acc_price_rankbydefault_avg_1to5 etc.

suyanglu commented 8 years ago

Checked

cxcxxin commented 8 years ago

@suyanglu make sure @JiazhenChen @geyao0619 generate a new output file, dict_pricedisp_700d_w_rep_tmall_strictlybetterthantaobao.csv,

where we treat each tmall seller as strictly better than any taobao seller and set its rank to 15. (Impute store_rating based on the taobao_tmall column, since in fact only the taobao store_rating is visible to consumers; the tmall ones are hidden on the webpage and appear only in the HTML.)
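
One possible way to do the imputation, as a sketch only: the coding of the taobao_tmall column (here the string "tmall") and the function name `impute_tmall_rating` are assumptions; only the column names and the value 15 come from the comment above.

```python
import pandas as pd

TMALL_RANK = 15  # value given in the thread for tmall sellers

def impute_tmall_rating(df: pd.DataFrame, tmall_value: str = "tmall") -> pd.DataFrame:
    """Return a copy where every tmall row gets store_rating = TMALL_RANK."""
    out = df.copy()
    # Assumption: the taobao_tmall column marks tmall sellers with `tmall_value`.
    out.loc[out["taobao_tmall"] == tmall_value, "store_rating"] = TMALL_RANK
    return out

# Toy sellers (hypothetical values); the tmall rating is missing before imputation.
sellers = pd.DataFrame({
    "taobao_tmall": ["taobao", "tmall", "taobao"],
    "store_rating": [3.0, None, 12.0],
})
imputed = impute_tmall_rating(sellers)
```

Working on a copy leaves the original ratings untouched, so the unimputed version can still be used for the other output files.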

cxcxxin commented 8 years ago

@JiazhenChen @geyao0619 please also update the preconditions and other things as necessary after you have included your part in the streamline. Thanks! \urap_programming\yi\gen_premium_700d_pilot_automated

cxcxxin commented 8 years ago

@suyanglu make sure @geyao0619 @JiazhenChen run on the latest input you got from Frank. Test directly in C:\Users\xin_chen\Dropbox\urap_programming\yi\gen_premium_700d_pilot_automated\