Open ZhaotangLuo opened 8 years ago
@Quinn126 please work with @ZhaotangLuo to make sure that his code for this project is accurate and transparent, and make sure you understand the story @ZhaotangLuo is telling.
If the code comes in multiple pieces, there should be a script that pipelines all of them. @Quinn126 please document here the code associated with the project, i.e. its location in Dropbox. The graph on each slide should match exactly the title in the code.
@ZhaotangLuo compute the fraction of reviews without bundle numbers and report the results here. You should also carefully document the comparison results in the slides for both 700D and 750D.
In the dataset reviews_750d_0205_withpage.txt (urap_programming\all_data\data_bazhuayu\accumulative_review\reviews_750d_0205_withpage.txt; I cleaned this original dataset by dropping duplicated rows and items that are NOT 750D),
542 out of 5868 (9.2%) reviews are without bundle indices.
All 542 reviews are from Taobao, of which 449 are automatically generated reviews. The remaining 93 reviews are from a seller (id = 36435403857) with no bundle indices available on its website.
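For anyone re-checking the count above, here is a minimal sketch of the computation, assuming the reviews are loaded as a list of dicts. The field name `bundle_index`, the helper `missing_bundle_fraction`, and the toy rows are hypothetical and are not taken from the project code. The 750D filter is not shown.

```python
def missing_bundle_fraction(reviews, bundle_key="bundle_index"):
    """Return (n_missing, n_total, fraction) after dropping exact duplicate rows.

    `bundle_key` is a hypothetical column name; adjust it to the real header.
    """
    seen = set()
    unique = []
    for row in reviews:
        # A row is a duplicate only if every field matches an earlier row.
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    n_total = len(unique)
    # Treat None, empty, and whitespace-only values as "no bundle index".
    n_missing = sum(1 for row in unique if not (row.get(bundle_key) or "").strip())
    fraction = n_missing / n_total if n_total else 0.0
    return n_missing, n_total, fraction


# Toy rows for illustration only (not real data).
sample = [
    {"review_id": "r1", "platform": "Taobao", "bundle_index": ""},
    {"review_id": "r1", "platform": "Taobao", "bundle_index": ""},  # exact duplicate
    {"review_id": "r2", "platform": "Taobao", "bundle_index": "2"},
    {"review_id": "r3", "platform": "other", "bundle_index": "1"},
    {"review_id": "r4", "platform": "Taobao", "bundle_index": None},
]

n_missing, n_total, fraction = missing_bundle_fraction(sample)
print(f"{n_missing} out of {n_total} ({fraction:.1%}) reviews are without bundle indices")
```

The same function can be run on the 700D file once it has been cleaned the same way, so the two fractions are directly comparable.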
@cxcxxin @suyanglu @ZhaotangLuo Where is the code that I'm supposed to document located?
Define rel_diff = abs_diff / sales_share, where sales_share is the bundle-sales share in either the transaction record dataset or the review dataset. Use the total sales of each item to weight the relative bundle-sales difference, i.e., replace abs_diff with rel_diff in the following formula.![image](https://cloud.githubusercontent.com/assets/17244944/14222607/528523e4-f827-11e5-957a-7ace79a2ec00.png)
Also do it for 700D
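A rough sketch of the rel_diff weighting, for discussion. The formula in the image is not reproduced in text here, so this sketch rests on assumptions: abs_diff is the absolute difference between a bundle's share in the transaction records and its share in the reviews, each item's value is the mean of rel_diff over its bundles, and the aggregate is a total-sales-weighted average across items. The function name, dict keys, and toy numbers are hypothetical.

```python
def weighted_rel_diff(items, base="transaction"):
    """Total-sales-weighted average of per-item mean rel_diff.

    items: list of dicts with
      "total_sales"   - weight for the item
      "bundle_shares" - list of (transaction_share, review_share) per bundle
    base: which share to use as sales_share in rel_diff = abs_diff / sales_share
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for item in items:
        rel_diffs = []
        for trans_share, review_share in item["bundle_shares"]:
            sales_share = trans_share if base == "transaction" else review_share
            if sales_share == 0:
                continue  # rel_diff is undefined when the base share is zero
            abs_diff = abs(trans_share - review_share)
            rel_diffs.append(abs_diff / sales_share)
        if not rel_diffs:
            continue  # skip items with no usable bundles
        item_rel_diff = sum(rel_diffs) / len(rel_diffs)
        weighted_sum += item["total_sales"] * item_rel_diff
        weight_total += item["total_sales"]
    return weighted_sum / weight_total if weight_total else float("nan")


# Toy numbers for illustration only (not real data).
toy_items = [
    {"total_sales": 300, "bundle_shares": [(0.5, 0.4), (0.5, 0.6)]},
    {"total_sales": 100, "bundle_shares": [(0.25, 0.5), (0.75, 0.5)]},
]

print(weighted_rel_diff(toy_items, base="transaction"))
print(weighted_rel_diff(toy_items, base="review"))
```

Bundles whose base share is zero are skipped rather than counted as infinite. Whether that is the right treatment should be decided before the numbers go into the slides, since it matters most for rarely sold bundles.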