Duane321 / reinforcement_learning_for_rideshare_pricing


Vector sim #3

Closed liux3372 closed 1 month ago

liux3372 commented 3 months ago

@Duane321 Please let me know if you have any questions. Please also review my comments on #1.

I'll try to get the matching code working efficiently and have a dump of simulation data.

Duane321 commented 3 months ago

Hey @liux3372 - good work here. Here are a few comments:

1) I'm hoping to see a dataframe representation of all the log data. I'm expecting something like 3 dataframes (D, S, and matching) to represent 1 week. If I can see that, I can ask my own questions on my end. If you've already created this somewhere and I didn't see it, sorry in advance.

2) Next thing to think about is matching some empirical realities. We want a rider's acceptance probability to be close to 50%. We also want a 1% increase in price to move the rider's acceptance probability from 50% to 49.5%. Since a 1% increase in price resulted in a 1% (relative) change in conversion, we say that's an elasticity of 1 over the short term. I'd like to be able to specify the elasticity as .5 or 2 if I'd like (see the quick check after this list). Note: you should be able to determine how parameters in your model equate to elasticities.

3) Also, we'll want to think about longer-term demand elasticity. A 10% increase in price (sustained over multiple rides) should result in 3% fewer rides requested on the last day.

4) We'll need some supply elasticity stuff too, but I don't know how to think about it yet.

5) Do you have any speed metrics somewhere? I'd like to get a sense of how many sims we'll ultimately be able to run.
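
For concreteness, here's the arithmetic behind point 2 as a quick check (an illustrative sketch, not anything in the repo):

```python
# Quick check of the elasticity arithmetic in point 2 (illustrative only).
def short_term_elasticity(prob_before, prob_after, pct_price_change):
    """Relative change in acceptance probability per relative change in price."""
    rel_prob_change = (prob_after - prob_before) / prob_before
    rel_price_change = pct_price_change / 100.0
    return rel_prob_change / rel_price_change

# A 1% price increase moving acceptance from 50% to 49.5% is a -1% relative change,
# i.e. an elasticity of magnitude 1.
print(short_term_elasticity(0.50, 0.495, 1.0))  # -> -1.0
```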

Thanks!

liux3372 commented 3 months ago

@Duane321

  1. No need to apologize; I hadn't made them into pandas dataframes yet. Please see the dfs under data/10_weeks: demand_week_i.csv contains the requests created at the start of each day; supply_week_i.csv contains each driver's information at the start and end of each day, since each driver's location and idle time change with their trips; matched_trips_week_i.csv contains the matched trips along with the acceptance probabilities for riders and drivers. (A loading sketch follows after this list.)

  2. Sure, I have adjusted the parameters to make the acceptance probability around 50%. The question is that demand elasticity has to be defined holding other conditions equal: if trip_miles is very long, the acceptance probability shouldn't drop much just because trip_miles increases. We could introduce a supply-shortage term in the pricing function so that the price is raised for the same trip_miles when there are more requests or fewer idle drivers; the elasticity would then be based on that surcharge. Measuring the shortage is difficult, though, since we'd have to record the previous requests and idle drivers for any given sub-block. Please let me know if you have any ideas.

  3. For the long-term supply elasticity, does the 10% increase in price mean a 10% increase in the average rider price at the end of D_t over that of D_(t-1)? Am I understanding that right?

  4. For the driver elasticity, I think putting in the short-term elasticity is relatively easy: something like a 1% increase in price causes the driver's acceptance probability to go from 50% to 50.5%. Because the driver's elasticity may not depend on other conditions, the driver will simply take a longer ride if more money can be made (ignoring the long wait at an airport, the empty ride back to the city, etc.).

    The driver's long-term elasticity is challenging, though. It's difficult because we don't resample the driver's idle start time or idle duration based on driver rejects, as we do for riders.

    I can change each driver's lambda for the exponential variable used to simulate idle duration, if you like. At the moment, the mean idle duration is 480 minutes for each driver (lambda = 1/mean_idle_time). A 10% increase in price for a driver would increase that driver's mean_idle_time by 10%, for example. But again, I would have to create a vector of length num_driver to keep track of each driver's mean_idle_time.

  5. For the speed metrics, it takes 29 secs to run 100 weeks and 394 secs to run 1,000 weeks. It could take around 12 hrs to do 100K weeks, but I haven't run that. There are definitely things we can do to make it faster, although it will take some time to do by trial and error.
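
For reference, here's roughly how one week's dump can be loaded into pandas (a minimal sketch; I'm assuming the week index fills in the _i suffix, and the columns are whatever each CSV actually contains):

```python
import pandas as pd

week = 1  # whichever week you want to inspect

# One dataframe per log, mirroring the three CSVs under data/10_weeks.
demand_df = pd.read_csv(f"data/10_weeks/demand_week_{week}.csv")          # requests created at the start of each day
supply_df = pd.read_csv(f"data/10_weeks/supply_week_{week}.csv")          # driver state at the start/end of each day
matched_df = pd.read_csv(f"data/10_weeks/matched_trips_week_{week}.csv")  # matched trips + acceptance probs

for name, df in [("demand", demand_df), ("supply", supply_df), ("matched", matched_df)]:
    print(name, df.shape)
    print(df.head())
```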

Duane321 commented 3 months ago

To your points:

1) Got it. Thanks

2) Hm I see. I was thinking that if we have a function for price and a function for acceptance probability, then the elasticity should be closely related to some coefficient within acceptance_probability(price). It's linear, so it shouldn't depend on the levels of the other variables.

3) No. Elasticity is either demand or supply. It can only concern one side's sensitivity to price holding everything else constant. For supply, it should mean a 10% increase in the prices exposed to drivers (so this includes the rejected rides) means they idle less or come online less. We could say, arbitrarily, that a 10% price decrease means they will idle for 10% less time the next day.

4) Actually, let's forget long term driver elasticities. In practice, it's extremely hard to measure, so it's often ignored. So let's do 1) driver short term elasticity, 2) rider short term elasticities and 3) rider long term elasticity.

5) This is a good start!

liux3372 commented 3 months ago

@Duane321 Forgot to mention, please find a visualization of one week's data here: https://github.com/Duane321/reinforcement_learning_for_rideshare_pricing/blob/vector-sim/notebooks/log_visualization.ipynb

I'd like to clarify the above items with you:

  1. Please note that the requests are created at the start of each day.

    The driver dfs are as of the end of each day.

    The drivers' initial idle_time, idle_duration, and idle locations are created at the start of each day. For each matched trip, the driver who was matched and accepted the trip gets an updated idle start_timestamp equal to the request_timestamp plus ride_minutes; the driver's idle_duration is updated to the original idle_duration minus the minutes elapsed from the previous idle start_timestamp to the latest idle start_timestamp. (Details on lines 315-318 of ridesharing_simulation.py.)

  2. Currently, price_of_ride is a linear function of ride_minutes and ride_miles, and the acceptance probability is a sigmoid of a linear transformation of price_of_ride. (Details on lines 293-296 of ridesharing_simulation.py; a condensed sketch of this appears after this list.)

    I understand you want to adjust the coefficient on the linear transformation of price_of_ride where it feeds into the sigmoid. My concern is that this will naturally discourage riders from accepting longer rides.

    I think it should be a conditional elasticity, meaning the elasticity only matters for the same ride_minutes and ride_miles (or at least for rides in a reasonable quantile). Why would prices differ given the same ride_minutes and ride_miles? Because of a surcharge due to a relative shortage of supply (more requests or fewer drivers).

    I'll take more time to think about it, but I'm happy to hear your thoughts.

  3. So sorry, I really meant the long-term demand elasticity (big typo!). "A 10% increase in price (sustained over multiple rides) should result in 3% fewer rides requested on the next day." Do you mean a 10% increase in the average rider price at the end of D_t over that of D_(t-1)?
  4. Understood. For drivers' short-term elasticity, I think we can simply adjust self.b_d on line 296 of ridesharing_simulation.py, since their elasticity doesn't have to be conditioned on anything (they will naturally want to take a longer ride, and if they can't make much money on a ride, they will be more likely to reject it).
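
To make item 2 (and the idle bookkeeping in item 1) concrete, here's a stripped-down sketch of that logic. The coefficients and helper names are placeholders; the real code is on lines 293-296 and 315-318 of ridesharing_simulation.py:

```python
import numpy as np

# Placeholder coefficients; the simulator's actual values differ.
PRICE_PER_MINUTE = 0.5
PRICE_PER_MILE = 1.5
a_r, b_r = 1.5, -0.2   # rider sigmoid intercept/slope (placeholders)

def price_of_ride(ride_minutes, ride_miles):
    """Price is a linear function of ride_minutes and ride_miles."""
    return PRICE_PER_MINUTE * ride_minutes + PRICE_PER_MILE * ride_miles

def rider_acceptance_prob(price):
    """Acceptance probability is a sigmoid of a linear transformation of the price."""
    return 1.0 / (1.0 + np.exp(-(a_r + b_r * price)))

def update_driver_idle(idle_start, idle_duration, request_timestamp, ride_minutes):
    """After a matched trip: the new idle window starts when the ride ends, and the
    remaining idle_duration shrinks by the minutes elapsed since the previous idle
    start_timestamp (all times in minutes)."""
    new_idle_start = request_timestamp + ride_minutes
    new_idle_duration = idle_duration - (new_idle_start - idle_start)
    return new_idle_start, max(new_idle_duration, 0.0)

price = price_of_ride(ride_minutes=20, ride_miles=5)
print(price, rider_acceptance_prob(price))                          # 17.5 and its acceptance prob
print(update_driver_idle(idle_start=480, idle_duration=300,
                         request_timestamp=500, ride_minutes=20))   # -> (520, 260.0)
```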

liux3372 commented 3 months ago

@Duane321 For the rider’s short-term elasticity:

I have tuned self.a_r and self.b_r; please let me know which set of parameters makes sense.

Prices are now normalized by ride miles as discussed. Specifically, on line 343: normalized_price = price_of_ride / ride_miles if ride_miles > 1 else price_of_ride. I picked ride_miles > 1 because I found that when a ride is very short (e.g. 0.5 miles), price_of_ride / ride_miles is often a very large number greater than 10, which shows up as an outlier in normalized_price.

Set 1: Plot 1 with self.a_r=0.2, self.b_r=-0.02; Plot 2 with self.a_r=0.2, self.b_r=-0.04. (scatter plots attached)

I ran a linear regression of rider_acceptance_prob (y) on normalized_price (x); the slope is -0.005 for the first plot and -0.01 for the second, which means that as normalized_price increases by one unit, rider_acceptance_prob decreases on average by 0.5% (or 1%, respectively).

This aligns with what we discussed for the rider_elasticity_short_term=1 case. However, the range of rider_acceptance_prob is too narrow: 0.51 to 0.53 on the first plot and 0.47 to 0.51 on the second. So I doubt the accepted trips would change much even if we doubled self.b_r, given that both ranges are narrow. Also, the scatter plot shows many horizontal lines clustering at each rider_acceptance_prob value.

Set 2: Plot 1 with self.a_r=1.5, self.b_r=-0.2; Plot 2 with self.a_r=2.5, self.b_r=-0.4. (scatter plots attached)

Same regression as above; this time the slopes are around -0.05 for the first plot and -0.1 for the second, and the linear relationships are clearer on the scatter plots. The range of rider_acceptance_prob is also wider: 0.45 to 0.7 on the first plot and 0.3 to 0.75 on the second. The average R2 of 0.996 in Set 2 is much higher than the average R2 of 0.84 in Set 1. Please let me know which set of params you prefer.
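
For reference, the regression is just an ordinary least-squares fit of rider_acceptance_prob on normalized_price, roughly like this (a sketch with synthetic data, not the notebook code):

```python
import numpy as np
from scipy import stats

# Toy stand-ins for the logged columns; in practice these come from the matched-trips dataframe.
rng = np.random.default_rng(0)
normalized_price = rng.uniform(1, 10, size=2000)
a_r, b_r = 1.5, -0.2
rider_acceptance_prob = 1.0 / (1.0 + np.exp(-(a_r + b_r * normalized_price)))

fit = stats.linregress(normalized_price, rider_acceptance_prob)
print(f"slope={fit.slope:.3f}, r^2={fit.rvalue**2:.3f}, p-value={fit.pvalue:.3g}")
```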

Another way to do it is to solve for b_r at every normalized_price (or at least at different normalized_price buckets), but that would certainly impact the runtime.

To find the derivative of P with respect to price, we start from the given function P = sigmoid(a_r + b_r * price). Applying the chain rule, dP/d(price) = b_r * P * (1 - P).

For the rider’s long-term elasticity:

If you look at lines 69-78 and lines 394-412, the riders' lambdas already incorporate the number of requests and rider accepts throughout the week. Specifically, each rider type has a gamma distribution (alpha, beta). Every day we sample each rider type's lambda from its gamma before sampling the number of requests from those lambdas. The alpha and beta parameters are initialized with fixed vectors per rider type; the alpha vector is then incremented by the number of accepts for rides of that type, and the beta vector by the number of requests for rides of that type (a small sketch of this follows below).

Please let me know if we still want to do the lambda update based on rider_elasticity_long_term. Additionally, I think the gamma update has a similar effect to the idea we discussed last time of updating lambda as an exponentially weighted average of accepted trips over a week.
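
Here is a minimal sketch of that gamma-Poisson update, with made-up initial vectors (the real ones live around lines 69-78 and 394-412):

```python
import numpy as np

rng = np.random.default_rng(0)

# Per-rider-type gamma parameters (illustrative initial vectors).
alpha = np.array([2.0, 4.0, 8.0])   # shape
beta = np.array([1.0, 1.0, 1.0])    # rate

for day in range(7):
    # Sample each type's lambda from its gamma, then the day's request counts.
    lam = rng.gamma(shape=alpha, scale=1.0 / beta)
    requests = rng.poisson(lam)
    accepts = rng.binomial(requests, 0.5)   # stand-in for the matching/acceptance outcome

    # Accepts bump alpha, requests bump beta (as described above).
    alpha = alpha + accepts
    beta = beta + requests

print(alpha, beta)
```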

For the driver's short-term elasticity: same idea as for riders, since driver_acceptance_prob uses the same formula as rider_acceptance_prob. They will have the same elasticity with respect to normalized_price if they have the same a and b parameters in the sigmoid. I show an example with self.a_d=1.5, self.b_d=-0.2. (plot attached)

Duane321 commented 2 months ago

I like the second set of plots better, the one where you have a larger range of conversion probabilities.

I don't think we've gotten the conversion calc right. You are computing d(P)/d(price). We're interested in (d(P)/P) / (d(price)/price) = d ln(P) / d ln(price). This ratio is the short-term rider elasticity, where 1.0 is a typical value. So you're close. FWIW, the slopes look about right in the second set of plots.

Driver side looks fine too.

For the longer-term elasticities, I see update_gamma_distns might be doing what we want, but it's not called anywhere. Sounds like it should be called. It'd be nice to know how much a 10% change in the average prices exposed to riders changes their ride requests the next day.

liux3372 commented 2 months ago

Thanks a lot for your confirmation, DJ.

I'll use the parameters from the second set of plots, and the same for the driver's elasticity. Sure, I'll call update_gamma_distns and see how a 10% change in avg. ride prices is reflected in the requests. I just wanted to make sure it's what we want before applying it.

And here is an updated elasticity formula. Again, it still requires a solver, since it contains both b_r and P, where P itself contains b_r as well. So I think we should go with the heuristic parameter tuning I did above to save time for simulation.

(Screenshots: derivation of the elasticity formula.)
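
In text form (derived from the sigmoid acceptance model above, so treat it as a sketch of the same formula):

```latex
% Acceptance probability as a sigmoid of a linear transformation of price
P(\mathrm{price}) = \sigma(a_r + b_r \cdot \mathrm{price})
                  = \frac{1}{1 + e^{-(a_r + b_r \cdot \mathrm{price})}}

% Chain rule
\frac{dP}{d\,\mathrm{price}} = b_r \, P \, (1 - P)

% Short-term elasticity: both b_r and P (which itself depends on b_r) appear,
% hence the need for a solver or heuristic tuning
\varepsilon = \frac{d \ln P}{d \ln \mathrm{price}}
            = \frac{dP}{d\,\mathrm{price}} \cdot \frac{\mathrm{price}}{P}
            = b_r \cdot \mathrm{price} \cdot (1 - P)
```
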
Duane321 commented 2 months ago

Yes, in those cases you go with what you know is typical. So conversion = .5 and price equal to whatever the average price per mile is, maybe 5-6? Things don't need to be exact here. We just want the simulation to approximate something intuitive.
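
Roughly, plugging those typical values into the elasticity formula above pins down the slope; a back-of-envelope sketch (illustrative numbers only, not the simulator's settings):

```python
# Back-of-envelope: for the sigmoid model, elasticity ~= |b_r| * price * (1 - P).
target_elasticity = 1.0         # typical short-term rider elasticity
typical_conversion = 0.5        # P ~ 50%
typical_price_per_mile = 5.5    # "5-6" average price per mile

b_r_magnitude = target_elasticity / (typical_price_per_mile * (1 - typical_conversion))
print(b_r_magnitude)  # ~0.36; exactness doesn't matter, we just want the right ballpark
```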

liux3372 commented 2 months ago

Hey @Duane321, if you have time, please take a look and let's discuss how to move forward in Friday's meeting.

For the rider's short-term elasticity, it turns out the step-wise issue you mentioned was caused by rounding every number to 2 decimal places. The rider_acceptance_prob is actually smoother, and the relationship with distance_normalized_price is clearer now. (plot attached)

For the rider's long-term elasticity, after a few experiments I decided not to use the gamma update to sample lambda, because it only increases with daily accepted trips and never decreases from rejects. Instead, I use a daily update of each rider's lambda based on the daily average price exposed to them. Please see the details of daily_avg_exposed_price in ridesharing_simulation.py; it is called in every day's simulation via simulation_this_week.update_lambda_longterm_elasticity() (5th cell of notebooks/log_visualization_100weeks_new_lambda_linear_change.ipynb).

Each rider's lambda is a constant based on rider type, and every lambda is updated based on self.a_lambda * (daily_avg_exposed_price - self.b_lambda), where self.b_lambda is the historical daily average exposed price over 100 weeks and self.a_lambda controls the rider's long-term elasticity (a condensed sketch is below). I also changed the x-axis to daily_avg_exposed_price, which includes both accepted and rejected prices. As shown in the plot, the relationship between exposed price and daily total requests is very clear, and its p-value in the linear regression is significant. (plot attached)

I think we can do multiple paired simulations by changing self.b_r = -0.2, self.b_d = -0.2, self.a_lambda = -0.05.
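
A condensed sketch of one plausible reading of that daily update (the real method is simulation_this_week.update_lambda_longterm_elasticity(); all values below are placeholders):

```python
import numpy as np

a_lambda = -0.05   # controls the rider's long-term elasticity (negative: higher prices -> fewer requests)
b_lambda = 22.5    # placeholder for the historical daily avg exposed price over 100 weeks
base_lambda = np.array([1.0, 2.0, 4.0])   # per-rider-type base request rates (illustrative)

def update_lambda_longterm(base, daily_avg_exposed_price):
    """Shift each rider's lambda by a_lambda * (today's avg exposed price - historical avg),
    clamped at zero so the request rate can't go negative."""
    return np.maximum(base + a_lambda * (daily_avg_exposed_price - b_lambda), 0.0)

# A day with above-average exposed prices lowers the next day's request rates.
print(update_lambda_longterm(base_lambda, daily_avg_exposed_price=24.0))
```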

Duane321 commented 2 months ago

Great progress @liux3372 ! Here are my comments:

1) Short term elasticity looks good. Looks like 10%/15%, which is a realistic elasticity.

2) Long term elasticity sounds properly defined, but the slope is too great. A 5% increase in exposed price reduces requests by almost 50%. It should be more like a 5% increase in exposed price reducing next-day ride_requests counts by 1-3%. But this'll get fixed with some tuning.

3) Can I see short term elasticity on the driver side? I'm expecting the same short term curve, but sloping up.

4) Once we have all this, can you tell me how to access the data in a dataframe format? That way I can play around with things a bit.

liux3372 commented 2 months ago

@Duane321 I have done some parameter tuning:

Rider's short-term elasticity: slope is -0.047, p-value is 0. (plot attached)

Driver's short-term elasticity: slope is -0.073, p-value is 0. (plot attached)

Rider's long-term elasticity: slope is -67.67, p-value is close to 0. So for daily_avg_price with a mean of 22.56 and daily_total_requests with a mean of 867.99 over 100 weeks, a 1-unit increase in daily_avg_price (~4.4%) causes, on average, a decrease of 67.67 in daily_total_requests (~7.8%). Interestingly, when I try to further shrink self.a_lambda (to a negative value with a smaller absolute value), the p-value increases to an insignificant level, so the linear relationship between the two variables disappears. Not sure how to interpret that. (plot attached)
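
Translating that slope into an elasticity at the means (same quick arithmetic as before):

```python
slope = -67.67          # d(daily_total_requests) / d(daily_avg_price) from the regression
mean_price = 22.56
mean_requests = 867.99

# Elasticity at the means: (dQ/Q) / (dP/P) = slope * mean_price / mean_requests
elasticity = slope * mean_price / mean_requests
print(elasticity)  # ~ -1.76, i.e. a 1% increase in exposed price -> ~1.8% fewer requests the next day
```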

If they all look fine, I'll provide those dataframes for you.

liux3372 commented 2 months ago

@Duane321, I have tuned the elasticity parameters and things should work out as we expected.

Dataframes of 100 weeks simulation are saved here: https://github.com/Duane321/reinforcement_learning_for_rideshare_pricing/tree/vector-sim/data/100_weeks_default_params_dataframes

The visualization and statistics are shown in this notebook: https://github.com/Duane321/reinforcement_learning_for_rideshare_pricing/blob/vector-sim/notebooks/data_generation_and_visualization.ipynb

If you look at the scatter plots in the notebook, from top to bottom we have elasticity for short-term rider, short-term driver and long-term rider, respectively.

Please let me know if you have any questions.