Closed YeChen-IDM closed 1 year ago
some ideas:
import datetime
import time

# Option 1: row-wise apply with a lambda (calls Python once per row)
start_t = time.time()
sim['date'] = sim.apply(lambda row: first_ref_date + datetime.timedelta(days=int(row.simday)), axis=1)
end_t = time.time()
print(end_t - start_t)

# Option 2: list comprehension over the simday column only
start_t = time.time()
sim['date_2'] = [first_ref_date + datetime.timedelta(days=int(simday)) for simday in sim['simday']]
end_t = time.time()
print(end_t - start_t)
result (elapsed seconds):
26.090492486953735 (apply with lambda)
0.9626128673553467 (list comprehension)
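A fully vectorized version might be faster still, though I have not timed it on our data. Below is a minimal sketch, assuming `first_ref_date` is a `datetime.date` and `simday` holds whole days; the small `sim` frame and the `date_3` column name are made up for illustration. Note that it yields a `datetime64` column rather than `datetime.date` objects, so downstream code that depends on the type would need checking.

```python
import datetime

import pandas as pd

# Hypothetical stand-ins for the real first_ref_date and sim data frame
first_ref_date = datetime.date(2020, 1, 1)
sim = pd.DataFrame({"simday": [0, 1, 31, 366]})

# Convert the whole simday column to timedeltas in one call,
# then add the reference date to every element at once
offsets = pd.to_timedelta(sim["simday"].astype(int), unit="D")
sim["date_3"] = pd.Timestamp(first_ref_date) + offsets

print(sim)
```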
Another change I made is to filter the main data frame by seed first and then perform the other operations on that subset, which is faster than running them on the whole data frame.
This is done in PR #53.
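For anyone reading later, the filter-first idea looks roughly like this. It is only a sketch, not the code from the PR: the `seed`, `simday` and `value` columns and the `process_seed` helper are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the main data frame
df = pd.DataFrame({
    "seed": [0, 0, 1, 1, 2],
    "simday": [1, 2, 1, 2, 1],
    "value": [10, 20, 30, 40, 50],
})

def process_seed(df, seed):
    """Filter to one seed first, then do the remaining work on the small subset."""
    subset = df.loc[df["seed"] == seed].copy()
    # Any per-row work now touches only this seed's rows
    subset["scaled"] = subset["value"] * 2
    return subset

subset = process_seed(df, seed=1)
print(subset)
```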
At some point we had a sim_duration_survey_sampling.csv file of about 4 MB, and performance in Python was acceptable. Currently this site only generates a patient_reports.csv file of 1.2 GB, and dataframe operations are too slow on data of this size.
We should look for a way to improve the performance of plotting results for this site.
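One option to explore is reading the large file in chunks and loading only the columns the plot needs, so the full 1.2 GB never sits in memory at once. A rough sketch follows, with assumptions stated up front: the `simday` and `reported` column names and the `reports_per_simday` helper are hypothetical, and the in-memory CSV text stands in for patient_reports.csv, whose real schema I have not checked here.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for the real CSV; column names are assumptions
csv_text = "seed,simday,reported\n0,1,1\n0,2,0\n1,1,1\n1,2,1\n2,1,0\n"

def reports_per_simday(source, chunksize=2):
    """Sum the 'reported' column per simday, one chunk at a time."""
    totals = None
    reader = pd.read_csv(source, usecols=["simday", "reported"], chunksize=chunksize)
    for chunk in reader:
        partial = chunk.groupby("simday")["reported"].sum()
        # Merge this chunk's partial sums into the running totals
        totals = partial if totals is None else totals.add(partial, fill_value=0)
    return totals.astype(int)

totals = reports_per_simday(io.StringIO(csv_text))
print(totals)
```

With the real file, `source` would be the path to the CSV and `chunksize` something much larger, tuned to available memory.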