Closed francois-baptiste closed 1 year ago
I found several anti-patterns where some imports are done inside a loop, considerably affecting the upload time. https://github.com/CartoDB/raster-loader/blob/fc88b407299e008c5c80b8d9c4dbdb8351af6ca7/raster_loader/io.py#L95
These issues are fixed in the PR #72 in this commit https://github.com/CartoDB/raster-loader/pull/72/commits/11f05267b2cd72f32f2d629d4211ca24c7b83037
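For reference, the anti-pattern looks roughly like this (a hypothetical sketch, not the actual `io.py` code): even though Python caches modules after the first import, an `import` statement inside a loop still performs a `sys.modules` lookup and name binding on every iteration, which adds up over millions of blocks.

```python
import json  # fix: import once at module scope


def parse_all_slow(values):
    results = []
    for v in values:
        # Anti-pattern: re-executes the import machinery on every pass.
        import json
        results.append(json.loads(v))
    return results


def parse_all_fast(values):
    # Same logic, but relies on the module-level import above.
    return [json.loads(v) for v in values]


data = ['{"a": 1}'] * 1000
assert parse_all_slow(data) == parse_all_fast(data)
```

Both functions return the same result; only the per-iteration overhead differs.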
@francois-baptiste happy to help with this one. Feel free to assign me issues and I can help triage
Thank you @brendancol. I found another anti-pattern in the code: the DataFrame to be uploaded to BigQuery is built from a list of dicts, which is much slower than building it from a list of tuples plus a column list, as I did in the original script. Can you fix this one by forking the quadbin branch?
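The difference between the two construction paths can be sketched as follows (column names are hypothetical, not taken from the raster-loader code). With a list of dicts, pandas inspects every row's keys to infer the columns; with tuples and an explicit column list, it skips that per-row inspection:

```python
import pandas as pd

n = 1000

# Slower: list of dicts — pandas must examine each dict's keys.
records = [{"block": i, "metadata": None, "band_1": b"\x00\x01"} for i in range(n)]
df_from_dicts = pd.DataFrame(records)

# Faster: list of tuples plus an explicit column list.
columns = ["block", "metadata", "band_1"]
rows = [(i, None, b"\x00\x01") for i in range(n)]
df_from_tuples = pd.DataFrame(rows, columns=columns)

# Both paths produce an equivalent frame.
assert list(df_from_dicts.columns) == columns
assert df_from_dicts.shape == df_from_tuples.shape
```

The speedup grows with the number of rows, so it matters most for large rasters.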
The issue was not where I thought it was. Creating a pyproj.Transformer on each loop iteration was the problem. Now we are on par with the performance of my original script 🚀
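The fix amounts to hoisting the Transformer construction out of the loop. A minimal sketch (coordinates and CRS codes are illustrative, not from the repo): `Transformer.from_crs` does expensive CRS database lookups, so calling it once and reusing the object is much cheaper than rebuilding it per point.

```python
from pyproj import Transformer

coords = [(2.35, 48.85), (-3.70, 40.42)]  # hypothetical lon/lat pairs

# Anti-pattern: a new Transformer is constructed for every coordinate.
slow = []
for lon, lat in coords:
    t = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
    slow.append(t.transform(lon, lat))

# Fix: build the Transformer once, outside the loop, and reuse it.
transformer = Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)
fast = [transformer.transform(lon, lat) for lon, lat in coords]

assert slow == fast  # identical results, far fewer CRS lookups
```

The transform results are identical either way; only construction cost differs.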
Bug Description
Importing this file into BQ took 55 minutes using this code:
while the original Python script finishes in 4 minutes.
System information [Run `carto info` in a terminal and add the output here, overwriting the text below.]