Closed luipir closed 1 year ago
This pull request has been linked to Shortcut Story #358996: Update raster-loader to generate new Raster and Metadata table format.
Overall looks good. But there are some details to complete before merging:
Regarding the metadata: Why are
width
andheight
fields included
give the overall size of the original raster. To have this info, should be calculated with block and num of blocks.
IMHO is exactly as center that can be derived from centroid of BBOX.
Should I remove then width
and height
?
This is not used actually but as num_blocks or num_pixels, we can keep it as informative data :+1:
I've found also an important bug:
- The metadata is not incrementally updated on mosaic rasters. This must be done by reading the existing metadata and merging the old and new values.
There are some values missing in the metadata:
- minresolution
- maxresolution
- nodata
We should keep
wide_number_mode=>'round'
in PARSE_JSON. This was added to solve a recent bug@luipir @volaya
I didn't understand well @Jesus89 , could you point the mosaic raster you used, probably I missed a use case.
- Fix progressbar count (using the latest job)
hmmm... not sure it is a reliable way to progress bar, because jobs are executed without clear execution order
- Recompute
num_blocks
andnum_pixels
could these values be different by that get from rasterio metadata? if a block fail to upload fail all upload so IMHO no need recompute these data. Other if there is a use case I do not considered.
I didn't understand well @Jesus89 , could you point the mosaic raster you used, probably I missed a use case.
It's basically this code: https://github.com/CartoDB/raster-loader/pull/116/files#diff-35ccab082c7efa2b1a83c6fef7690c297582c4d96b4551043db8f491083886a2L1177. If we add two pieces of a raster mosaic, the first upload, the metadata will be associated to the first piece, but the second upload, the metadata will need to be updated to include both pieces
hmmm... not sure it is a reliable way to progress bar, because jobs are executed without clear execution order
The bug was that it always uses the num_records of the latest job, instead of the record of the specific job. For example, for 1510 blocks, it will push 500 + 500 + 500 + 10, but the previous implementation added 500 + 10 + 10 + 10. It does not matter when the jobs end, but they need to add to the progress bar the correct amount.
could these values be different by that get from rasterio metadata? if a block fail to upload fail all upload so IMHO no need recompute these data. Other if there is a use case I do not considered.
The num params are computed from the raster medatata width, height, block_width, block_height. It pushed the metadata for success upload, if the upload fails, it will have wrong metadata anyway
@Jesus89 I have fixed the issues that you mentioned. While adding a new test for the bounds re-calculation, I found another issue that I have fixed, related to metadata. It was not being written in some cases. Tests were passing because the fixtures were actually wrong.
Let me know what you think
LGTM btw Github do not allow me to approve a PR opened by me :)
Let's rename "band_name" to "name" in metadata.bands @volaya. We will apply the changes in the API and AT @vdelacruzb
Story details: https://app.shortcut.com/cartoteam/story/358996