CartoDB / raster-loader

https://raster-loader.readthedocs.io
Other
15 stars 4 forks source link

feat(raster-loader) Update raster-loader to generate new Raster and Metadata table format #116

Closed luipir closed 1 year ago

luipir commented 1 year ago

Story details: https://app.shortcut.com/cartoteam/story/358996

shortcut-integration[bot] commented 1 year ago

This pull request has been linked to Shortcut Story #358996: Update raster-loader to generate new Raster and Metadata table format.

luipir commented 1 year ago

Overall looks good. But there are some details to complete before merging:

Regarding the metadata: Why are width and height fields included

give the overall size of the original raster. To have this info, should be calculated with block and num of blocks. IMHO is exactly as center that can be derived from centroid of BBOX. Should I remove then width and height?

Jesus89 commented 1 year ago

This is not used actually but as num_blocks or num_pixels, we can keep it as informative data :+1:

luipir commented 1 year ago

I've found also an important bug:

  • The metadata is not incrementally updated on mosaic rasters. This must be done by reading the existing metadata and merging the old and new values.

There are some values missing in the metadata:

  • minresolution
  • maxresolution
  • nodata

We should keep wide_number_mode=>'round' in PARSE_JSON. This was added to solve a recent bug

@luipir @volaya

I didn't understand well @Jesus89 , could you point the mosaic raster you used, probably I missed a use case.

luipir commented 1 year ago
  • Fix progressbar count (using the latest job)

hmmm... not sure it is a reliable way to progress bar, because jobs are executed without clear execution order

luipir commented 1 year ago
  • Recompute num_blocks and num_pixels

could these values be different by that get from rasterio metadata? if a block fail to upload fail all upload so IMHO no need recompute these data. Other if there is a use case I do not considered.

Jesus89 commented 1 year ago

I didn't understand well @Jesus89 , could you point the mosaic raster you used, probably I missed a use case.

It's basically this code: https://github.com/CartoDB/raster-loader/pull/116/files#diff-35ccab082c7efa2b1a83c6fef7690c297582c4d96b4551043db8f491083886a2L1177. If we add two pieces of a raster mosaic, the first upload, the metadata will be associated to the first piece, but the second upload, the metadata will need to be updated to include both pieces

hmmm... not sure it is a reliable way to progress bar, because jobs are executed without clear execution order

The bug was that it always uses the num_records of the latest job, instead of the record of the specific job. For example, for 1510 blocks, it will push 500 + 500 + 500 + 10, but the previous implementation added 500 + 10 + 10 + 10. It does not matter when the jobs end, but they need to add to the progress bar the correct amount.

could these values be different by that get from rasterio metadata? if a block fail to upload fail all upload so IMHO no need recompute these data. Other if there is a use case I do not considered.

The num params are computed from the raster medatata width, height, block_width, block_height. It pushed the metadata for success upload, if the upload fails, it will have wrong metadata anyway

volaya commented 1 year ago

@Jesus89 I have fixed the issues that you mentioned. While adding a new test for the bounds re-calculation, I found another issue that I have fixed, related to metadata. It was not being written in some cases. Tests were passing because the fixtures were actually wrong.

Let me know what you think

luipir commented 1 year ago

LGTM btw Github do not allow me to approve a PR opened by me :)

Jesus89 commented 1 year ago

Let's rename "band_name" to "name" in metadata.bands @volaya. We will apply the changes in the API and AT @vdelacruzb