GeoscienceAustralia / dea-cogger

Convert NetCDFs to Cloud Optimized GeoTIFFs

Simplify workgen and add landsat scene processing code #25

Closed santoshamohan closed 5 years ago

santoshamohan commented 5 years ago

Request for this pull request

The workgen stage of the COG conversion process was consuming a lot of NCI resources and slowing the whole conversion down, which limited each run to at most one month of data. The workgen logic therefore needs to be rewritten so that the S3/ODC comparison is much quicker.

Proposed solution

  1. Update README.md for latest changes.
  2. In the converter/aws_products_config.yaml file, remove the stacked_name_template element for all products except the fc products, and add config details for Landsat scenes.
  3. Add the --sat-row and --sat-path command options to the converter/generate_work_list.sh shell script, and update the generate-work-list command to match the latest changes.
  4. Remove converter/mpi_cog_convert.sh file that is no longer used (we have merged the implementation within cog_convert app).
  5. In converter/cogeo.py file update the following:
    • Remove references to netCDF, since in future we want the COG conversion app to apply to all products, including scenes.
    • Rename COGNetCDF class to COFConvert.
    • Rename netcdf_to_cog to generate_cog_files.
    • Elaborate variable names to remove any ambiguity.
    • Remove unused functions _make_s1_outprefix and _make_outprefix.
    • Update generate_cog_files to detect if any level 2 scenes are requested. If so, process all the tiff files associated with the metadata file.
    • Also, process the metadata doc for upload to S3.
    • Add a new function to convert TIFF files to COG TIFFs.
  6. In converter/cog_conv_app.py file update the following:
    • Reduce usage of global variables.
    • Support cog conversion of Landsat scenes.
    • Elaborate variable names to remove any ambiguity.
    • Rename netCDF_cog_worker function to cog_converter.
    • Rename any references to netCDF.
    • Remove unused function, yaml_files_for_product.
    • Add more documentation to get_param_names function.
    • Remove the lengthy S3/ODC comparison within check_prefix_from_query_result. Instead, use a set operation to get the difference between S3 and ODC.
    • Return file.uri, s3 directory structure, and new s3 yaml file path from check_prefix_from_query_result function. The return value shall be packaged and consumed within workgen and mpi-cog-convert applications respectively.
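
The set-based S3/ODC comparison mentioned above could look roughly like the sketch below. It is only an illustration of the idea, not the code in this pull request: the function name find_missing_datasets and the example prefixes are hypothetical, and it assumes both sides can be reduced to comparable prefix strings.

```python
def find_missing_datasets(odc_prefixes, s3_prefixes):
    """Return the ODC prefixes that have no counterpart on S3, sorted.

    Both arguments are iterables of prefix strings (hypothetical format).
    Building sets makes the comparison a single hash-based difference
    instead of checking every ODC prefix against every S3 prefix.
    """
    return sorted(set(odc_prefixes) - set(s3_prefixes))


# Hypothetical prefixes, for illustration only.
odc_prefixes = [
    "x_10/y_-40/2018/01/01",
    "x_10/y_-40/2018/01/17",
    "x_11/y_-40/2018/01/01",
]
s3_prefixes = ["x_10/y_-40/2018/01/01"]

# Only the datasets still missing from S3 need to go into the work list.
missing = find_missing_datasets(odc_prefixes, s3_prefixes)
```

With sets, the cost grows roughly linearly with the number of prefixes on each side, which is where the speed-up over a pairwise comparison would come from.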

omad commented 5 years ago

All changes have been rolled forward or replaced in #26