Reason for this pull request
The `workgen` stage of the COG conversion process was consuming a lot of NCI resources and slowing down the whole conversion. This limited each COG conversion run to a maximum of 1 month of data. The `workgen` logic therefore needs to be rewritten so that the S3/ODC comparison is much quicker.
Proposed solution
- Update `README.md` for the latest changes.
- In `converter/aws_products_config.yaml`, remove the `stacked_name_template` element for all products except the fc products, and add config details for Landsat scenes.
- Add the `--sat-row` and `--sat-path` command options to the `converter/generate_work_list.sh` shell script, and update the `generate-work-list` command to match the latest changes.
- Remove `converter/mpi_cog_convert.sh`, which is no longer used (its implementation has been merged into the `cog_convert` app).
- In `converter/cogeo.py`:
  - Remove references to netCDF, since we want the COG Conversion app to apply to all products, including scenes, in the future.
  - Rename the `COGNetCDF` class to `COGConvert`.
  - Rename `netcdf_to_cog` to `generate_cog_files`.
  - Use more descriptive variable names to remove ambiguity.
  - Remove the unused functions `_make_s1_outprefix` and `_make_outprefix`.
  - Update `generate_cog_files` to detect whether any level 2 scenes are requested. If so, process all the tiff files associated with the metadata file.
  - Also process the metadata doc for upload to S3.
  - Add a new function to convert tif files to cogtiff.
- In `converter/cog_conv_app.py`:
  - Reduce the use of global variables.
  - Support COG conversion of Landsat scenes.
  - Use more descriptive variable names to remove ambiguity.
  - Rename the `netCDF_cog_worker` function to `cog_converter`.
  - Rename any remaining references to netCDF.
  - Remove the unused function `yaml_files_for_product`.
  - Add more documentation to the `get_param_names` function.
  - Remove the lengthy S3/ODC comparison within `check_prefix_from_query_result`; instead, use a set operation to get the difference between S3 and ODC.
  - Return `file.uri`, the S3 directory structure, and the new S3 yaml file path from `check_prefix_from_query_result`. The return value is packaged and consumed within the `workgen` and `mpi-cog-convert` applications respectively.
- [x] Manually tested `workgen` for a time period of 3 years; it took approximately 5-7 minutes to generate a file list.
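
For reviewers, the set-based S3/ODC comparison described above can be illustrated roughly as follows. This is only a sketch and not the code in `check_prefix_from_query_result`: the `WorkItem` tuple, the `s3_yaml_key` helper, the `missing_from_s3` function, and the `<prefix>/<name>/<name>.yaml` key layout are all hypothetical names and assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, NamedTuple


class WorkItem(NamedTuple):
    """Hypothetical bundle of the three values returned per dataset."""
    uri: str           # source file location reported by ODC
    s3_dir: str        # S3 directory the COGs would be written to
    s3_yaml_path: str  # S3 key of the metadata yaml


def s3_yaml_key(uri: str, prefix: str) -> str:
    # Assumed layout: <prefix>/<file stem>/<file stem>.yaml
    stem = PurePosixPath(uri).stem
    return f"{prefix}/{stem}/{stem}.yaml"


def missing_from_s3(odc_uris: Iterable[str],
                    s3_yaml_keys: Iterable[str],
                    prefix: str) -> List[WorkItem]:
    """Return work items for ODC datasets that have no yaml on S3 yet."""
    existing = set(s3_yaml_keys)
    # Map each expected S3 yaml key back to the ODC file it comes from.
    expected = {s3_yaml_key(uri, prefix): uri for uri in odc_uris}
    # One set difference replaces a per-dataset lookup against S3.
    todo = expected.keys() - existing
    return [
        WorkItem(expected[key], str(PurePosixPath(key).parent), key)
        for key in sorted(todo)
    ]
```

Because membership tests on a set take constant time on average, the comparison grows roughly linearly with the number of datasets, which is consistent with the shorter `workgen` run reported above.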
Fixes [#23]