USF-IMARS / imars-etl

:cloud: Tools for `extract` and `load` for IMaRS ETL (Extract, Transform, Load) operations

ingest PGC WV* images #35

Open 7yl4r opened 5 years ago

7yl4r commented 5 years ago

There is a lot of WV data in cozmel:/cozumel/imars_objects/ftp-ingest that needs to be loaded using imars-etl.

PGC_Imagery_2019jan18/
├── imagery
│   ├── WV02_1030010004CE7200_M1BS_502581520080_01
│   │   ├── WV02_20100330170549_1030010004CE7200_10MAR30170549-M1BS-502581520080_01_P001-BROWSE.jpg
│   │   ├── WV02_20100330170549_1030010004CE7200_10MAR30170549-M1BS-502581520080_01_P001.ntf
│   │   ├── WV02_20100330170549_1030010004CE7200_10MAR30170549-M1BS-502581520080_01_P001.rename
│   │   ├── WV02_20100330170549_1030010004CE7200_10MAR30170549-M1BS-502581520080_01_P001.tar
│   │   ├── WV02_20100330170549_1030010004CE7200_10MAR30170549-M1BS-502581520080_01_P001.xml
│   │   ├── WV02_20100330170550_1030010004CE7200_10MAR30170550-M1BS-502581520080_01_P002-BROWSE.jpg
│   │   ├── WV02_20100330170550_1030010004CE7200_10MAR30170550-M1BS-502581520080_01_P002.ntf
│   │   ├── WV02_20100330170550_1030010004CE7200_10MAR30170550-M1BS-502581520080_01_P002.rename
│   │   ├── WV02_20100330170550_1030010004CE7200_10MAR30170550-M1BS-502581520080_01_P002.tar
│   │   ├── WV02_20100330170550_1030010004CE7200_10MAR30170550-M1BS-502581520080_01_P002.xml

Basically we just need to figure out the command(s) and then run it (them).

Here's my quick first guess:

ls /srv/imars-objects/ftp-ingest/PGC_Imagery_2019jan18/imagery/*/*.ntf |
    xargs -n 1 imars-etl -v load \
    -p ?? \
    -l "WV0[2-3]_{junk}_M1BS_{junk2}.ntf" \
    --metadata_file_driver wv2_xml \
    --metadata_file "{filepath}.xml"

7yl4r commented 5 years ago

Folder format: WV0{n}_{id1}_{p_or_m}1BS_{id2}_01. I don't know what id1 or id2 are.

Files within a granule share a common base pattern: base_pattern = WV0{n}_%Y%m%d%H%M%S_{id1}_%y%b%d%H%M%S-{p_or_m}1BS-{id2}_01_P{pass_n}
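To make base_pattern concrete, here is a minimal bash sketch that splits a granule file name into its fields. The regex is inferred only from the example file names in this issue (it assumes id1 is alphanumeric and id2 is numeric), and `parse_wv_name` is a hypothetical helper, not part of imars-etl.

```shell
#!/usr/bin/env bash
# Sketch: split a granule file name into the fields of base_pattern.
# parse_wv_name is a hypothetical helper; the regex is an assumption
# inferred from the example file names in this issue.
parse_wv_name() {
    local name re
    name=$(basename "$1")
    # WV0{n}_%Y%m%d%H%M%S_{id1}_%y%b%d%H%M%S-{p_or_m}1BS-{id2}_01_P{pass_n}
    re='^WV0([0-9])_([0-9]{14})_([0-9A-Za-z]+)_([0-9]{2}[A-Z]{3}[0-9]{8})-([PM])1BS-([0-9]+)_01_P([0-9]+)'
    if [[ $name =~ $re ]]; then
        echo "n=${BASH_REMATCH[1]}"
        echo "datetime=${BASH_REMATCH[2]}"
        echo "id1=${BASH_REMATCH[3]}"
        echo "short_datetime=${BASH_REMATCH[4]}"
        echo "p_or_m=${BASH_REMATCH[5]}"
        echo "id2=${BASH_REMATCH[6]}"
        echo "pass_n=${BASH_REMATCH[7]}"
    else
        echo "no match: $name" >&2
        return 1
    fi
}
```

For example, `parse_wv_name WV02_20100330170549_1030010004CE7200_10MAR30170549-M1BS-502581520080_01_P001.ntf` prints one `key=value` line per field.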

So the files per granule are:

{base_pattern}-BROWSE.jpg
{base_pattern}.ntf
{base_pattern}.rename
{base_pattern}.tar
{base_pattern}.xml
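Since every file in a granule shares base_pattern, the base can be recovered from any one of them by stripping the suffix. This is a rough sketch; `granule_base` is a hypothetical helper name and only the five suffixes listed above are handled.

```shell
#!/usr/bin/env bash
# Sketch: recover {base_pattern} (with its directory) from any granule file.
# granule_base is a hypothetical helper, not an imars-etl function.
granule_base() {
    case $1 in
        *-BROWSE.jpg)               printf '%s\n' "${1%-BROWSE.jpg}" ;;
        *.ntf|*.rename|*.tar|*.xml) printf '%s\n' "${1%.*}" ;;
        *)                          return 1 ;;   # not one of the five granule files
    esac
}

# Example: the .xml that sits next to a given .ntf would be
#   "$(granule_base /path/to/granule_P001.ntf).xml"
```

This shows why the metadata file is `{base_pattern}.xml` rather than the .ntf path with `.xml` appended.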

Inside that .tar are a whole lot of files in a folder. The folder name pattern is the same as above. The files fall into two groups:

# matches the second half of base_pattern above
pattern_1=%y%b%d%H%M%S-{p_or_m}1BS-{id2}_01_P{pass_n}
{pattern_1}-BROWSE.JPG
{pattern_1}.XML
{pattern_1}.ATT
{pattern_1}.EPH
{pattern_1}.GEO
{pattern_1}.IMD
{pattern_1}.RPB
{pattern_1}.TIL
{pattern_1}_README.TXT
{pattern_1}_PIXEL_SHAPE.dbf
{pattern_1}_PIXEL_SHAPE.prj
{pattern_1}_PIXEL_SHAPE.shp
{pattern_1}_PIXEL_SHAPE.shx

pattern_2={id2}_01
{pattern_2}_LAYOUT.JPG
{pattern_2}_README.TXT
{pattern_2}_ORDER_SHAPE.dbf
{pattern_2}_ORDER_SHAPE.prj
{pattern_2}_ORDER_SHAPE.shp
{pattern_2}_ORDER_SHAPE.shx
{pattern_2}_PRODUCT_SHAPE.dbf
{pattern_2}_PRODUCT_SHAPE.prj
{pattern_2}_PRODUCT_SHAPE.shp
{pattern_2}_PRODUCT_SHAPE.shx
{pattern_2}_TILE_SHAPE.dbf
{pattern_2}_TILE_SHAPE.prj
{pattern_2}_TILE_SHAPE.shp
{pattern_2}_TILE_SHAPE.shx
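The two groups can be told apart by how the member name starts. The following is a sketch under the assumption that pattern_1 names always begin with the short timestamp and pattern_2 names with a numeric id2; `tar_member_group` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: classify a tar member name as pattern_1, pattern_2, or unknown.
# The regexes are assumptions inferred from the file lists in this issue.
tar_member_group() {
    local name re1 re2
    name=$(basename "$1")
    re1='^[0-9]{2}[A-Z]{3}[0-9]{8}-[PM]1BS-'   # %y%b%d%H%M%S-{p_or_m}1BS-...
    re2='^[0-9]+_01_'                          # {id2}_01_...
    if [[ $name =~ $re1 ]]; then
        echo pattern_1
    elif [[ $name =~ $re2 ]]; then
        echo pattern_2
    else
        echo unknown
    fi
}

# Example (not run here): label every member of one archive.
#   tar -tf "$tarfile" | while IFS= read -r m; do
#       printf '%s\t%s\n' "$(tar_member_group "$m")" "$m"
#   done
```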

7yl4r commented 5 years ago

After talking w/ @mjm8, the plan is to ingest the *.ntf, *.xml, and *.tar files and ditch the *.rename and *.jpg files.
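Dropping *.rename and *.jpg from the five per-granule files leaves *.ntf, *.xml, and *.tar. A minimal sketch of listing only those is below; `list_ingest_files` is a hypothetical helper and the directory argument is whatever ingest root applies.

```shell
#!/usr/bin/env bash
# Sketch: list only the files we intend to ingest (.ntf, .xml, .tar),
# which skips the *.rename and *-BROWSE.jpg files.
# list_ingest_files is a hypothetical helper, not an imars-etl command.
list_ingest_files() {
    find "$1" -type f \( -name '*.ntf' -o -name '*.xml' -o -name '*.tar' \) | sort
}

# Example (not run here):
#   list_ingest_files /srv/imars-objects/ftp-ingest/PGC_Imagery_2019jan18/imagery
```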

7yl4r commented 5 years ago

List of products we're ingesting:

x    product.short_name  product.id  $glob
[ ]  xml_wv2_p1bs        27          todo
[ ]  xml_wv2_m1bs        14          todo
[ ]  xml_wv3_p1bs        ?           todo
[ ]  xml_wv3_m1bs        ?           todo
[ ]  ntf_wv2_m1bs        11          todo
[ ]  ntf_wv2_p1bs        24          *P1BS*.ntf
[ ]  ntf_wv3_m1bs        ?           todo
[ ]  ntf_wv3_p1bs        ?           todo
[ ]  tar_wv2_p1bs        ?           todo
[ ]  tar_wv2_m1bs        ?           todo
[ ]  tar_wv3_p1bs        ?           todo
[ ]  tar_wv3_m1bs        ?           todo
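The short names in this list appear to follow `{ext}_wv{n}_{p_or_m}1bs`, so the product for a given file could be derived from its name. This is a sketch under that assumption; `product_short_name` is a hypothetical helper, and it does not look up product.id values.

```shell
#!/usr/bin/env bash
# Sketch: derive a product.short_name such as ntf_wv2_m1bs from a file name.
# Assumes the naming scheme {ext}_wv{n}_{p_or_m}1bs seen in the list above;
# product_short_name is a hypothetical helper, not an imars-etl function.
product_short_name() {
    local name ext re sat band
    name=$(basename "$1")
    ext=${name##*.}
    case $ext in
        ntf|xml|tar) ;;
        *) return 1 ;;          # .rename and .jpg are not ingested
    esac
    re='^WV0([23])_.*-([PM])1BS-'
    [[ $name =~ $re ]] || return 1
    sat=${BASH_REMATCH[1]}
    band=$(printf '%s' "${BASH_REMATCH[2]}" | tr 'PM' 'pm')
    printf '%s_wv%s_%s1bs\n' "$ext" "$sat" "$band"
}
```

Running it on the first .ntf in the listing at the top of this issue yields `ntf_wv2_m1bs`.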

Load command:

find . -type f -name "$glob" |
    xargs -n 1 imars-etl load \
    -p $product_id \
    --metadata_file_driver wv2_xml \
    --metadata_file "{directory}/{basename}.xml"
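Before running the load for real, it may help to print the commands first. Here is a dry-run sketch that only echoes one command per matching file; the flags are copied from the load command above, `print_load_cmds` is a hypothetical helper, and nothing here calls imars-etl.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print (do not run) one load command per matching file.
# Flags are copied verbatim from the load command in this issue;
# print_load_cmds is a hypothetical helper.
print_load_cmds() {
    local root=$1 product_id=$2 glob=$3 file
    find "$root" -type f -name "$glob" -print0 |
        while IFS= read -r -d '' file; do
            # %q shell-quotes the path so the printed line can be pasted safely
            printf 'imars-etl load -p %s --metadata_file_driver wv2_xml --metadata_file "{directory}/{basename}.xml" %q\n' \
                "$product_id" "$file"
        done
}

# Example with the one row whose glob is filled in (product.id 24):
#   print_load_cmds /srv/imars-objects/ftp-ingest/PGC_Imagery_2019jan18 24 '*P1BS*.ntf'
```

Quoting the glob when calling the function keeps the shell from expanding it before `find` sees it.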