NorthArrowResearch / champ-workbench

http://workbench.northarrowresearch.com/
GNU General Public License v3.0

S3 Archive inside the SQLite DB #16

Open MattReimer opened 7 years ago

MattReimer commented 7 years ago

It would be helpful to be able to query what's in the S3 repo. As we discussed, constantly having to make spreadsheets for this stuff is tedious. It would be nice to have a DB we can query directly. The listing itself can be dumped with:

aws s3 ls --recursive s3://sfr-champdata > dump.txt

The output looks like this:

2017-05-01 13:14:31       7663 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/Control_Points.shp.xml
2017-05-01 13:14:31        188 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/Control_Points.shx
2017-05-01 13:14:31         90 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEM.tfw
2017-05-01 13:14:31     696221 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEM.tif
2017-05-01 13:14:31       1914 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEM.tif.aux.xml
2017-05-01 13:14:31     172573 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEM.tif.ovr
2017-05-01 13:14:31       6330 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEM.tif.xml
2017-05-01 13:14:31         90 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEMHillshade.tfw
2017-05-01 13:14:31     128074 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEMHillshade.tif
2017-05-01 13:14:31      25909 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEMHillshade.tif.aux.xml
2017-05-01 13:14:31      46840 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEMHillshade.tif.ovr
2017-05-01 13:14:31       7493 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEMHillshade.tif.vat.dbf
2017-05-01 13:14:31       7463 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/DEMHillshade.tif.xml
2017-05-01 13:14:31         90 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/Detrended.tfw
2017-05-01 13:14:31     664751 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/Detrended.tif
2017-05-01 13:14:31       1794 QA/2011/Asotin/ASW00001-NF-F4P1BR/VISIT_227/Topo/GISLayers/Detrended.tif.aux.xml
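
Each line above is a date, a time, a size in bytes, and then the S3 key. A minimal sketch of splitting one line into those parts, in Python rather than .NET purely for brevity; the `S3Entry` type and `parse_ls_line` function are hypothetical names, not anything in the workbench:

```python
from datetime import datetime
from typing import NamedTuple


class S3Entry(NamedTuple):
    """One row of `aws s3 ls --recursive` output (hypothetical helper type)."""
    modified: datetime
    size: int
    key: str


def parse_ls_line(line: str) -> S3Entry:
    # Split on runs of whitespace at most three times, so a key that
    # itself contains spaces is kept in one piece.
    date_part, time_part, size_part, key = line.split(None, 3)
    modified = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    # The last field keeps its trailing newline after a limited split.
    return S3Entry(modified, int(size_part), key.rstrip("\r\n"))
```

Calling `parse_ls_line` on each line of dump.txt would give the raw material for the fields listed below.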

Features:

  1. Should be able to read the list and extract the following as fields in the SQLite DB:
    • Size
    • Date
    • Visit ID
    • Filename
    • File extension
  2. Should be able to quickly and completely purge all previous records before reading the list.
  3. Link Visit ID to the rest of the DB
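
A rough sketch of features 1 and 2, again in Python for brevity. The table name `S3Files`, its column names, and both function names are assumptions made for illustration, not the workbench schema. It also assumes the Visit ID is the number in the `VISIT_nnn` folder of the key, and that "file extension" means only the last suffix (so `DEM.tif.aux.xml` gives `xml`):

```python
import posixpath
import re
import sqlite3

# Assumption: the visit ID is the number in the ".../VISIT_227/..." path segment.
VISIT_RE = re.compile(r"/VISIT_(\d+)/")


def key_fields(key):
    """Derive (visit_id, filename, extension) from an S3 key."""
    match = VISIT_RE.search(key)
    visit_id = int(match.group(1)) if match else None
    filename = posixpath.basename(key)
    # Only the last suffix is treated as the extension.
    extension = posixpath.splitext(filename)[1].lstrip(".").lower()
    return visit_id, filename, extension


def reload_listing(conn, entries):
    """Purge all previous records, then load (modified, size, key) tuples."""
    with conn:  # one transaction: a failed load rolls back the purge too
        conn.execute(
            "CREATE TABLE IF NOT EXISTS S3Files ("
            " Modified TEXT, Size INTEGER, S3Key TEXT,"
            " VisitID INTEGER, Filename TEXT, Extension TEXT)"
        )
        conn.execute("DELETE FROM S3Files")
        conn.executemany(
            "INSERT INTO S3Files VALUES (?, ?, ?, ?, ?, ?)",
            ((modified, size, key, *key_fields(key))
             for modified, size, key in entries),
        )
```

With a `VisitID` column in place, feature 3 would come down to a join against whichever table in the workbench DB holds visits; that table is deliberately not named here.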

We could look at making the actual S3 calls in .NET, but that's fancier than we need right now. There's nothing wrong with running aws s3 ls on the console for now.

philipbaileynar commented 7 years ago

@MattReimer is this still needed? I think this is out of date now that we are close to having Maude and direct retrieval of engine inputs from the API. We are moving pretty fast to a system where we don't store or use CHaMP automation data on S3.

MattReimer commented 7 years ago

@philipbaileynar This was only ever supposed to be a temporary measure. I think the problem is that with the API and CloudWatch logs you still don't get 100% visibility on what we "have".

So, either we move the discussion of meta metric schemas forward quickly or we do this as a stop-gap measure until we have that conversation.

Just my $0.02CAD