Applied-GeoSolutions / gips

Geospatial Image Processing System
GNU General Public License v3.0

ProjectInventory question #524

Open bhbraswell opened 4 years ago

bhbraswell commented 4 years ago

If I have gips_export output that looks like this:

% tree /data/modis_test 
modis_test
├── 0
│   ├── 2019100_MCD_ndvi.tif
│   └── 2019101_MCD_ndvi.tif
└── 1
    ├── 2019100_MCD_ndvi.tif
    └── 2019101_MCD_ndvi.tif

and I call inv = ProjectInventory('/data/modis_test', None), then inv only holds the inventory of the second feature in the export:

(Pdb) inv[inv.dates[0]].filenames
{('MCD', 'ndvi'): '/data/modis_test/1/2019100_MCD_ndvi.tif'}

But I'm wondering whether it should raise an error instead.

I'm asking this because I'm trying to find a simple way to extend gips_stats to create a single summary file for all the features in the export. Catching an error and looping over subdirectories would make that easier. Usual disclaimer: unless I'm missing something, which I often do.
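A minimal sketch of that catch-and-loop fallback, assuming ProjectInventory were changed to raise when it finds feature subdirectories. Everything here is hypothetical: `NestedExportError`, `open_inventory` (a stand-in for ProjectInventory that only lists `.tif` names instead of building a real inventory), `inventories_by_feature`, and the `make_example_export` helper that recreates the tree from this issue in a temp directory.

```python
import os
import tempfile


class NestedExportError(ValueError):
    """Hypothetical error for an inventory directory that holds feature subdirectories."""


def open_inventory(path):
    """Hypothetical stand-in for ProjectInventory that refuses nested exports.

    A real implementation would build the date/product inventory; this sketch
    only returns the sorted .tif file names found directly in `path`.
    """
    entries = list(os.scandir(path))
    subdirs = sorted(e.name for e in entries if e.is_dir())
    if subdirs:
        raise NestedExportError(f"{path} contains feature subdirectories: {subdirs}")
    return sorted(e.name for e in entries if e.is_file() and e.name.endswith(".tif"))


def inventories_by_feature(path):
    """Return {feature: inventory}, falling back to one entry per subdirectory."""
    try:
        # Flat export: a single feature, keyed by the directory name.
        return {os.path.basename(path): open_inventory(path)}
    except NestedExportError:
        # Nested export (e.g. /data/modis_test/0, /data/modis_test/1):
        # build one inventory per feature subdirectory, in a stable order.
        return {
            name: open_inventory(os.path.join(path, name))
            for name in sorted(os.listdir(path))
            if os.path.isdir(os.path.join(path, name))
        }


def make_example_export():
    """Recreate the issue's /data/modis_test layout in a temporary directory."""
    root = tempfile.mkdtemp()
    for feature in ("0", "1"):
        os.makedirs(os.path.join(root, feature))
        for date in ("2019100", "2019101"):
            open(os.path.join(root, feature, f"{date}_MCD_ndvi.tif"), "w").close()
    return root
```

With the example tree, `inventories_by_feature(root)` returns separate entries for `0` and `1` instead of silently keeping only one of them; a deeper nesting would still raise, which seems preferable to guessing.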

ircwaves commented 4 years ago

There's definitely a bug here. As it stands, the output of ProjectInventory('/data/modis_test', None) is non-deterministic, and I don't believe ProjectInventory was ever intended to be constructed on a directory like this.

The gips.data.core.Data.discover code does use os.walk to find files, but there is no code that does anything with the subdirectories. Same-named files in 0/ and 1/ therefore end up under the same date and product key, and whichever one the walk happens to visit last wins. So I would vote for switching to os.listdir, so that discovery returns an empty list for /data/modis_test. That isn't as good as getting an error message, but I think it's the best we can do, because /outdir/feat1 could legitimately be an empty export, which we would want to handle gracefully.
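To make the difference concrete, here is a rough sketch contrasting recursive and flat discovery. It is not the gips.data.core.Data.discover code; the filename pattern (7-digit year and day-of-year, sensor, product, `.tif`) is inferred from the example tree, and `discover_walk`, `discover_flat` and `make_example_export` are hypothetical names.

```python
import os
import re
import tempfile

# Assumed pattern for gips_export names such as 2019100_MCD_ndvi.tif.
PATTERN = re.compile(r"^(\d{7})_([^_]+)_(.+)\.tif$")


def discover_walk(root):
    """Recursive discovery: files from every subdirectory merge into one dict."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            m = PATTERN.match(name)
            if m:
                date, sensor, product = m.groups()
                # 0/2019100_MCD_ndvi.tif and 1/2019100_MCD_ndvi.tif share this key,
                # so the directory os.walk visits last silently overwrites the other.
                found.setdefault(date, {})[(sensor, product)] = os.path.join(dirpath, name)
    return found


def discover_flat(root):
    """Non-recursive discovery: only files directly inside root are considered."""
    found = {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        m = PATTERN.match(name)
        if m and os.path.isfile(path):
            date, sensor, product = m.groups()
            found.setdefault(date, {})[(sensor, product)] = path
    return found


def make_example_export():
    """Recreate the issue's /data/modis_test layout in a temporary directory."""
    root = tempfile.mkdtemp()
    for feature in ("0", "1"):
        os.makedirs(os.path.join(root, feature))
        for date in ("2019100", "2019101"):
            open(os.path.join(root, feature, f"{date}_MCD_ndvi.tif"), "w").close()
    return root
```

On the example tree, `discover_walk` yields one path per date drawn from either `0/` or `1/` depending on filesystem ordering, while `discover_flat` returns `{}` for the top-level directory, which is indistinguishable from an empty export, as noted above.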

I've thought before that there should be a ProjectTree class which handles iteration over inventory directory trees and can handle application and aggregation of an algorithm (e.g. gips_stats). In the end, I've always resorted to using multiprocessing Pool or GNU parallel to apply-and-then-aggregate.
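As a rough illustration of the apply-then-aggregate pattern, here is a sketch of what a ProjectTree-style helper might do. No such class exists in gips as far as this thread says; `feature_dirs`, `summarize_feature` (a placeholder for running gips_stats on one feature, which here only counts `.tif` files per product), `apply_and_aggregate` and `make_example_export` are all hypothetical. It uses `multiprocessing.pool.ThreadPool` so the sketch runs anywhere; swapping in `multiprocessing.Pool` gives process-based parallelism for CPU-bound statistics, provided the worker function is defined at module top level and the call sits under an `if __name__ == "__main__":` guard.

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool


def feature_dirs(root):
    """List one directory per exported feature (0/, 1/, ...), in a stable order."""
    return sorted(e.path for e in os.scandir(root) if e.is_dir())


def summarize_feature(path):
    """Placeholder 'apply' step: count .tif files per product in one feature directory."""
    counts = {}
    for name in os.listdir(path):
        if name.endswith(".tif"):
            # 2019100_MCD_ndvi.tif -> "ndvi" (assumed date_sensor_product naming)
            product = name[:-4].split("_", 2)[-1]
            counts[product] = counts.get(product, 0) + 1
    return os.path.basename(path), counts


def apply_and_aggregate(root, func=summarize_feature, processes=4):
    """Apply func to every feature directory in parallel, then merge the results."""
    with ThreadPool(processes) as pool:
        # map() preserves input order, so rows come back sorted by feature.
        results = pool.map(func, feature_dirs(root))
    # Aggregate into one row per (feature, product): the content of a single summary file.
    return [
        (feature, product, n)
        for feature, counts in results
        for product, n in sorted(counts.items())
    ]


def make_example_export():
    """Recreate the issue's /data/modis_test layout in a temporary directory."""
    root = tempfile.mkdtemp()
    for feature in ("0", "1"):
        os.makedirs(os.path.join(root, feature))
        for date in ("2019100", "2019101"):
            open(os.path.join(root, feature, f"{date}_MCD_ndvi.tif"), "w").close()
    return root
```

For the example tree this produces `[("0", "ndvi", 2), ("1", "ndvi", 2)]`, which could be written out with the csv module as the single summary file bhbraswell is after.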