digitalearthpacific / dep-mangroves

Mangrove monitoring for Digital Earth Pacific

filter_by_log in print_tasks doesn't seem to be filtering #14

Open · alexgleith opened this issue 11 months ago

alexgleith commented 11 months ago

I ran 0.0.5 and then re-ran it. On the re-run, all the tiles were added to the list of jobs to run, and each one then skipped because it was already completed.

I expected the print_tasks code to leave completed tasks off the queue entirely.

jessjaco commented 11 months ago

I'm guessing you ran it with --datetime 2016/2022? If not, please tell me which parameters you used.

I need to think about how to handle this case. For the coastlines, that datetime would mean there is a single 2016/2022 product (like a seven-year mosaic). For mangroves, however, we iterate over individual years. So the logger checks the 2016/2022 log and finds no products there (this path: https://deppcpublicstorage.blob.core.windows.net/output/dep_s2_mangroves/0-0-5/logs/dep_s2_mangroves_2016-2022_log.csv), when it should be checking the logs for the individual years, which are complete (see https://deppcpublicstorage.blob.core.windows.net/output/dep_s2_mangroves/0-0-5/logs/dep_s2_mangroves_2016_log.csv).

We can change the setup to accommodate this, but I also need to think about your desire to migrate from the handy, convenient, compact and readable CSV-based logs to pinging the STAC items directly.
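As a rough illustration of the mismatch, here is a sketch that derives the single log path the current check effectively reads for --datetime 2016/2022, alongside the per-year log paths a yearly product would need. All function names (years_in, log_path, range_log, per_year_logs, pending) are hypothetical and are not the dep-tools API; the URL pattern is copied from the two log links in the comment above, and tile ids are plain strings for simplicity.

```python
from itertools import product

# Hypothetical helpers illustrating the log-path mismatch; not the real
# dep-tools / dep-mangroves functions. URL pattern taken from the linked logs.
BASE = "https://deppcpublicstorage.blob.core.windows.net/output"


def years_in(datetime_arg: str) -> list[str]:
    """Expand a '2016/2022' range (or a single '2016') into individual years."""
    start, _, end = datetime_arg.partition("/")
    end = end or start
    return [str(year) for year in range(int(start), int(end) + 1)]


def log_path(dataset: str, version: str, period: str) -> str:
    """Build a log URL following the pattern of the links in this thread."""
    return f"{BASE}/{dataset}/{version}/logs/{dataset}_{period}_log.csv"


def range_log(dataset: str, version: str, datetime_arg: str) -> str:
    # One log for the whole range: right for a multi-year mosaic,
    # wrong for a product that is written once per year.
    return log_path(dataset, version, datetime_arg.replace("/", "-"))


def per_year_logs(dataset: str, version: str, datetime_arg: str) -> list[str]:
    # One log per year in the range, matching per-year mangrove outputs.
    return [log_path(dataset, version, year) for year in years_in(datetime_arg)]


def pending(tiles, datetime_arg, completed):
    """Return (tile, year) pairs that still need to run.

    `completed` maps a year string to the set of tile ids found in that
    year's log; years with no log count as having nothing completed.
    """
    return [
        (tile, year)
        for tile, year in product(tiles, years_in(datetime_arg))
        if tile not in completed.get(year, set())
    ]
```

With per-year filtering, a re-run in which every year's log already lists every tile would queue nothing, which is the behaviour expected in the original report.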

alexgleith commented 11 months ago

Yep, that’s what we did. The config we used is in the 0-0-5 file here: https://github.com/digitalearthpacific/dep-kubernetes-apps/pull/7/files

It worked fine: 1,100 tiles were completed (skipped) very quickly, so I don't think we actually need to change it. It just didn't do what I expected!

I'm pretty enthusiastic about not having an external CSV as a state file, writing normal logs to stdout and using the STAC docs as a "jobs-already-done" flag! We can talk about this later, though.
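A minimal sketch of the "STAC item as jobs-already-done flag" idea, assuming each completed tile-year writes a STAC item JSON to a predictable URL. The item_url layout and the names item_url, http_exists and tasks_to_queue are hypothetical; the real output layout may differ. The existence check is passed in as a function so it could be swapped for a STAC API search or a blob-storage client.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, TypeVar

Task = TypeVar("Task")


def item_url(base: str, dataset: str, version: str, tile: str, year: str) -> str:
    # Hypothetical output layout for one tile-year STAC item.
    return f"{base}/{dataset}/{version}/{tile}/{year}/{dataset}_{tile}_{year}.stac-item.json"


def http_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request for `url` succeeds, False on a 404."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (403, 500, ...) should not be treated as "not done"


def tasks_to_queue(
    tasks: Iterable[Task],
    to_url: Callable[[Task], str],
    exists: Callable[[str], bool] = http_exists,
) -> list[Task]:
    """Keep only the tasks whose STAC item has not been written yet."""
    return [task for task in tasks if not exists(to_url(task))]
```

The trade-off is one request per task instead of one CSV read per run, so for around 1,100 tiles across several years this makes many more round trips; in exchange, no separate state file has to be kept in sync with the outputs.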