digitalearthpacific / dep-mangroves

Mangrove monitoring for Digital Earth Pacific

filter_by_log in print_tasks doesn't seem to be filtering #14

Open alexgleith opened 1 year ago

alexgleith commented 1 year ago

I ran 0.0.5 and then re-ran it. All the tiles were added to the list of jobs to run, and each was then skipped because it was already completed.

I thought the print_tasks code would not even put the completed tasks on the queue.

jessjaco commented 1 year ago

I'm guessing you ran it with --datetime 2016/2022? If not, please tell me what params you used. I need to think about how to handle this case. For the coastlines, this would indicate there is a single 2016/2022 product (like a seven-year mosaic). But for mangroves, we are iterating over years. So the logger is checking the 2016/2022 log (i.e. this path: https://deppcpublicstorage.blob.core.windows.net/output/dep_s2_mangroves/0-0-5/logs/dep_s2_mangroves_2016-2022_log.csv) and seeing there are no products there. But it should be checking the logs for the individual years, which are complete (see https://deppcpublicstorage.blob.core.windows.net/output/dep_s2_mangroves/0-0-5/logs/dep_s2_mangroves_2016_log.csv). We can change the setup to accommodate this, but I should also think about your desire to migrate from the handy, convenient, compact and readable CSV-based logs to pinging the STAC items directly.
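To make the mismatch concrete, here is a rough sketch of how a range such as 2016/2022 could be expanded into the per-year log names that are actually complete. The helpers expand_years and log_name are hypothetical and are not taken from the dep-mangroves code; the file-name pattern is only inferred from the two log URLs linked above.

```python
def expand_years(datetime: str) -> list[str]:
    """Split a 'start/end' year range into single years.

    A value without '/' (a single year) is passed through unchanged.
    Hypothetical helper, not part of the existing code base.
    """
    if "/" not in datetime:
        return [datetime]
    start, end = datetime.split("/")
    return [str(year) for year in range(int(start), int(end) + 1)]


def log_name(prefix: str, datetime: str) -> str:
    """Build a log file name following the pattern seen in the linked logs.

    Assumption: '/' in the datetime is replaced by '-' and the name ends
    in '_log.csv', as in dep_s2_mangroves_2016-2022_log.csv.
    """
    return f"{prefix}_{datetime.replace('/', '-')}_log.csv"


# The single range log that is currently being checked:
range_log = log_name("dep_s2_mangroves", "2016/2022")

# The per-year logs that would need to be checked instead:
per_year_logs = [
    log_name("dep_s2_mangroves", year) for year in expand_years("2016/2022")
]
```

With this, the filter would consult seven per-year logs rather than the one 2016-2022 log that has no entries.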

alexgleith commented 1 year ago

Yep, that’s what we did. The config we used is in the 0-0-5 file here: https://github.com/digitalearthpacific/dep-kubernetes-apps/pull/7/files

It worked fine, and 1,100 tiles were completed (skipped) really quickly, so I don't think we really need to change it actually. It just didn't do what I expected!

I'm pretty enthusiastic about not having an external CSV as a state file, writing normal logs to stdout and using the STAC docs as a "jobs-already-done" flag! We can talk about this later, though.
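As a starting point for that discussion, a minimal sketch of using the STAC item as the "already done" marker might look like the following. Everything here is an assumption for illustration: the stac_item_path layout, the filter_completed helper and the (tile_id, year) task shape are hypothetical, and the exists check is injected so that it could be backed by blob storage, a STAC API, or anything else.

```python
from typing import Callable, Iterable

# Assumed task shape: (tile_id, year). The real task objects may differ.
Task = tuple[str, str]


def stac_item_path(prefix: str, version: str, tile_id: str, year: str) -> str:
    """Return where a task's STAC item would live.

    Hypothetical layout, chosen only for this sketch; the real output
    paths should be taken from the writer that produces the items.
    """
    return f"{prefix}/{version}/{tile_id}/{year}/{prefix}_{tile_id}_{year}.stac-item.json"


def filter_completed(
    tasks: Iterable[Task],
    exists: Callable[[str], bool],
    prefix: str,
    version: str,
) -> list[Task]:
    """Keep only tasks whose STAC item does not exist yet.

    `exists` is any callable that reports whether a path is present,
    so no CSV state file is needed.
    """
    return [
        (tile_id, year)
        for tile_id, year in tasks
        if not exists(stac_item_path(prefix, version, tile_id, year))
    ]


# Usage with an in-memory stand-in for storage:
done = {stac_item_path("dep_s2_mangroves", "0-0-5", "63,20", "2016")}
todo = filter_completed(
    [("63,20", "2016"), ("63,20", "2017")],
    exists=done.__contains__,
    prefix="dep_s2_mangroves",
    version="0-0-5",
)
```

Because the check is per task (tile and year), a 2016/2022 run would only queue the tile-years that have no item, which is the behaviour originally expected from print_tasks.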