MAAP-Project / maap-api-nasa

NASA Python implementation of the MAAP API specification
https://api.maap-project.org/api
Apache License 2.0

Copy output option completed jobs #103

Closed grallewellyn closed 7 months ago

grallewellyn commented 8 months ago

Added more checks. The code also needs to replace triaged_job with triaged-jobs, because we decided against renaming the triaged-jobs folder and all failed jobs have triaged_job in their s3 link.

Also extracted dps_output, triaged_job, and triaged-jobs into variables in api/settings.py.

I also ran a quick notebook to check that, across all the jobs submitted by all users in ops, there are no instances of a jobUrl containing triaged_job without the dataset/ prefix (AWS_TRIAGE_WORKSPACE_BUCKET_PATH in settings.py relies on this).

Also note that if a user names their algorithm exactly dps_output, triaged_job, or triaged-jobs, their product path will be wrong. For example, if the user names their algorithm triaged-jobs and the job succeeds, their s3 link should look something like: s3://s3-us-west-2.amazonaws.com:80/maap-dit-workspace/grallewellyn/dps_output/triaged-jobs/main/2024/02/27/fa08a263-c58c-4b69-8b71-30efab48a913. The product path logic would then see that triaged-jobs is present and give them this product path: triaged_job/main/2024/02/27/fa08a263-c58c-4b69-8b71-30efab48a913, even though their job is actually in the dps_output folder. Which folder name wins depends on the order of the jobs_output_folder_names array, but switching the order would only move the problem to a different algorithm name. We could instead rely on the completed job's location appearing after a fixed number of /'s, but that is very strict: the location could change later and cause a bug in this code.
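To make the collision concrete, here is a minimal sketch of that kind of folder-name matching. It is not the code from this PR: the function name `product_path`, the constant names, the array order, and the first-match rule are all assumptions for illustration, and the triaged_job / triaged-jobs substitution is left out.

```python
from typing import Optional, Sequence

# Hypothetical constants; the PR keeps the real values in api/settings.py.
DPS_OUTPUT = "dps_output"
TRIAGED_JOB = "triaged_job"
TRIAGED_JOBS = "triaged-jobs"

# Assumed order; the first folder name found in the URL wins.
JOBS_OUTPUT_FOLDER_NAMES = [TRIAGED_JOBS, TRIAGED_JOB, DPS_OUTPUT]


def product_path(
    job_url: str,
    folder_names: Sequence[str] = JOBS_OUTPUT_FOLDER_NAMES,
) -> Optional[str]:
    """Return the part of job_url starting at the first matching folder name."""
    segments = job_url.split("/")
    for folder in folder_names:
        if folder in segments:
            # Keep everything from the matched folder onwards.
            return "/".join(segments[segments.index(folder):])
    return None


# A successful job whose algorithm happens to be named "triaged-jobs".
url = (
    "s3://s3-us-west-2.amazonaws.com:80/maap-dit-workspace/grallewellyn/"
    "dps_output/triaged-jobs/main/2024/02/27/fa08a263-c58c-4b69-8b71-30efab48a913"
)

# The match lands on the algorithm name, so the dps_output folder is dropped.
wrong = product_path(url)

# Putting dps_output first fixes this URL, but an algorithm named
# "dps_output" inside a triaged folder would then be mismatched instead.
reordered = product_path(url, [DPS_OUTPUT, TRIAGED_JOBS, TRIAGED_JOB])
```

In this sketch `wrong` starts with triaged-jobs/ and `reordered` starts with dps_output/, which shows why reordering the array alone cannot remove the ambiguity.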

We discussed this issue in the hackathon and decided the best course of action was to blacklist the algorithm names dps_output, triaged-jobs, and triaged_job. Issue here: Algorithm name registration validation · Issue #934 · MAAP-Project/Community. We also need to blacklist algorithm names that match the name of an algorithm published by another user.
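A registration-time check along these lines might look like the sketch below. The function `validate_algorithm_name`, its parameters, and the error messages are hypothetical and are not taken from the MAAP API or from Issue #934; only the three reserved names come from the discussion above.

```python
from typing import AbstractSet

# Reserved folder names from the discussion above.
RESERVED_ALGORITHM_NAMES = frozenset({"dps_output", "triaged-jobs", "triaged_job"})


def validate_algorithm_name(
    name: str,
    published_by_others: AbstractSet[str] = frozenset(),
) -> str:
    """Return name unchanged if it is allowed, otherwise raise ValueError.

    published_by_others is assumed to hold the names of algorithms
    already published by other users.
    """
    if name in RESERVED_ALGORITHM_NAMES:
        raise ValueError(f"'{name}' is reserved for job output folders")
    if name in published_by_others:
        raise ValueError(f"'{name}' is already published by another user")
    return name
```

Rejecting the name at registration keeps the path parsing simple, because the folder names can then never appear as an algorithm segment in a job URL.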