Added more checking. We also need to replace `triaged_job` with `triaged-jobs`, because we decided against renaming the folder from `triaged-jobs`, and all failed jobs have `triaged_job` in their s3 link.
Also extracted `dps_output`, `triaged_job`, and `triaged-jobs` as variables into `api/settings.py`.
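For reference, a minimal sketch of what the extracted constants in `api/settings.py` could look like; the variable names below (other than `AWS_TRIAGE_WORKSPACE_BUCKET_PATH`, mentioned later) are illustrative assumptions, not necessarily the names used in the PR:

```python
# api/settings.py (sketch) -- constant names here are assumptions
DPS_OUTPUT_FOLDER = "dps_output"       # folder where successful jobs land
TRIAGED_JOBS_FOLDER = "triaged-jobs"   # actual folder name (we decided not to rename it)
TRIAGED_JOB_LINK_PART = "triaged_job"  # string that appears in failed jobs' s3 links
```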
I also ran a quick notebook to check that, across all the jobs submitted by all users in ops, there are no instances of a jobUrl that is in `triaged_job` but is missing the `dataset/` prefix (`AWS_TRIAGE_WORKSPACE_BUCKET_PATH` in `settings.py` relies on this).
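The check itself is just string matching; a hypothetical sketch of it is below. The example URLs are made up, the exact prefix test is an assumption, and the real notebook iterated over every user's jobs in ops rather than a hard-coded list:

```python
# Made-up example jobUrls standing in for the real ops job listings
job_urls = [
    "s3://.../maap-ops-workspace/user1/dataset/triaged_job/main/2024/02/27/abc123",
    "s3://.../maap-ops-workspace/user2/dps_output/my-algo/main/2024/02/27/def456",
]

# jobUrls that are in triaged_job but lack the dataset/ prefix
offending = [
    url for url in job_urls
    if "triaged_job" in url and "dataset/triaged_job" not in url
]

# AWS_TRIAGE_WORKSPACE_BUCKET_PATH relies on this list being empty
assert not offending, offending
```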
Also note that if the user names their algorithm exactly `dps_output`, `triaged_job`, or `triaged-jobs`, their product path will be computed incorrectly. For example, if the user names their algorithm `triaged-jobs` and it succeeds, their s3 link will look something like `s3://s3-us-west-2.amazonaws.com:80/maap-dit-workspace/grallewellyn/dps_output/triaged-jobs/main/2024/02/27/fa08a263-c58c-4b69-8b71-30efab48a913`. The product-path logic would then see that `triaged-jobs` is present and return the product path `triaged_job/main/2024/02/27/fa08a263-c58c-4b69-8b71-30efab48a913`, even though the job is actually in the `dps_output` folder. This comes from the order of the `jobs_output_folder_names` array, but switching the order of the array would just produce the same problem in a different case; see the sketch below. We could instead rely on the completed job's location appearing after a fixed number of `/` characters, but that is very strict, and the location could change later and cause a bug in this code.
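A sketch of that failure mode, using an assumed reconstruction of the product-path logic; the function body, the folder-to-local-path mapping, and the array order are all illustrative, not the real code:

```python
jobs_output_folder_names = ["triaged-jobs", "triaged_job", "dps_output"]

def product_path(s3_link: str) -> str:
    # Scan for the first known folder name in the link and root the product
    # path at the corresponding local folder (assumed reconstruction)
    for folder in jobs_output_folder_names:
        marker = f"/{folder}/"
        if marker in s3_link:
            local = "triaged_job" if folder.startswith("triaged") else "dps_output"
            return local + "/" + s3_link.split(marker, 1)[1]
    raise ValueError("no known output folder in s3 link")

link = (
    "s3://s3-us-west-2.amazonaws.com:80/maap-dit-workspace/grallewellyn/"
    "dps_output/triaged-jobs/main/2024/02/27/"
    "fa08a263-c58c-4b69-8b71-30efab48a913"
)

# "triaged-jobs" (the algorithm name) matches before "dps_output", so this
# successful job is misfiled under triaged_job:
print(product_path(link))
# -> triaged_job/main/2024/02/27/fa08a263-c58c-4b69-8b71-30efab48a913
```

Flipping the array order just moves the collision: a failed job whose algorithm is named `dps_output` would then match `dps_output` first and be misfiled as a successful job.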
We discussed this issue in the hackathon and decided the best course of action was to blacklist the algorithm names `dps_output`, `triaged-jobs`, and `triaged_job`. Issue here: Algorithm name registration validation · Issue #934 · MAAP-Project/Community. We also need to blacklist algorithm names that are the same as the name of an algorithm published by another user.
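A minimal sketch of that registration-time validation, assuming a simple reserved-name set plus a lookup of already-published names; the function name, error type, and `published_names` parameter are illustrative, since the duplicate-name lookup depends on the registration backend:

```python
RESERVED_ALGORITHM_NAMES = {"dps_output", "triaged_job", "triaged-jobs"}

def validate_algorithm_name(name: str, published_names: set[str]) -> None:
    """Reject reserved names and names already published by another user."""
    if name in RESERVED_ALGORITHM_NAMES:
        raise ValueError(
            f"'{name}' collides with a DPS output folder and cannot be used "
            "as an algorithm name"
        )
    if name in published_names:
        raise ValueError(f"an algorithm named '{name}' is already published")

# Registering a reserved name should fail fast:
try:
    validate_algorithm_name("triaged-jobs", published_names=set())
except ValueError as err:
    print(err)
```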