Open egs40 opened 3 months ago
The resources I set up for this project have very probably gotten too old. I need to capture new footage of my property in the next month or so, so I'll look into what the problem is when I do.
Thanks, I appreciate it.
Hi there,
I recently ran into the same issue with my DroneYard solution (https://github.com/AlexCarusoFan4/WinyamaDroneYard).
Looks like instances are launched, but never registered with the ECS cluster. I tried using the latest ECS optimised Amazon Linux AMI, and completely re-deploying my solution, but neither of these worked.
Today I refactored the solution to use the latest aws-cdk-lib for Batch, rather than relying on the alpha package, and am now having success with running imagery processing jobs again.
Would definitely recommend giving that a try - hopefully does the trick for you.
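For anyone who wants to confirm the "launched but never registered" symptom before refactoring, here is a minimal sketch using boto3. It assumes you know the name of your Batch compute environment; the function name `summarize_registration` and the placeholder environment name are hypothetical and not part of either repo. It reads the compute environment's status from Batch and counts the container instances that have joined the underlying ECS cluster.

```python
def summarize_registration(batch_client, ecs_client, compute_env_name):
    """Report a Batch compute environment's status and how many
    container instances have registered with its ECS cluster."""
    resp = batch_client.describe_compute_environments(
        computeEnvironments=[compute_env_name]
    )
    envs = resp.get("computeEnvironments", [])
    if not envs:
        raise ValueError(f"compute environment not found: {compute_env_name}")
    env = envs[0]

    # Managed compute environments expose the ECS cluster that Batch created.
    cluster_arn = env.get("ecsClusterArn")
    registered = []
    if cluster_arn:
        # Only the first page of results is read here, which is enough
        # to tell "zero registered" apart from "some registered".
        page = ecs_client.list_container_instances(cluster=cluster_arn)
        registered = page.get("containerInstanceArns", [])

    return {
        "state": env.get("state"),
        "status": env.get("status"),
        "statusReason": env.get("statusReason"),
        "ecsClusterArn": cluster_arn,
        "registeredInstances": len(registered),
    }

# Example usage (requires boto3 and AWS credentials; the name is a placeholder):
#   import boto3
#   print(summarize_registration(
#       boto3.client("batch"), boto3.client("ecs"), "my-compute-environment"))
```

If `registeredInstances` stays at 0 while EC2 instances are running, that matches what I was seeing: the instances start but the ECS agent never joins the cluster.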
I'm having this issue as well. I had a stack that was working well, but since I upgraded it, jobs are stuck in the runnable state. The cause is "failed to start Amazon Elastic Container Service IAM" on the EC2 instance that is created after a spot request; all other parts of the flow work well. I've tried https://github.com/AlexCarusoFan4/WinyamaDroneYard and the codm repo as well and ended up on the same path. Something in the AWS Batch config seems to be off for all of them right now.
If someone finds a solution, I'll be glad to know. Thanks a lot.
Hi there,
Have you tried using the non-spot configuration?
I would advise against using spot instances: although they're significantly cheaper, your processing jobs can be interrupted even if you specify a bid price at the on-demand rate.
In any case I'll be doing a run of some imagery next week and will see if I'm getting the same issue.
UPDATE: Just tried running a quick test. Confirming my deployed on-demand instance solution still runs OK and jobs don't get stuck.
Again would advise against spot instances as this specific ODM workload is not designed to be stateless.
@AlexCarusoFan4 first, thanks a lot for the response. What do you mean by ON_DEMAND: changing the bidding strategy from SPOT to ON_DEMAND? I'm actually getting the EC2 instances up and running, but those instances fail to start the entry.sh script because the user data fails somehow, so the same thing will happen even if I launch an ON_DEMAND instance. Anyhow, I'm trying that right now.
No problem at all.
Yes that's correct, I would recommend changing the EC2 instance type in your config file to ON_DEMAND.
Do you have the exact error message you're getting for the entry.sh file? If it says the file can't be found, the likely cause is the line endings of that file in your locally cloned repository.
I initially had this issue when first working with the DroneYard stack. I work from a Windows computer, and had to explicitly change the line endings of the .sh file to LF for it to be readable once deployed to the Linux ODM container.
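In case it helps, here is a small sketch of one way to do that conversion before deploying. It is not part of the DroneYard repo; the function name `normalize_line_endings` is made up for illustration, and you would pass it the path to wherever entry.sh lives in your clone.

```python
from pathlib import Path

def normalize_line_endings(path):
    """Rewrite a file with LF line endings in place.

    Returns True if the file contained CRLF endings and was changed,
    False if it was already LF-only.
    """
    p = Path(path)
    data = p.read_bytes()
    fixed = data.replace(b"\r\n", b"\n")  # CRLF (Windows) -> LF (Unix)
    if fixed == data:
        return False
    p.write_bytes(fixed)
    return True

# Example usage (path is a placeholder for your local clone):
#   normalize_line_endings("entry.sh")
```

To stop Git on Windows from converting the file back on checkout, a `.gitattributes` entry such as `*.sh text eol=lf` should keep shell scripts on LF endings.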
Thank you for your work on this project. I've encountered an issue while deploying the code following the instructions:
Problem:
- The AWS Batch dashboard shows a valid and enabled compute environment
- Jobs enter a runnable state but don't progress
- Jobs remain in runnable status indefinitely
- The compute environment eventually becomes invalid
I received a notification stating that all EC2 instances in the Batch compute environment were scaled down due to a misconfiguration preventing them from joining the ECS Cluster. The notification suggests reviewing and updating/recreating the compute environment configuration, mentioning possible issues such as:
Any insights on what might be causing this issue would be appreciated.