aws-solutions-library-samples / aws-batch-arch-for-protein-folding

Apache License 2.0

Container overrides error in Jackhmmer batch job #18

Open chayduk-deloitte opened 1 year ago

chayduk-deloitte commented 1 year ago

I've been trying for a bit to get the Jackhmmer batch job to work, but I keep getting the following error from AWS Batch:

exec /bin/bash: exec format error

I've spent a bit of time debugging it, and it doesn't seem like it's an issue on the container end because a similar set of parameters for the run.sh script works on my local machine.

I tried editing jackhmmer_job.py to include bash run.sh at the beginning of the container override, but still received the same error. I also verified that there were no obvious issues with the bash script formatting that would trigger a format error (e.g., syntax errors, Windows-style encoding, etc.).

EDIT: Here is the bash command that was sent to AWS Batch:

["-i s3://batchproteinstack-batchfolds3bucket-vi5zsurpnh7m/7FCC/fastas/7FCC.fasta:d3da1a0fe53647c1977086023aa37721/fasta/7FCC.fasta","-o d3da1a0fe53647c1977086023aa37721/7FCC/msas/:s3://batchproteinstack-batchfolds3bucket-vi5zsurpnh7m/7FCC/msas/jackhmmer/","-o d3da1a0fe53647c1977086023aa37721/7FCC/features.pkl:s3://batchproteinstack-batchfolds3bucket-vi5zsurpnh7m/7FCC/msas/features/features.pkl","python3","/opt/msa/create_alignments.py","--fasta_paths d3da1a0fe53647c1977086023aa37721/fasta/7FCC.fasta","--output_dir d3da1a0fe53647c1977086023aa37721","--uniref90_database_path /database/uniref90/uniref90.fasta","--mgnify_database_path /database/mgnify/mgy_clusters_2022_05.fa","--template_mmcif_dir /database/pdb_mmcif/mmcif_files","--max_template_date 2023-08-01","--obsolete_pdbs_path /database/pdb_mmcif/obsolete.dat","--db_preset full_dbs","--model_preset monomer","--n_cpu 16","--bfd_database_path /database/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt","--uniref30_database_path /database/uniref30/UniRef30_2021_03","--pdb70_database_path /database/pdb70/pdb70"]

tennex-astone commented 9 months ago

@chayduk-deloitte Did you find a solution to that error? We're seeing the same behavior.

tennex-astone commented 9 months ago

Looks like this is related to the CPU architecture. The job gets past that error when run in the Graviton queue. We're verifying and will open a PR once we get it sorted.
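
For anyone else debugging this: "exec format error" generally means the kernel was asked to run a binary built for a different CPU architecture, which fits the job working on the Graviton (arm64) queue. Below is a minimal sketch of a mismatch check, not something taken from this repo. It assumes you already know the image's architecture (for example from docker image inspect --format '{{.Architecture}}' <image>); the helper names normalize_arch and arch_mismatch are hypothetical.

```python
import platform
from typing import Optional

# Map common `uname -m` style names to the names container tooling reports.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def normalize_arch(name: str) -> str:
    """Return a canonical architecture name (hypothetical helper)."""
    key = name.strip().lower()
    return _ARCH_ALIASES.get(key, key)


def arch_mismatch(image_arch: str, host_arch: Optional[str] = None) -> bool:
    """True if the image architecture differs from the host's.

    host_arch defaults to the machine this script runs on; pass the
    compute environment's architecture explicitly when checking remotely.
    """
    host = normalize_arch(host_arch or platform.machine())
    return normalize_arch(image_arch) != host


if __name__ == "__main__":
    # An arm64 image scheduled onto an x86_64 instance would not start.
    print(arch_mismatch("arm64", "x86_64"))
```

Passing host_arch explicitly matters here because the check is only meaningful against the instance type behind the Batch queue, not the laptop the job was submitted from.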

tennex-astone commented 9 months ago

This seems to be fixed in the latest release.