clearlinux / distribution

Placeholder repository to allow filing of general bugs/issues/etc against the Clear Linux OS for Intel Architecture linux distribution

Intel Xeon P-series support in **AWS** #1307

Open GrabbenD opened 5 years ago

GrabbenD commented 5 years ago

We are looking to use Clear Linux with an Intel Xeon P-8259L, but according to the Clear Linux system requirements the P-series isn't supported yet.

Thanks for the fantastic work.

thiagomacieira commented 5 years ago

I can't find the 8259L on https://ark.intel.com, but there's an 8260L: https://ark.intel.com/content/www/us/en/ark/products/192476/intel-xeon-platinum-8260l-processor-35-75m-cache-2-40-ghz.html. Anyway, that's just a doc omission: we have most certainly tested and verified that Clear works on Cascade Lake.

GrabbenD commented 5 years ago

@thiagomacieira Thanks for clarifying this. I believe you need to update the official Clear Linux image in AWS Marketplace, since it can't be installed on the instance types below in eu-west-1 (Ireland/Dublin DC). I'm not sure whether other DCs are affected, since the hardware behind each instance type may vary between AWS regions. We initially discovered this issue with:

(Intel Xeon P-8259L with NVIDIA T4 GPUs)
g4dn.xlarge
g4dn.2xlarge
g4dn.4xlarge
g4dn.8xlarge
g4dn.12xlarge
g4dn.16xlarge

We found these instances to be affected as well:

m5d.8xlarge (Intel Xeon Platinum 8175)
m5d.16xlarge (Intel Xeon Platinum 8175)
m5d.metal (Intel Xeon Platinum 8175)
m5.8xlarge (Intel Xeon Platinum 8175)
m5.16xlarge (Intel Xeon Platinum 8175)
m5.metal (Intel Xeon Platinum 8175)
m1.small (Intel Xeon Family)
m1.medium (Intel Xeon Family)
m1.large (Intel Xeon Family)
m1.xlarge (Intel Xeon Family)
c5n.large (Intel Xeon Platinum 8124M)
c5n.xlarge (Intel Xeon Platinum 8124M)
c5n.2xlarge (Intel Xeon Platinum 8124M)
c5n.4xlarge (Intel Xeon Platinum 8124M)
c5n.9xlarge (Intel Xeon Platinum 8124M)
c5n.18xlarge (Intel Xeon Platinum 8124M)
c5n.metal (Intel Xeon Platinum 8124M)
c5.12xlarge (2nd Gen Intel Xeon Platinum 8175CL)
c5.24xlarge (2nd Gen Intel Xeon Platinum 8175CL)
c5.metal (2nd Gen Intel Xeon Platinum 8175CL)
c1.medium (Intel Xeon Family)
c1.xlarge (Intel Xeon Family)
cc2.8xlarge (Intel Xeon E5-2670)
p3dn.24xlarge (Intel Xeon Platinum 8175)
r5d.8xlarge (Intel Xeon Platinum 8175)
r5d.16xlarge (Intel Xeon Platinum 8175)
r5d.metal (Intel Xeon Platinum 8175)
r5.8xlarge (Intel Xeon Platinum 8175)
r5.16xlarge (Intel Xeon Platinum 8175)
r5.metal (Intel Xeon Platinum 8175)
m2.xlarge (Intel Xeon Family)
m2.2xlarge (Intel Xeon Family)
m2.4xlarge (Intel Xeon Family)
z1d.metal (Intel Xeon Platinum 8151)
i3en.large (Intel Xeon Platinum 8175)
i3en.xlarge (Intel Xeon Platinum 8175)
i3en.2xlarge (Intel Xeon Platinum 8175)
i3en.3xlarge (Intel Xeon Platinum 8175)
i3en.6xlarge (Intel Xeon Platinum 8175)
i3en.12xlarge (Intel Xeon Platinum 8175)
i3en.24xlarge (Intel Xeon Platinum 8175)
i3en.metal (Intel Xeon Platinum 8175)

It seems you need to specify which instance types your AWS Marketplace image (AMI) is compatible with. For the time being, it's not possible to select any of the instance types I've listed above.

It's also worth noting that the AWS Marketplace AMI is a bit out of date.
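To compare a list like the one above against the instance types a Marketplace listing exposes, it can help to group the names by family. The following is a minimal Python sketch, not part of the original report; the helper name `group_by_family` is hypothetical, and the sample names are a small subset of the list above.

```python
from collections import defaultdict


def group_by_family(instance_types):
    """Group EC2 instance type names (e.g. "m5d.8xlarge") by family ("m5d")."""
    families = defaultdict(list)
    for name in instance_types:
        # The family is everything before the first dot; the rest is the size.
        family, _, size = name.strip().partition(".")
        if family and size:
            families[family].append(size)
    return dict(families)


# A few of the affected instance types reported above.
affected = ["g4dn.xlarge", "g4dn.2xlarge", "m5d.metal", "c5n.large", "m5d.8xlarge"]
grouped = group_by_family(affected)
```

Names without a dot are skipped rather than guessed at, so a stray line in a pasted list does not create a bogus family.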

ahkok commented 5 years ago

This is a known issue: our images just won't boot properly on these larger systems. I can't find the issue right now, but we've disabled them as installable for this very reason.

ahkok commented 5 years ago

Edited the title to reflect AWS, since that's a critical part of this issue.

GrabbenD commented 5 years ago

@ahkok Good catch. Troubleshooting boot issues in AWS can be troublesome since they don't offer VNC or xterm console access. On the other hand, it's possible to view dmesg logs through EC2 > {instance} > Actions > Instance Settings > Get System Log, but for that I believe we'd need to enable access for non-root users. I hope this helps.

ahkok commented 5 years ago

https://github.com/clearlinux/distribution/issues/1078

We'll have system logs fixed in a few days (requires kernel update on AWS instance).

ahkok commented 5 years ago

System logs use the serial console; it has nothing to do with dmesg being restricted :)
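The same system log can also be fetched programmatically. The sketch below assumes a boto3-style EC2 client, whose `get_console_output(InstanceId=...)` call returns a dict with an optional `Output` field. To my knowledge boto3 hands back `Output` already decoded; if it arrives base64-encoded in your setup, decode it first. The helper names, the `FakeEC2Client` stand-in, and the instance ID are hypothetical, used only so the example runs without AWS credentials.

```python
def fetch_system_log(ec2_client, instance_id):
    """Return the serial console output for one instance as text.

    `ec2_client` is assumed to behave like a boto3 EC2 client.
    """
    response = ec2_client.get_console_output(InstanceId=instance_id)
    # "Output" may be missing or empty if nothing has been captured yet.
    return response.get("Output") or ""


def last_lines(log_text, count=20):
    """Return the final `count` lines of the log text."""
    if count <= 0:
        return []
    return log_text.splitlines()[-count:]


class FakeEC2Client:
    """Offline stand-in (hypothetical) that returns made-up console text."""

    def get_console_output(self, InstanceId):
        return {"InstanceId": InstanceId, "Output": "first line\nsecond line\nthird line"}


log = fetch_system_log(FakeEC2Client(), "i-0123456789abcdef0")  # placeholder ID
tail = last_lines(log, count=2)
```

Against a real account, the stand-in would be replaced with something like `boto3.client("ec2", region_name="eu-west-1")` and a real instance ID.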

GrabbenD commented 5 years ago

That's useful to know, thanks @ahkok.

There seem to be only a few AMIs that can currently boot the G4 instance type (e.g. Ubuntu 19.04 and Amazon Linux 2). For the time being, is there any workaround I can apply to run Clear Linux on this instance type, or is this issue deeply tied to the processor type in AWS, @miguelinux?

gtkramer commented 5 years ago

I just booted a g4dn.2xlarge instance in us-west-2 with an Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz, and it booted all right and I could log in. @xRiot, maybe try again? We've updated our base offering in the marketplace. @ahkok
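For anyone repeating this check on another instance type, the CPU model string can be read from `/proc/cpuinfo` once the instance is up. This is a minimal sketch, assuming the usual x86 Linux layout with `model name` lines; the helper name `cpu_model_names` is hypothetical, and the sample text only mirrors the model string quoted above.

```python
def cpu_model_names(cpuinfo_text):
    """Return the distinct "model name" values from /proc/cpuinfo-style text."""
    models = []
    for line in cpuinfo_text.splitlines():
        # Each entry looks like "key<TAB>: value"; split on the first colon only.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "model name":
            model = value.strip()
            if model not in models:
                models.append(model)
    return models


# Sample text shaped like /proc/cpuinfo, using the model string quoted above.
sample = (
    "processor\t: 0\n"
    "model name\t: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz\n"
    "processor\t: 1\n"
    "model name\t: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz\n"
)
models = cpu_model_names(sample)
```

On a running instance, the input would come from `open("/proc/cpuinfo").read()` instead of the sample string.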

GrabbenD commented 4 years ago

Thanks a lot @gtkramer and @ahkok. Here is an up-to-date list of the remaining unsupported instance types that would be worth investigating:

1 - Instance types which contain hardware/specifications that are otherwise supported in different instance types:

m5ad.8xlarge
m5ad.16xlarge
r5ad.8xlarge
r5ad.16xlarge

2 - Previous generation Intel Xeon Family instances:

m1.small
m1.medium
m1.large
m1.xlarge
c1.medium
c1.xlarge
m2.xlarge
m2.2xlarge
m2.4xlarge

3 - Furthermore, instances with AWS Graviton processors don't seem to be working either:

a1.medium
a1.large
a1.xlarge
a1.2xlarge
a1.4xlarge
a1.metal

4 - Additionally, t1.micro is unsupported, but that could be due to unsupported hardware.

Thanks again for the help with this.

gtkramer commented 4 years ago

@xRiot For the m5ad and r5ad 8x and 16x large instance types, I unfortunately don't see them in the list of instance types to select when modifying our offering in the AWS Marketplace. They also don't appear to be listed on the product pages for m5 and r5?

https://aws.amazon.com/ec2/instance-types/m5/ https://aws.amazon.com/ec2/instance-types/r5/

For the previous generation Intel Xeons, I'm not seeing them exposed as an available instance type either. We can check with AWS support folk to see if this is intentional. @ahkok is this something you think we should look at?

Regarding the a1 instances, we don't have an offering that works on ARM processors. Only x86_64 :)
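Which instance types can run an x86_64 image at all can be checked from the EC2 API before trying to boot. The sketch below assumes a boto3-style client whose `describe_instance_types` response carries `ProcessorInfo.SupportedArchitectures` for each type. The helper names and the `FakeEC2Client` stand-in are hypothetical, and its canned data is illustrative only.

```python
def supported_architectures(ec2_client, instance_types):
    """Map each instance type to the CPU architectures EC2 reports for it.

    Pagination is ignored here, which is fine for a short list of types.
    """
    response = ec2_client.describe_instance_types(InstanceTypes=list(instance_types))
    return {
        item["InstanceType"]: item["ProcessorInfo"]["SupportedArchitectures"]
        for item in response["InstanceTypes"]
    }


def x86_64_capable(arch_map):
    """Return the instance types that list x86_64 as a supported architecture."""
    return sorted(name for name, archs in arch_map.items() if "x86_64" in archs)


class FakeEC2Client:
    """Offline stand-in (hypothetical) with illustrative canned data."""

    _ARCHS = {"a1.medium": ["arm64"], "m5.8xlarge": ["x86_64"]}

    def describe_instance_types(self, InstanceTypes):
        return {
            "InstanceTypes": [
                {"InstanceType": t, "ProcessorInfo": {"SupportedArchitectures": self._ARCHS[t]}}
                for t in InstanceTypes
            ]
        }


arch_map = supported_architectures(FakeEC2Client(), ["a1.medium", "m5.8xlarge"])
bootable_candidates = x86_64_capable(arch_map)
```

Passing the client in as a parameter keeps the helpers testable offline; with real credentials, a `boto3.client("ec2")` object would take the place of the stand-in.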

ahkok commented 4 years ago

We've had issues with some instance types before (they wouldn't boot). I have no idea which ones or why, or whether it's even still relevant.