jakestanton2016 commented 2 months ago

Prerequisite Steps:

1. Make sure your provider has community provider attributes and your contact details (email, website):

  Example:
  $ provider-services query provider get akash1<REDACTED> -o text
  ...
  attributes:
  ...
  - key: host
    value: akash
  - key: tier
    value: community
  info:
    email: "<your email>"
    website: "<your website>"

Ref documentation:.
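
The attribute check in step 1 can be automated once the attributes are available as key/value pairs. Below is a minimal Python sketch, assuming the attributes have already been parsed into a list of dicts shaped like the example output. The function name `missing_community_attributes` and the `REQUIRED` mapping are hypothetical and not part of any Akash tooling.

```python
# Hypothetical helper: checks only the two attributes shown in the example above.
REQUIRED = {"host": "akash", "tier": "community"}


def missing_community_attributes(attributes):
    """Return the required key/value pairs that are absent or different.

    `attributes` is assumed to be a list of {"key": ..., "value": ...} dicts,
    mirroring the `attributes:` section of the example output.
    """
    found = {item["key"]: item["value"] for item in attributes}
    return {
        key: value
        for key, value in REQUIRED.items()
        if found.get(key) != value
    }


example = [
    {"key": "host", "value": "akash"},
    {"key": "tier", "value": "community"},
]
print(missing_community_attributes(example))  # prints {}
```

An empty result means both attributes are set as expected. Anything returned is what still needs to be added or corrected.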

2. Make sure your provider's *.ingress record resolves to your provider IP (ideally a worker node IP):

host <anything>.ingress.<yourdomain>

Example:

$ host anything.ingress.akash.pro
anything.ingress.akash.pro is an alias for nodes.akash.pro.
nodes.akash.pro has address 65.108.6.185

More info on what the DNS records should look like is available in the Akash Documentation here.
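
The same lookup as the `host` command in step 2 can be scripted. Below is a minimal Python sketch using only the standard library. The function names `resolve_ingress` and `ingress_points_to` are hypothetical, and the `resolver` parameter exists only so the logic can be tried without network access.

```python
import socket


def resolve_ingress(domain, label="anything", resolver=socket.gethostbyname_ex):
    """Resolve <label>.ingress.<domain> and return the sorted IPv4 addresses.

    `resolver` defaults to the standard library lookup and is injectable so
    the function can be exercised without touching the network.
    """
    hostname = f"{label}.ingress.{domain}"
    _canonical, _aliases, addresses = resolver(hostname)
    return sorted(addresses)


def ingress_points_to(domain, expected_ip, **kwargs):
    """Return True if the wildcard ingress name resolves to `expected_ip`."""
    return expected_ip in resolve_ingress(domain, **kwargs)
```

For the example above, the call would be `ingress_points_to("akash.pro", "65.108.6.185")`. A failed lookup raises `socket.gaierror` rather than returning `False`, so a missing record is easy to tell apart from a wrong one.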

3. Please make sure your Akash provider doesn't block any Akash-specific ports.

If you are using a firewall, please follow this doc: https://akash.network/docs/providers/build-a-cloud-provider/akash-cloud-provider-build-with-helm-charts/#step-11---firewall-rule-review
If you are behind NAT, then you need to make sure these ports are open between your provider worker nodes and the internet.
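
A quick way to confirm that the ports are reachable is to attempt a TCP connection to each one from a machine outside the provider's network. Below is a minimal Python sketch. The function name `closed_ports` is hypothetical, and the port list is deliberately left as a parameter because the required ports are defined in the firewall document linked above, not here.

```python
import socket


def closed_ports(host, ports, timeout=3.0):
    """Return the ports on `host` that do not accept a TCP connection."""
    closed = []
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout,
            # or name-resolution failure.
            with socket.create_connection((host, port), timeout=timeout):
                pass
        except OSError:
            closed.append(port)
    return closed
```

A call such as `closed_ports("provider.example.com", [8443])` (hostname and port are placeholders) returns an empty list when everything is reachable. Note that this only tests TCP reachability from wherever the script runs, so running it from inside the same NAT would not prove anything.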

Audit Steps:

1. Title the issue: "[Provider Audit]: Provider Address" (e.g. "[Provider Audit]: provider.europlots.com")

2. Wait for a response in the comments. If no issues are found during the provider audit, the process is complete, the provider should start bidding on leases, and the audit ticket will be closed.

3. If issues are found during the provider audit, debug them. Once they are resolved, the audit will be complete.

4. The audit issue will be closed by a core team member.

Leave contact information (optional)

Name - Jake
Discord handle or Telegram handle - Stanton2495
Contact email address - jakeminer2021@gmail.com

shimpa1 commented 2 months ago

Good morning,

provider address: akash16q46mn8tm7vtwn4h9rugr8ptch6shp2qda7y89
provider URI name resolution works: yes
provider ingress URI name resolution works: yes
provider status endpoint works: yes
provider GRPC status endpoint works: yes
provider attributes set correctly: yes
provider responding to orders: yes
test deployment works: yes (without GPU)
network functional: yes

Other remarks:

The provider seems low on resources (CPU and RAM). We strongly suggest using server hardware with considerably more resources, for example 32 cores and 128 GB of RAM or more.
Provider storage is a RAID0 array with uneven drives:

sda           8:0    0 111.8G  0 disk
├─sda1        8:1    0     1G  0 part
└─sda2        8:2    0 110.7G  0 part
nvme1n1     259:0    0 953.9G  0 disk
└─md0         9:0    0   1.4T  0 raid0 /etc/resolv.conf /etc/hostname /dev/termination-log /etc/hosts
nvme0n1     259:1    0 465.8G  0 disk
└─md0         9:0    0   1.4T  0 raid0 /etc/resolv.conf /etc/hostname /dev/termination-log /etc/hosts

This is bad practice on multiple levels:

1. RAID0 generally decreases stability by adding yet another SPOF.
2. RAID0 needs equally sized drives to work properly.

The provider lists an RTX 3090, which seems to be allocated on node2; however, it is not available for lease. The GPU is also not currently leased:

curl -sk https://provider.gpu3090.ddns.net:8443/status | jq
{
  "cluster": {
    "leases": 3,
    "inventory": {
      "active": [
        {
          "cpu": 100,
          "gpu": 0,
          "memory": 100663296,
          "storage_ephemeral": 68157440
        },
        {
          "cpu": 100,
          "gpu": 0,
          "memory": 268435456,
          "storage_ephemeral": 268435456
        },
        {
          "cpu": 100,
          "gpu": 0,
          "memory": 100663296,
          "storage_ephemeral": 6291456
        }
      ],
      "available": {
        "nodes": [
          {
            "name": "node1",
            "allocatable": {
              "cpu": 4000,
              "gpu": 0,
              "memory": 12308656128,
              "storage_ephemeral": 224812917593
            },
            "available": {
              "cpu": 3080,
              "gpu": 0,
              "memory": 12087369728,
              "storage_ephemeral": 224812917593
            }
          },
          {
            "name": "node2",
            "allocatable": {
              "cpu": 8000,
              "gpu": 1,
              "memory": 8126627840,
              "storage_ephemeral": 1349073758432
            },
            "available": {
              "cpu": 5195,
              "gpu": 0,
              "memory": 3229161472,
              "storage_ephemeral": 1348194003168
            }
          }
        ]
      }
    }
  },
  "bidengine": {
    "orders": 0
  },
  "manifest": {
    "deployments": 0
  },
  "cluster_public_hostname": "provider.gpu3090.ddns.net",
  "address": "akash16q46mn8tm7vtwn4h9rugr8ptch6shp2qda7y89"
}

This suggests that there is an issue with the NVIDIA drivers, the NVIDIA toolkit, or the K8s plugin. Please refer to https://akash.network/docs/providers/build-a-cloud-provider/gpu-resource-enablement/
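
The reasoning above (a GPU is allocatable, none is available, and no active lease holds one) can be expressed as a small check over the status output. Below is a minimal Python sketch, assuming the decoded JSON has the layout shown in the output above. The function name `idle_unavailable_gpus` is hypothetical, and the sample keeps only the `gpu` fields from that output.

```python
def idle_unavailable_gpus(status):
    """List nodes whose GPUs are allocatable but not available.

    Returns (node name, allocatable, available) tuples, or an empty list
    when active leases account for every missing GPU.
    """
    inventory = status["cluster"]["inventory"]
    leased = sum(lease.get("gpu", 0) for lease in inventory.get("active", []))
    findings = []
    for node in inventory["available"]["nodes"]:
        allocatable = node["allocatable"]["gpu"]
        available = node["available"]["gpu"]
        if allocatable > available:
            findings.append((node["name"], allocatable, available))
    # Missing GPUs are only suspicious when leases do not explain them.
    unexplained = sum(a - b for _, a, b in findings) - leased
    return findings if unexplained > 0 else []


# GPU fields copied from the status output above; other fields omitted.
sample = {
    "cluster": {
        "inventory": {
            "active": [{"gpu": 0}, {"gpu": 0}, {"gpu": 0}],
            "available": {
                "nodes": [
                    {"name": "node1", "allocatable": {"gpu": 0}, "available": {"gpu": 0}},
                    {"name": "node2", "allocatable": {"gpu": 1}, "available": {"gpu": 0}},
                ]
            },
        }
    }
}
print(idle_unavailable_gpus(sample))  # prints [('node2', 1, 0)]
```

The check only flags the symptom. Whether the cause is the driver, the toolkit, or the plugin still has to be determined on the node itself.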

Please fix these issues and we can move on.

Shimpa

jakestanton2016 commented 2 months ago

Thanks… working on fixes

jakestanton2016 commented 1 month ago

I have made the fixes suggested. Lost on-time performance due to downtime. Please re-evaluate. Thank you.

shimpa1 commented 1 month ago

The provider is not responding to orders that include a GPU, even though a GPU is present in the inventory and available.

Requested resources:

1 GPU (nVidia rtx3090)
1 CPU
2 Gi RAM
2 Gi storage

Please make sure your provider is fully functional.

thanks. Shimpa

andy108369 commented 1 month ago

The provider is still offline, and it has been quite a long time. Please feel free to reopen if needed.

akash-network / community

[Provider Audit]: gpu3090.ddns.net #681