Closed 88plug closed 1 year ago
This bug persists. Today I woke up to half of my providers showing the exact same error and not bidding. The only fix is to manually restart the provider.
I[2023-01-06|20:34:25.996] order detected module=bidengine-service cmp=provider order=order/akash1q0m0kz83qwpuc5ss39y8sf25mq85a43ffjfd3v/9226782/1/1
I[2023-01-06|20:34:26.004] group fetched module=bidengine-order cmp=provider order=akash1q0m0kz83qwpuc5ss39y8sf25mq85a43ffjfd3v/9226782/1/1
I[2023-01-06|20:34:26.005] requesting reservation module=bidengine-order cmp=provider order=akash1q0m0kz83qwpuc5ss39y8sf25mq85a43ffjfd3v/9226782/1/1
D[2023-01-06|20:34:26.005] reservation requested module=provider-cluster cmp=provider cmp=service cmp=inventory-service order=akash1q0m0kz83qwpuc5ss39y8sf25mq85a43ffjfd3v/9226782/1/1 resources="group_id:<owner:\"akash1q0m0kz83qwpuc5ss39y8sf25mq85a43ffjfd3v\" dseq:9226782 gseq:1 > state:open group_spec:<name:\"akash\" requirements:<signed_by:<> > resources:<resources:<cpu:<units:<val:\"1000\" > > memory:<quantity:<val:\"2147483648\" > > storage:<name:\"default\" quantity:<val:\"1073741824\" > > endpoints:<> > count:1 price:<denom:\"uakt\" amount:\"10000000000000000000000\" > > > created_at:9226784 "
D[2023-01-06|20:34:26.005] reservation count module=provider-cluster cmp=provider cmp=service cmp=inventory-service cnt=1
I[2023-01-06|20:34:26.005] Reservation fulfilled module=bidengine-order cmp=provider order=akash1q0m0kz83qwpuc5ss39y8sf25mq85a43ffjfd3v/9226782/1/1
D[2023-01-06|20:34:26.827] submitting fulfillment module=bidengine-order cmp=provider order=akash1q0m0kz83qwpuc5ss39y8sf25mq85a43ffjfd3v/9226782/1/1 price=34.000000000000000000uakt
Likely related to https://github.com/ovrclk/engineering/issues/673 (internal link)
I can't reproduce this issue, nor can I see it. I deploy on a nearly daily basis and usually see ~20 providers bid on my requests. Also, the providers we manage are at 98% of capacity.
@88plug what's your bid timeout value in the provider? I've noticed your providers aren't expiring bids after the default 5 minutes. Most likely you have set this to a higher value.
While your provider is holding on to bids the tenant isn't accepting, it won't bid on new orders if the "pending" deployments are holding up all of its resources.
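To make the hypothesis above concrete, here is a toy model of it. This is only a sketch, not the Akash provider's actual inventory code: `ToyInventory`, `try_reserve`, `expire`, and the capacity numbers are all hypothetical names and values chosen for illustration. It shows how a longer bid timeout keeps resources reserved for unaccepted bids, so a later order is refused.

```python
from dataclasses import dataclass, field


@dataclass
class ToyInventory:
    """Hypothetical model of a provider's inventory; NOT the Akash implementation."""
    capacity_cpu: int       # total CPU available, in millicores
    bid_timeout_s: float    # how long an unaccepted bid keeps its reservation
    held: dict = field(default_factory=dict)  # order_id -> (cpu, reserved_at)

    def expire(self, now: float) -> None:
        """Release reservations whose bids were not accepted within the timeout."""
        self.held = {
            order: (cpu, t)
            for order, (cpu, t) in self.held.items()
            if now - t < self.bid_timeout_s
        }

    def try_reserve(self, order_id: str, cpu: int, now: float) -> bool:
        """Reserve cpu for a new order; return False if pending bids hold too much."""
        self.expire(now)
        used = sum(c for c, _ in self.held.values())
        if used + cpu > self.capacity_cpu:
            return False  # no room left, so the provider would not bid
        self.held[order_id] = (cpu, now)
        return True


# Default-like timeout (5 minutes): stale reservations are released.
short = ToyInventory(capacity_cpu=2000, bid_timeout_s=300)
short.try_reserve("a", 1000, now=0)
short.try_reserve("b", 1000, now=1)
print(short.try_reserve("c", 1000, now=60))   # False: a and b still pending
print(short.try_reserve("c", 1000, now=301))  # True: a and b have expired

# Much longer timeout: the same order is still refused after 5 minutes.
long_ = ToyInventory(capacity_cpu=2000, bid_timeout_s=3600)
long_.try_reserve("a", 1000, now=0)
long_.try_reserve("b", 1000, now=1)
print(long_.try_reserve("c", 1000, now=301))  # False: resources still held
```

Note that in the logs posted earlier the reservation was fulfilled before the order stalled, so this model illustrates the timeout theory only, not necessarily the failure being reported.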
Noticed bdl.computer stopped bidding after < 9 hours. Here are the logs from attempting to create a new deployment...
The bidengine-order never gets past
submitting fulfillment
and that is where the logs stop for module=bidengine-order
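For anyone who wants to check their own provider logs for the same symptom, a small script along these lines might help. It is a sketch that assumes the log line format shown earlier in this thread (`LEVEL[timestamp] message module=... order=...`); the helper names `last_messages` and `stuck_orders` are made up for illustration. It reports orders whose last `module=bidengine-order` line is `submitting fulfillment`.

```python
import re

# Matches lines such as:
#   D[2023-01-06|20:34:26.827] submitting fulfillment module=bidengine-order ...
LINE_RE = re.compile(
    r"^[A-Z]\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*?)\s+module=(?P<module>\S+)"
)
# The order ID sometimes carries an "order/" prefix; strip it if present.
ORDER_RE = re.compile(r"\border=(?:order/)?(?P<order>\S+)")


def last_messages(lines, module="bidengine-order"):
    """Return {order_id: (timestamp, message)} for the last line each order logged in `module`."""
    last = {}
    for line in lines:
        head = LINE_RE.match(line)
        order = ORDER_RE.search(line)
        if not head or not order or head.group("module") != module:
            continue
        last[order.group("order")] = (head.group("ts"), head.group("msg"))
    return last


def stuck_orders(lines, stuck_msg="submitting fulfillment"):
    """Return sorted order IDs whose last logged message equals `stuck_msg`."""
    return sorted(
        order
        for order, (_, msg) in last_messages(lines).items()
        if msg == stuck_msg
    )
```

`stuck_orders` accepts any iterable of lines, so an open log file can be passed directly. An order showing up in the result only means no later `bidengine-order` line was captured for it; it does not say why the bid stalled.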