I have mostly played on Trickster’s Table, where the AI frequently overbids (especially by bidding “The Works,” which is a slalom-style bid). It is often a viable strategy to just wait for the AI to overbid and then punish it for a win.
The defender has a lot of power in choosing a special suit. With only two players, regaining the lead is often hard, so choosing which cards can break the lead is crucial.
I feel like “The Works” is bid too often, though I think that is partly because the AI overvalues it. “The Works” is actually a very risky bid, as the bidder doesn’t get to choose a trump suit. I imagine that two skilled humans would have a more even distribution of bids.
Having the reward function be score-based the whole time might bias the AI towards riskier bids. I'll try modifying the reward function so that, during the bidding phase, it is 1 for a win and -1 for a loss rather than score-based.
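A minimal sketch of what that phase-dependent reward could look like. The names `Phase` and `reward` are hypothetical, and two details are my assumptions rather than anything settled: the score-based reward is taken to be the score margin over the opponent, and a tie during bidding is given 0.

```python
from enum import Enum


class Phase(Enum):
    """Hypothetical phases of a hand, used only to pick the reward rule."""
    BIDDING = "bidding"
    PLAY = "play"


def reward(phase: Phase, my_score: int, opp_score: int) -> float:
    """Return the training reward for a decision made in the given phase.

    Assumption: the score-based reward is the score margin. Swap in
    whatever score signal the training loop actually uses.
    """
    margin = my_score - opp_score
    if phase is Phase.BIDDING:
        # Win/loss only: the size of the margin is deliberately ignored,
        # so a big risky payoff is worth no more than a narrow win.
        if margin > 0:
            return 1.0
        if margin < 0:
            return -1.0
        return 0.0  # assumption: ties are neutral
    # Outside the bidding phase, keep the score-based reward.
    return float(margin)
```

With this shape, a bid that wins by 1 point and a bid that wins by 20 are rewarded equally during bidding, which should remove the incentive to chase large scores with high-variance bids.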
From FateTriarrii on BGG https://boardgamegeek.com/boardgame/365349/hotdog/ratings