confused about the Discriminator design

In the paper "discriminator is 2 layer MLP of 100 hidden units with tanh activation. Our generator consists of separate Actor and Critic neural network and follows the architecture used in [5, 8], where both of these networks have 2 layer MLP of 400 and 300 hidden units with ReLU activation" But in your implementation the hidden units and activation in the network is not designed as you have described in the paper, why? And When computing the discriminator's loss, you use:

log_p = reward + gamma * V_ns - V_s
log_q = lprobs
log_pq_concat = torch.cat([log_p, log_q], 1)
log_pq = torch.logsumexp(torch.cat([log_p, log_q], 1).view(len(state), 2), dim=1).view(-1, 1)

loss2 = F.binary_cross_entropy_with_logits(log_pq_concat, torch.ones(log_pq_concat.size()).to(self.device), reduction='sum')
log_D = log_p - log_pq
D = torch.exp(log_D)
return D, loss2

Why this works? D is the output of disciminator, according to the formula in the paper, I think this should make sense:

log_D = log_p - log_pq
D = torch.exp(log_D)
loss2 = F.binary_cross_entropy_with_logits(D, torch.ones(D.size()).to(self.device), reduction='sum')
return D, loss2

But this doesnot seem to work well.

SaminYeasar / Off_Policy_Adversarial_Inverse_Reinforcement_Learning

confused about the Discriminator design #1