If you cannot reproduce our results, one possible cause is a small difference in the layers, introduced when we added support for more features. We ran all of our experiments with Linear, but the code now uses MergedLinear. Although the two are mathematically equivalent, their PyTorch implementations differ. Our hyperparameters were tuned for Linear, so they may not work as well with MergedLinear. The simplest workaround is to replace MergedLinear with Linear. Alternatively, you can search for suitable hyperparameters with MergedLinear. We plan to tune the hyperparameters for MergedLinear ourselves and update the code.
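To make the "mathematically equivalent" point concrete, here is a minimal sketch in plain Python (no PyTorch). It assumes a merged layer works like a fused projection: it stacks the weight matrices of several projections (for example query, key, and value) into one matrix, runs a single matrix multiply, and splits the output. The helpers `linear`, `separate_layers`, and `merged_layer` are hypothetical illustrations, not part of this repository's code.

```python
from typing import List

Matrix = List[List[float]]  # rows = out_features, columns = in_features
Vector = List[float]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Compute y = W x for a single weight matrix (bias omitted for brevity)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def separate_layers(weights: List[Matrix], x: Vector) -> List[Vector]:
    """Apply each weight matrix as its own independent linear layer."""
    return [linear(w, x) for w in weights]


def merged_layer(weights: List[Matrix], x: Vector) -> List[Vector]:
    """Hypothetical fused layer: stack all matrices row-wise, run one
    matrix multiply, then split the output back into per-projection parts."""
    stacked = [row for w in weights for row in w]
    y = linear(stacked, x)
    outputs, start = [], 0
    for w in weights:
        outputs.append(y[start:start + len(w)])
        start += len(w)
    return outputs


if __name__ == "__main__":
    # Three toy "projections" with different output sizes.
    weights = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[0.5, -1.0]],
        [[2.0, 0.0], [0.0, 2.0]],
    ]
    x = [1.0, 1.0]
    print(separate_layers(weights, x))  # [[3.0, 7.0], [-0.5], [2.0, 2.0]]
    print(merged_layer(weights, x) == separate_layers(weights, x))  # True
```

The outputs match exactly here because both paths perform the same multiplications in the same order. Real PyTorch layers may differ in ways this sketch does not model, such as which kernels run, how parameters are shaped and initialized, and how per-layer settings are applied. That is why hyperparameters tuned with one layer type may not transfer to the other even when the math is the same.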