bksaini078 / MT_Bert_FakeNews

Mean Teacher BERT discovers Fake News

Using the same main #2

Open isspek opened 3 years ago

isspek commented 3 years ago

Hi @bksaini078, I have written the main method for the purpose of comparing the models fairly. See https://github.com/bksaini078/MT_Bert_FakeNews/blob/main/BERT/main.py. If you don't use this main function, we can run into research reproducibility problems; we should have the same workflow for all models. You should add your implementations to the models dictionary as follows:

[screenshot of the models dictionary in main.py]
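For illustration, a minimal sketch of what such a registration could look like; the builder names below are placeholders for this explanation, not the actual functions in the repository:

```python
# Hypothetical sketch only: how the additional models could be registered
# alongside BERT in main.py's models dictionary so that every model runs
# through the same workflow. The builder names are placeholders.
def build_bert():
    raise NotImplementedError  # stands in for the existing BERT builder

def build_mean_teacher_bert():
    raise NotImplementedError  # to be contributed

def build_pi_bert():
    raise NotImplementedError  # to be contributed

MODELS = {
    'bert': build_bert,
    'mean_teacher_bert': build_mean_teacher_bert,
    'pi_bert': build_pi_bert,
}

def run(model_name):
    if model_name not in MODELS:
        raise ValueError(f'Unknown model: {model_name}')
    return MODELS[model_name]()
```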

Let me know why you haven't added them; then we can find a solution. Greetings

bksaini078 commented 3 years ago

Hello @isspek ,

I was also thinking of including my code in the main function you wrote, but as per my understanding, these are the reasons I did not move forward with that:

1) For the basic BERT you use a different data preprocessing, whereas for the mean teacher and pi models I use data that has already been preprocessed and augmented with generated fake news, stored in a different folder location. The model creation and calling functions are also quite different.

2) I was not aware of the reproducibility problem you mentioned, and in order not to affect your code, I kept updating the main function I had created from the beginning and was familiar with.

If you want, I can merge them or create one main function and resolve the issue. Let me know what needs to be done.

Thank you, regards, Bhupender Kumar Saini

isspek commented 3 years ago

Using the same main method is also good for flexibility: later you can add more models and reduce duplicate code, which makes it easier for other contributors.

1) Yes, as I said previously, you should use the preprocessing that I am using for the mean and pi models. It will not hurt the runtime since the data size is very small. Also, I am using early stopping, which should be the same for mean and pi (a minimal sketch is at the end of this comment). As for the model creation and calling functions, my implementation is based on Keras; I guess yours is TensorFlow. Maybe you first add the attention model, which does not use mean or pi, to mitigate this. Then we can check how to integrate the others into the BERT workflow. I assume that you will override lines 153-154.

[screenshot]

I will work on the code in the new year.
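For reference, a minimal sketch of a shared early-stopping setup using the standard tf.keras callback; the patience value is an illustrative assumption, not the value used in the repository:

```python
# Minimal sketch of a shared early-stopping setup with the standard
# tf.keras callback. The patience value below is an assumption.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',        # stop once validation loss stops improving
    patience=2,                # illustrative only
    restore_best_weights=True,
)
# Passed to model.fit(...) via callbacks=[early_stop] for BERT, mean and pi alike.
```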

bksaini078 commented 3 years ago

I made a note of it. I need to change a lot of code here, since for the mean teacher and pi models we also have unlabeled data to preprocess, which is about five times the size of the train data and will create time overhead during training.

Please do share your inputs.

Thanks

isspek commented 3 years ago

For testing the implementation, what I do is always use a small sample of the input, for instance 5 samples, in order to check whether my workflow works. I recommend you do that; then you will not have to worry about overhead. I also quickly looked for a custom fit function and found this: https://keras.io/guides/customizing_what_happens_in_fit/. I guess this is our solution for integrating the mean teacher training.
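Based on that guide, a rough sketch of how a custom `train_step` could carry the mean teacher logic; the consistency loss, EMA teacher update, and all parameter values here are assumptions for illustration, not the repository's implementation:

```python
# Rough sketch (assumptions, not the repo's code): a Keras Model subclass
# whose train_step adds a mean-teacher consistency loss and updates the
# teacher weights with an exponential moving average of the student.
import tensorflow as tf

class MeanTeacherModel(tf.keras.Model):
    def __init__(self, student, teacher, consistency_weight=1.0, ema_decay=0.99):
        super().__init__()
        self.student = student
        self.teacher = teacher
        self.consistency_weight = consistency_weight
        self.ema_decay = ema_decay

    def call(self, inputs, training=False):
        return self.student(inputs, training=training)

    def train_step(self, data):
        x, y = data
        with tf.GradientTape() as tape:
            student_pred = self.student(x, training=True)
            teacher_pred = tf.stop_gradient(self.teacher(x, training=False))
            supervised = self.compiled_loss(y, student_pred)
            consistency = tf.reduce_mean(tf.square(student_pred - teacher_pred))
            loss = supervised + self.consistency_weight * consistency
        grads = tape.gradient(loss, self.student.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.student.trainable_variables))
        # EMA update of the teacher weights from the student weights
        for t_w, s_w in zip(self.teacher.weights, self.student.weights):
            t_w.assign(self.ema_decay * t_w + (1.0 - self.ema_decay) * s_w)
        self.compiled_metrics.update_state(y, student_pred)
        return {m.name: m.result() for m in self.metrics}
```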

bksaini078 commented 3 years ago

Thank you for the link and the testing strategy; it can be done like that. I will recode according to the given link after discussing with you, as there are many other points to discuss, such as cross-validation (how are we going to implement it, same as before?) and how many model combinations we are mentioning in the paper.

isspek commented 3 years ago

We will only evaluate BERT, mean BERT, and pi BERT, as we already discussed at our previous meeting; I have already indicated these models as NotImplemented in main.py (e.g. the first picture).

bksaini078 commented 3 years ago

That's why I would like to discuss this with you to get better clarity. It's alright, because both the attention model and BERT are implemented.
- Right now I am having issues with the pi model; we need to discuss that.
- Regarding the paper writing, please mention the points or give some hints on what should be written (e.g. the algorithm, a diagram, the why and how), or the structure with sections and subsections.

Thank you

bksaini078 commented 3 years ago

Initial results with the FakeHealth dataset: [screenshot of results]

isspek commented 3 years ago

The results seem wrong; see the reference results: https://docs.google.com/spreadsheets/d/1x_9d2-gL6y7HIzCFVEx5ktjL4-7OPBwSrqptUOmA62k/edit?usp=sharing

bksaini078 commented 3 years ago

Please share the epochs, learning rate, batch size, and other parameters the model was run with.

isspek commented 3 years ago

I used early stopping, so there is no fixed number of epochs; I set the maximum to 100 epochs, and it mostly stops at epoch 3 or 4. The learning rate is 2e-5, and the batch size is 1. The model is BERT uncased, and the rest of the parameters can be seen in the BERT scripts. Hope it helps.
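For reference, a sketch of that configuration in Keras terms (learning rate 2e-5, batch size 1, up to 100 epochs with early stopping); the model and data variables are placeholders:

```python
# Sketch of the described training configuration; `model`, `x_train`, etc.
# are placeholders. Values taken from the comment above: lr 2e-5,
# batch size 1, epochs capped at 100 with early stopping.
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                              restore_best_weights=True)
# model.compile(optimizer=optimizer, loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(x_train, y_train, validation_data=(x_val, y_val),
#           epochs=100, batch_size=1, callbacks=[early_stop])
```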

bksaini078 commented 3 years ago

In the shell script you have mentioned BERT cased, as shown in the screenshot below; in your comment it is uncased. [screenshot of the shell script]

isspek commented 3 years ago

As you can see clearly from the table, the column "pretrained models" indicates both cased and uncased BERT models. For the cased model your approach unfortunately does not yield good results either; it seems something is really wrong. The "cased" model in the scripts comes from my previous experiments, because that was the last model I tried. Anyway, I forked the code and will work on the integration of mean BERT. I found issues in your implementation when I checked the code; I will probably fix them next week.

bksaini078 commented 3 years ago

Yes, I agree, an expert review of my code is very much needed. Please do let me know if there are any issues I can fix from my end. Thank you