update notebooks and add intrinsic and extrinsic comparison for replicability checks

sushmaakoju commented 1 year ago

add notebook for intrinsic and extrinsic replicability checks

MihaiSurdeanu commented 1 year ago

@sushmaakoju : please let me know when I should merge this PR.

sushmaakoju commented 1 year ago

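@MihaiSurdeanu I think I included all of the unit tests as per the review from you and Keith. Keith did not want to review again afterwards. Can you please review the following two scripts:

trainer_chap13_classification_deberta https://github.com/clulab/releases/blob/sushma/acl2023-nlrse-sicck/code/training/trainer_chap13_classification_deberta_jun12.ipynb

trainer_chap13_classification_roberta https://github.com/clulab/releases/blob/sushma/acl2023-nlrse-sicck/code/training/trainer_chap13_classification_roberta_jun12.ipynb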

MihaiSurdeanu commented 1 year ago

I'll do so soon!

On Sun, Jun 25, 2023 at 09:39 Sushma Akoju @.***> wrote:

@MihaiSurdeanu https://github.com/MihaiSurdeanu I think I included all of the unit tests as per the review from you and Keith. Keith did not want to review after. can you please review the following two scripts:

trainer_chap13_classification_deberta https://github.com/clulab/releases/blob/sushma/acl2023-nlrse-sicck/code/training/trainer_chap13_classification_deberta_jun12.ipynb

trainer_chap13_classification_roberta https://github.com/clulab/releases/blob/sushma/acl2023-nlrse-sicck/code/training/trainer_chap13_classification_roberta_jun12.ipynb


MihaiSurdeanu commented 1 year ago

This code looks fine to me, but:

None of these are actual unit tests. Let's discuss in our meeting.
Some small variations in output are normal in deep learning. So I wouldn't check that all labels are equal. I would check that overall performance (say macro F1) is within some acceptable bounds.
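As a rough illustration of the second point, here is a minimal sketch of a tolerance-based check. It computes macro F1 in plain Python so that it has no dependencies; the function names `macro_f1` and `within_bounds`, and whatever reference score and tolerance are passed in, are hypothetical and are not taken from the notebooks in this PR.

```python
from collections import defaultdict


def macro_f1(gold, predicted):
    """Unweighted mean of per-label F1 over every label seen in gold or predicted."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute macro F1 on empty input")

    tp = defaultdict(int)  # true positives per label
    fp = defaultdict(int)  # false positives per label
    fn = defaultdict(int)  # false negatives per label
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1

    scores = []
    for label in set(gold) | set(predicted):
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2*TP / (2*TP + FP + FN); guard against an empty denominator.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def within_bounds(score, reference, tolerance):
    """True if score is no further than tolerance from the reference score."""
    return abs(score - reference) <= tolerance
```

With this, a replicability check compares `macro_f1(gold, new_predictions)` against the previously reported score using `within_bounds`, rather than requiring every predicted label to match.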

sushmaakoju commented 1 year ago

@MihaiSurdeanu Sure, thank you. I included all of the checks Keith mentioned in the email as assert statements, since these were run on Colab. I would be glad to get your feedback and implement any changes you suggest.

kwalcock commented 1 year ago

I don't know all the goals of the code, but it led me to reading this interesting page: https://stackoverflow.com/questions/40172281/unit-tests-for-functions-in-a-jupyter-notebook. In addition to the testing issue, another one we come across is how to interface notebooks with GitHub so that we can see the code changes amid all the output changes. It would be cool if an experienced Python and Jupyter user wrote up some tips and put them on the clulab wiki or made a sample repo to be used as a template.
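One common way to keep output changes out of the diffs, offered here only as a sketch and not as an agreed clulab practice, is to strip cell outputs before committing. A notebook file is JSON, and in the version 4 notebook format each code cell carries an `outputs` list and an `execution_count`. The helper name `strip_outputs` below is hypothetical.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with code-cell outputs cleared."""
    # Round-trip through JSON to get a deep copy, so the input is not modified.
    cleaned = json.loads(json.dumps(notebook))
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned
```

To use it on a file, load the notebook with `json.load`, pass the resulting dict through `strip_outputs`, and write it back with `json.dump`; only source changes then show up in the GitHub diff.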

sushmaakoju commented 1 year ago

@kwalcock Thank you so much, this is very helpful. I will let you and Mihai decide who should write the summary, but I would volunteer to collect the summary points of our discussion in one place if you think that would help. Back in the Fall of 2022 I asked two senior Clulab students to review my work, but it did not happen, since most students are busy. So you and Mihai are the only two people with whom I generally discuss best practices and guidelines, and I am very grateful for your timely guidance and review. I may not be the best person to take the initiative on writing this wiki, since I generally prefer to do a little less programming work. I will certainly update my Colab notebooks to follow the best practices, requirements, and guidelines from you and Mihai, so that my work is replicable and aligned with Clulab practices.

sushmaakoju commented 1 year ago

@MihaiSurdeanu Thank you so much for the meeting and discussion. Here is a summary of the best practices for unit tests in future notebooks, as per your guidelines from today's meeting:

Put the code in a Python file with unit tests instead of a notebook, so that the tests can be run with GitHub Actions on the individual repository.
In the unit tests, don't use Python's assert statement.
A professional unit test suite is good practice for a Python file, rather than for a Google Colab or Jupyter notebook.

As per this guidance, I will upload Python scripts, with unit tests, for all of my training notebooks.
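A minimal sketch of what such a test file could look like, using the standard library's `unittest` and its assertion methods rather than bare `assert` statements. The reference score, the tolerance, and the `evaluate_model` placeholder are all hypothetical; in a real script that function would load the trained model and compute macro F1 on the evaluation set.

```python
import unittest

# Hypothetical values for illustration only; not results from this PR.
REFERENCE_MACRO_F1 = 0.80
TOLERANCE = 0.02


def evaluate_model():
    """Placeholder for the real evaluation; returns a fixed, made-up score."""
    return 0.81


class ReplicabilityTest(unittest.TestCase):
    def test_macro_f1_within_bounds(self):
        score = evaluate_model()
        # Passes when |score - reference| <= TOLERANCE.
        self.assertAlmostEqual(score, REFERENCE_MACRO_F1, delta=TOLERANCE)

    def test_score_is_a_valid_proportion(self):
        score = evaluate_model()
        self.assertGreaterEqual(score, 0.0)
        self.assertLessEqual(score, 1.0)
```

Saved as a file whose name starts with `test_`, this can be run locally or from a GitHub Actions step with `python -m unittest discover`.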

clulab / releases

update notebooks and add intrinsic and extrinsic comparison for replicability checks #20