UNK tokenizer class - Githubissues

ACE07-Sev commented 1 year ago

closes #84.

To use, we have to :

words= ['photo', 'take'] rewriter = Rewriter([ UNKRewriteRule(words = words)]) tokenized_diagram= rewriter(diagram)

With words = ['photo', 'take'] we have :

ACE07-Sev commented 1 year ago

@le-big-mac dear, I made the handle module you asked in the other PR. Please kindly review to see if it's what you requested.

ACE07-Sev commented 1 year ago

Greetings Charlie,

Hope you are well. Thank you so much for the kind review and feedback! Ooh yes the syntax error I forgot to add the [] after I resolved it in my notebook. I also removed the redundant match check condition for rewrite. I will now focus on the handler.

Can you kindly elaborate a bit more about what was confusing in the current implementation? I took consideration to your comment regarding having a flag (status) that says what the unknown words will be. I thought it would be easier to have both in one method, and just depending on which you wish to apply, call the function, pass your dev or test dataset and set the train_status to false.

P.S : I'll add a .py to do tests too in a bit.

ACE07-Sev commented 1 year ago

@le-big-mac ?

ACE07-Sev commented 1 year ago

@dimkart dear, may I ask if you could kindly guide me on how I can run the checks/tests here?

ACE07-Sev commented 1 year ago

Also, I would like to ask if you could kindly explain what I should amend for the HandleUNKWords module.

dimkart commented 1 year ago

@ACE07-Sev Hi, only maintainers can run tests online (will do it in a sec). @le-big-mac will respond to other questions you have soon.

dimkart commented 1 year ago

@ACE07-Sev Of course you can run the tests locally though.

le-big-mac commented 1 year ago

Hi @ACE07-Sev, apologies for the delay. Before we can accept this PR we will need you to add tests to ensure all your code is functioning correctly.

In terms of the problems with your HandleUnknownWords, a lot of the arguments have a default type of None and only a given subset of the arguments are used depending on the value of train_status. This is confusing and not particularly good code style.

A better way to do it might be the following:

HandleUnknownWords should be a class.
Within this class should be methods for handling unknown words in both train and test data
For the train data, the arguments should include min_freq.
When the train data method is executed, the vocabulary of words that appear more frequently than min_freq could be stored as a member variable of HandleUnknownWords.
The test data method can then use this vocabulary to correctly handle unknown words in test and dev datasets.
If the test data method is called before the vocabulary is instantiated, an error with a helpful message should be thrown.

ACE07-Sev commented 1 year ago

@dimkart dear, how can I run the tests locally? I am testing it like a novice I think hehe. I just use a notebook to make sure it works like it's supposed to. Hence the diagrams I sent before.

How can I make sure it passes the tests you run online locally?

ACE07-Sev commented 1 year ago

@le-big-mac dear, thank you so much for the kind explanation! I added test for unkrewriterule (that's what I assume you meant as tests), and fixed the handler based on your description.

dimkart commented 1 year ago

how can I run the tests locally?

Just run pytest from within your local lambeq repo. You might have to exclude a couple of files, see the build_test action for which ones. Alternatively, just run pytest <specific_test_file.py>.

ACE07-Sev commented 1 year ago

What is the new error exactly? I don't understand the doc too long error.

dimkart commented 1 year ago

What is the new error exactly? I don't understand the doc too long error.

There are a number of linting errors that you need to fix, check the tests and look for linter's messages, they are quite self-explanatory. E.g.

lambeq/rewrite/base.py:492:1: W293 blank line contains whitespace

dimkart commented 1 year ago

@ACE07-Sev It would be great if you could make the tests work by tomorrow 13/6, which is the last day of the hackathon.

ACE07-Sev commented 1 year ago

Understood, will do.

ACE07-Sev commented 1 year ago

All errors fixed!

ACE07-Sev commented 1 year ago

May I later also add a bit more to the HandleUnknownWords class? Within 2 hours or so. Just adding a bit more into the features of the class, i.e., handling single diagram VS list of diagrams automatically.

le-big-mac commented 1 year ago

I think you have until the end of the 13th of June anywhere on earth to add to this PR, so please go ahead.

ACE07-Sev commented 1 year ago

Greetings,

May I ask if I could get a test run here? I can't run my tests in my repo anymore because of this :

GitHub Actions is currently disabled for your account. Please reach out to GitHub Support for assistance.

ACE07-Sev commented 1 year ago

I was able to only check for all except for the build_and_test when it was running test notebooks.

ACE07-Sev commented 1 year ago

Can I get a test to be run here please?

dimkart commented 1 year ago

Can I get a test to be run here please?

Github doesn't let us, since your last tests appear as "queued" for the last 17 hours; when we try to cancel we get the message "Failed to cancel workflow". Perhaps you can try to see if there is anything you can do from your side?

Thommy257 commented 1 year ago

Hi @ACE07-Sev, I've sent an email to the github support. I'll let you know when this is fixed.

ACE07-Sev commented 1 year ago

Greetings dear @dimkart and @Thommy257 ,

Hope you are well. Thank you so much for the heads up. I cancelled the queued workflow from my side. However, I can't run anything so I hope at least cancelling the other one helps.

Thank you so much dear Thommy, I'll be eagerly looking forward to the update!

dimkart commented 1 year ago

Greetings dear @dimkart and @Thommy257 ,

Hope you are well. Thank you so much for the heads up. I cancelled the queued workflow from my side. However, I can't run anything so I hope at least cancelling the other one helps.

Thank you so much dear Thommy, I'll be eagerly looking forward to the update!

@ACE07-Sev There is another one currently queued, can you try to cancel that one as well?

ACE07-Sev commented 1 year ago

Of course, one moment...

ACE07-Sev commented 1 year ago

Thommy dear, I don't have any actions in queue, nor in progress, or neutral, or pending.

ACE07-Sev commented 1 year ago

If it's causing too much hindrance, I can ask a friend to fork my repo and make a PR here on my behalf, or I can make a new account, and fork it and make a new PR.

CQCL / lambeq

UNK tokenizer class #96