Closed WingCode closed 1 year ago
Hi @WingCode, thanks for your submission, this PR looks really good, and certainly solves the problem of replacing unknown words in a single diagram. While not explicitly mentioned in the task (apologies!), it would be really useful to have a HandleUnknownWords
module, that could handle some of the pre-processing necessary to use this rewrite rule on a dataset. In broad strokes, the functionality of the class might looks something like this:
train
flag (indicating that this data is to be used for training), this module would take a min_freq
minimum frequency parameter, and replace all words that appear less frequently than this in the data with the UNK
token.test
or dev
flag (indicating that this data is to be used for evaluation), along with a list of words that appeared in the training dataset (which could potentially be returned by the module when train
is passed), replace any words in these diagrams that do not appear in the training data with the UNK
token.I appreciate that this is asking you to do extra work, but if you think you could add that it would be really great!
@le-big-mac Thank you for going through the PR and describing the functionalities of HandleUnknownWords
.
Would you like it to be added into this PR or a new separate PR?
In my opinion, separate PR would be better since the UnknownWordsRewriteRule
implemented in this PR can work independently of HandleUnknownWords
and it would be easier to review changes pertaining to HandleUnknownWords
in a new PR.
Hi @WingCode, I have taken another look and left a couple of small comments that need to be resolved in this PR. I agree with you that HandleUnknownWords
should probably be done in a separate PR, and we would really appreciate your help with that!
Considering the initial framing of the question, I feel that once the comments in this PR are addressed we can assign you the issue, and then once the HandleUnknownWords
PR is opened that issue can be closed.
@WingCode We have fixed an error in main which makes a test to fail, please merge and see if the tests are now passing for your PR.
@WingCode Please address any remaining comments by tomorrow 13/6, last day of the hackathon.
@dimkart @WingCode unitaryHACK will accept minor reviews of open PRs even if they are made shortly after today's deadline.
Superseded by #105.
closes #84