Closed mcollardanuy closed 2 years ago
Check out this pull request on
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
Hi @kasra-hosseini, could you have a look at this PR? These are the main changes:
I will start looking into the issue about the second and third columns for vector generation separately.
Let me know if something is not clear. Thank you!
@mcollardanuy Could you please take a look at the tests here: https://github.com/Living-with-machines/DeezyMatch/blob/feature/109-improve-tokenizer/DeezyMatch/tests/test_utils.py? Are these the expected outputs?
@mcollardanuy I also added another test for https://github.com/Living-with-machines/DeezyMatch/issues/109, see: https://github.com/Living-with-machines/DeezyMatch/blob/feature/109-improve-tokenizer/DeezyMatch/tests/test_pipeline_ngram.py.
The only difference to test_pipeline.py
is that we set tokenize: ["ngram"]
instead of tokenize: ["char", "word"]
If you are happy with https://github.com/Living-with-machines/DeezyMatch/commit/70d4654e004ebcf8cd0d92df32bec8c636061ea5, please go ahead and merge with the develop branch.
Hi @kasra-hosseini, thanks! There's just one case in which I think we're not getting the expected output. And I would add a test using two work token separators. Other than that, everything looks good to me!
@mcollardanuy Great, thanks again. I tried to address your comments. Could you please take a look at them? Is there anything else that you think we should address in this PR?
Hi @kasra-hosseini, thanks, all looks good!
Issues addressed: