Living-with-machines / DeezyMatch

A Flexible Deep Learning Approach to Fuzzy String Matching
https://living-with-machines.github.io/DeezyMatch/
Other
134 stars 34 forks source link

Feature/84 inference inputs #112

Closed kasra-hosseini closed 2 years ago

kasra-hosseini commented 2 years ago

This PR closes https://github.com/Living-with-machines/DeezyMatch/issues/84

@mcollardanuy Could you please take a look at this PR?

review-notebook-app[bot] commented 2 years ago

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

kasra-hosseini commented 2 years ago

@mcollardanuy It seems to me that README does not need to be updated after the changes in this PR. Is that correct? (previously, I thought we have a comment in the README re the "three column" issue, but maybe it was in our LinkedPast presentation?)

mcollardanuy commented 2 years ago

@mcollardanuy It seems to me that README does not need to be updated after the changes in this PR. Is that correct? (previously, I thought we have a comment in the README re the "three column" issue, but maybe it was in our LinkedPast presentation?)

I think it maybe does not need to be updated, but it probably will make Deezy's functionality clearer. I think it would be clearer to have one training dataset (three-column), one test dataset (inference/test, three-column), one candidates dataset (one-column), and one queries dataset, instead of using the same dataset for all steps. What do you think? I can take care of this bit in the documentation (in a different PR maybe)

kasra-hosseini commented 2 years ago

@mcollardanuy It seems to me that README does not need to be updated after the changes in this PR. Is that correct? (previously, I thought we have a comment in the README re the "three column" issue, but maybe it was in our LinkedPast presentation?)

I think it maybe does not need to be updated, but it probably will make Deezy's functionality clearer. I think it would be clearer to have one training dataset (three-column), one test dataset (inference/test, three-column), one candidates dataset (one-column), and one queries dataset, instead of using the same dataset for all steps. What do you think? I can take care of this bit in the documentation (in a different PR maybe)

@mcollardanuy That would be great (and much cleaner than what we have now). Yesterday, when I was looking at the notebooks again, I thought the same that we have to separate training/inference/query/candidate files in our README and tests. It would be really good if you could take care of this in another PR.

mcollardanuy commented 2 years ago

@mcollardanuy It seems to me that README does not need to be updated after the changes in this PR. Is that correct? (previously, I thought we have a comment in the README re the "three column" issue, but maybe it was in our LinkedPast presentation?)

I think it maybe does not need to be updated, but it probably will make Deezy's functionality clearer. I think it would be clearer to have one training dataset (three-column), one test dataset (inference/test, three-column), one candidates dataset (one-column), and one queries dataset, instead of using the same dataset for all steps. What do you think? I can take care of this bit in the documentation (in a different PR maybe)

@mcollardanuy That would be great (and much cleaner than what we have now). Yesterday, when I was looking at the notebooks again, I thought the same that we have to separate training/inference/query/candidate files in our README and tests. It would be really good if you could take care of this in another PR.

Great! I've created a new issue for this, I'll take care once we've merged this: https://github.com/Living-with-machines/DeezyMatch/issues/113

kasra-hosseini commented 2 years ago

@mcollardanuy You may have seen my last commit. Is there anything else that we should address before merge? And I believe all the points raised in this issue https://github.com/Living-with-machines/DeezyMatch/issues/109 are also addressed. Is that correct?

mcollardanuy commented 2 years ago

Hi @kasra-hosseini, thank you! Everything looks good to me, and I think all points in #109 have been addressed.

For your information, I am now working on improving the documentation/tutorials as part of https://github.com/Living-with-machines/DeezyMatch/issues/113. Once we have different datasets for the different steps, I will also add the corresponding tests.

kasra-hosseini commented 2 years ago

Hi @mcollardanuy, great, thanks again for the changes/reviews. I merge this PR to develop now and will release a new version (v1.3.0) today.