Closed kasra-hosseini closed 4 years ago
@mcollardanuy @fedenanni For testing:
from DeezyMatch import candidate_ranker

# Find candidates for a list of query strings (vectors are generated on-the-fly)
candidates_pd = \
    candidate_ranker(scenario="./combined/test/",
                     query=["mariona", "fede", "kasra"],
                     ranking_metric="conf",
                     selection_threshold=0.8,
                     num_candidates=10,
                     search_size=1000,
                     output_filename="test_candidates_deezymatch",
                     pretrained_model_path="./models/finetuned_test001/finetuned_test001.model",
                     pretrained_vocab_path="./models/finetuned_test001/finetuned_test001.vocab",
                     number_test_rows=20)
@fedenanni Thanks for the review. I tried to answer all your comments. Could you please take a look? If you are happy, please mark them as resolved.
Hi @kasra-hosseini, I'm done with the review. Great additions! The on-the-fly alias detection will be super useful. Let me know if you want to discuss anything, especially regarding directory structures, or there's anything I can help with. Thanks again!
Hi @kasra-hosseini, all looks good! 👍
@mcollardanuy We now log the function args in log.txt: https://github.com/Living-with-machines/DeezyMatch/pull/68/commits/414b9f1f13da078fdbcf5a376f16063192cddca1. Could you please take a look?
That's perfect, thanks!
In this PR, the main contribution is to perform alias detection on-the-fly:
@mcollardanuy @fedenanni Here is the very first version of on-the-fly alias detection. The design can be improved, but currently I call test_tokenize and test_model at the start to generate temporary vectors for a string or a list of query strings (see query_vector_gen in utils_candidate_ranker). We then combine the vectors and use them in candidateRanker (as before). After combining the vectors, I remove the temporary directory. The idea behind this design choice was compatibility with what we already had. There are still some issues/improvements that we should make:
We need to change this so that many temporary query files can be combined, similar to the combineVecs function.
query_vector_gen
.
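For discussion, here is a minimal sketch of the temporary-directory pattern described above, extended so that several temporary query files are combined before the directory is removed. It does not use DeezyMatch's code: fake_embed is a hypothetical stand-in for the test_tokenize + test_model step, and write_query_chunks, combine_chunks and vectors_on_the_fly are illustrative names only, not functions from utils_candidate_ranker.

```python
import json
import shutil
import tempfile
from pathlib import Path


def fake_embed(query):
    """Hypothetical stand-in for tokenization + model inference."""
    # Toy 2-dimensional "vector": string length and vowel count.
    vowels = sum(ch in "aeiou" for ch in query.lower())
    return [float(len(query)), float(vowels)]


def write_query_chunks(queries, tmp_dir, chunk_size=2):
    """Write query vectors to several temporary files, chunk_size queries each."""
    paths = []
    for start in range(0, len(queries), chunk_size):
        chunk = queries[start:start + chunk_size]
        path = Path(tmp_dir) / f"query_vecs_{start // chunk_size}.json"
        records = [{"query": q, "vector": fake_embed(q)} for q in chunk]
        path.write_text(json.dumps(records))
        paths.append(path)
    return paths


def combine_chunks(paths):
    """Concatenate all temporary files, keeping the order in which they were written."""
    records = []
    for path in paths:
        records.extend(json.loads(Path(path).read_text()))
    return records


def vectors_on_the_fly(queries, chunk_size=2):
    """Accept a string or a list of strings; always clean up the temporary directory."""
    if isinstance(queries, str):
        queries = [queries]
    tmp_dir = tempfile.mkdtemp(prefix="query_vecs_")
    try:
        paths = write_query_chunks(queries, tmp_dir, chunk_size)
        return combine_chunks(paths)
    finally:
        # Runs even if vector generation or combining raises an exception.
        shutil.rmtree(tmp_dir, ignore_errors=True)


if __name__ == "__main__":
    for record in vectors_on_the_fly(["mariona", "fede", "kasra"]):
        print(record["query"], record["vector"])
```

Putting the cleanup in a finally block means the temporary directory is removed even when vector generation fails partway through, which may be worth considering for the real implementation.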