Closed raffaem closed 3 years ago
Yes, WEAT returns the following:
{'query_name': [MY QUERY NAME], 'result': nan, 'weat': nan, 'effect_size': nan}
Is it possible to know why it is not returning a result?
Hello
From what you describe (the query returns values for some models and not for others), I suspect that when the query word sets are converted to embedding sets, at least one word set loses more than 20% of its words. In that case, WEFE by default invalidates the query and returns NaN values. This can happen because the model's vocabulary has no capitalized words, has no accented words, or simply does not contain those words.
The behavior of queries invalidated by missing many words is detailed in the warning of this subsection: https://wefe.readthedocs.io/en/latest/user_guide.html#word-preprocessors
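To make the threshold concrete, here is a minimal sketch of how the lost fraction of each word set could be estimated before running a query. It is not WEFE code: `lost_word_report`, the toy vocabulary, and the word sets are hypothetical, and the 0.2 threshold is the default described above. With a gensim `KeyedVectors` object you would presumably pass something like `wv.key_to_index` as the vocabulary, but that is an assumption about your setup.

```python
# Hypothetical helper, not part of WEFE: reports which words of each set
# are missing from a vocabulary and whether the set exceeds the threshold.
def lost_word_report(word_sets, vocabulary, threshold=0.2):
    """Return {set_name: (missing_words, lost_fraction, exceeds_threshold)}."""
    report = {}
    for name, words in word_sets.items():
        missing = [w for w in words if w not in vocabulary]
        fraction = len(missing) / len(words) if words else 0.0
        report[name] = (missing, fraction, fraction > threshold)
    return report


# Toy data (assumed for illustration): a lowercase, accent-free vocabulary.
vocabulary = {"she", "he", "woman", "man", "science", "art"}
word_sets = {
    "female_terms": ["she", "woman", "Girl", "mother", "niña"],
    "male_terms": ["he", "man"],
}

report = lost_word_report(word_sets, vocabulary)
for name, (missing, fraction, invalid) in report.items():
    status = "would be invalidated" if invalid else "ok"
    print(f"{name}: missing={missing} lost={fraction:.0%} -> {status}")
```

In this toy case `female_terms` loses three of its five words, so a query containing it would be expected to come back as NaN under the default threshold.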
You can pass warn_not_found_words=True to see which words are lost when converting the query to embeddings:
wefemodel = WordEmbeddingModel(wv, model_name)
query = Query(target_sets, attribute_sets, target_sets_names, attribute_sets_names)
result_weat = weat.run_query(
    query, wefemodel, calculate_p_value=True, warn_not_found_words=True,
)
A possible solution is to use a word preprocessor, specified through the run_query parameter preprocessor_args or secondary_preprocessor_args:
wefemodel = WordEmbeddingModel(wv, model_name)
query = Query(target_sets, attribute_sets, target_sets_names, attribute_sets_names)
result_weat = weat.run_query(
    query,
    wefemodel,
    calculate_p_value=True,
    secondary_preprocessor_args={"lowercase": True, "strip_accents": True},
    warn_not_found_words=True,
)
In practical terms, this parameter tells run_query to first look up each word of each set in its original form in the model vocabulary and, if it is not found, to preprocess the word (lowercase it and strip its accents) and search again.
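The two-step lookup described above can be sketched roughly as follows. This is only an illustration of the idea, not WEFE's actual implementation: `lookup`, `preprocess`, and `_remove_accents` are hypothetical names, and the vocabulary is a toy set.

```python
import unicodedata


def _remove_accents(word):
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(word, lowercase=False, strip_accents=False):
    # Mirrors the two options used in secondary_preprocessor_args above.
    if lowercase:
        word = word.lower()
    if strip_accents:
        word = _remove_accents(word)
    return word


def lookup(word, vocabulary, secondary_args):
    """Try the original word first; fall back to its preprocessed form."""
    if word in vocabulary:
        return word
    candidate = preprocess(word, **secondary_args)
    return candidate if candidate in vocabulary else None


# Toy vocabulary (assumed for illustration).
vocabulary = {"jose", "Maria", "berlin"}
secondary_args = {"lowercase": True, "strip_accents": True}

found = {w: lookup(w, vocabulary, secondary_args)
         for w in ["José", "Maria", "BERLIN", "Zürich"]}
print(found)
```

Words that already exist in their original form are used untouched, so the fallback only affects words that would otherwise be lost.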
Pablo.
Hello,
Thank you for your support and your prompt and detailed answer.
I'm making sure that all the words of the word sets are present in the embedding before running the query. So I don't think that's the problem.
Anyway I think WEFE should throw an exception by default instead of returning nothing.
I will try again next week.
Thank you again
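Until the library raises on its own, one workaround for the "exception instead of an empty result" request is a small wrapper around the returned dictionary. This is a sketch under assumptions: `require_valid_result` and `InvalidQueryResult` are hypothetical names, and the dictionary shape is taken from the output quoted earlier in this thread.

```python
import math


class InvalidQueryResult(ValueError):
    """Raised when a query result contains NaN or missing values."""


def require_valid_result(result, keys=("result",)):
    # Fail loudly if any of the checked keys is absent, None, or NaN.
    for key in keys:
        value = result.get(key)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            raise InvalidQueryResult(
                f"query {result.get('query_name')!r} has no valid {key!r}; "
                "too many words may have been lost"
            )
    return result


# Example dictionaries shaped like the output shown in this thread.
good = {"query_name": "q1", "result": 0.42, "weat": 0.42, "effect_size": 1.1}
bad = {"query_name": "q2", "result": float("nan"),
       "weat": float("nan"), "effect_size": float("nan")}

require_valid_result(good)
try:
    require_valid_result(bad)
    raised = False
except InvalidQueryResult as exc:
    raised = True
    print("invalid:", exc)
```

Passing `keys=("result", "p_value")` would also catch results where the p_value key is absent.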
I use this code to compute a WEAT:
But sometimes the returned result_weat does not include a p_value key. I think it depends on the model.
For some of my models the returned dictionary does not include this key.
Is it possible it depends on the model?