revolunet closed 11 months ago
🎉 Deployment for commit 4a09f90da32149700a0c803674eb12e4fd8d447f :
Cool, thanks!
Do you think it would be possible to get an order of magnitude for the score? Ideally it would be between 0 and 1 (to tell whether a score is good "in absolute terms"), or else have the maxScore next to it?
Changing the score is actually not trivial: it is computed from the query and is not normalized to [0,1] :/
The maxScore is the score of the first item in the list, if I'm not mistaken.
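To illustrate the maxScore idea, here is a minimal sketch that divides each hit's score by the highest score to get a relative value in [0, 1]. The `hits` list, its `name`/`score` keys and the score values are hypothetical and not taken from the API. The ratio only ranks hits against the best match for the same query; it does not say whether that best match is good in absolute terms.

```python
def relative_scores(hits):
    """Return (name, score / max_score) pairs for a list of hits."""
    if not hits:
        return []
    # The top score is the first item when hits are sorted by score;
    # max() is used here so the input order does not matter.
    max_score = max(hit["score"] for hit in hits)
    if max_score <= 0:
        return [(hit["name"], 0.0) for hit in hits]
    return [(hit["name"], hit["score"] / max_score) for hit in hits]


# Hypothetical hits, for illustration only.
hits = [
    {"name": "first hit", "score": 20.0},
    {"name": "second hit", "score": 5.0},
]
for name, ratio in relative_scores(hits):
    print(f"{name}: {ratio:.2f}")
```

Because the top hit always gets 1.0, two different queries cannot be compared with this ratio.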
An example "explain" output for the score computation:
{
"value": 21.91652,
"description": "sum of:",
"details": [
{
"value": 17.90895,
"description": "sum of:",
"details": [
{
"value": 9.977191,
"description": "weight(namingMain:michelin in 9444530) [PerFieldSimilarity], result of:",
"details": [
{
"value": 9.977191,
"description": "score(freq=1.0), computed as boost * idf * tf from:",
"details": [
{
"value": 2.2,
"description": "boost",
"details": []
},
{
"value": 9.977191,
"description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details": [
{
"value": 1515,
"description": "n, number of documents containing term",
"details": []
},
{
"value": 32628324,
"description": "N, total number of documents with field",
"details": []
}
]
},
{
"value": 0.45454544,
"description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details": [
{
"value": 1,
"description": "freq, occurrences of term within document",
"details": []
},
{
"value": 1.2,
"description": "k1, term saturation parameter",
"details": []
},
{
"value": 0,
"description": "b, length normalization parameter",
"details": []
},
{
"value": 4,
"description": "dl, length of field",
"details": []
},
{
"value": 1.7728646,
"description": "avgdl, average length of field",
"details": []
}
]
}
]
}
]
},
{
"value": 7.931761,
"description": "weight(naming:michelin in 9444530) [PerFieldSimilarity], result of:",
"details": [
{
"value": 7.931761,
"description": "score(freq=1.0), computed as boost * idf * tf from:",
"details": [
{
"value": 2.2,
"description": "boost",
"details": []
},
{
"value": 7.9317613,
"description": "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
"details": [
{
"value": 11722,
"description": "n, number of documents containing term",
"details": []
},
{
"value": 32639261,
"description": "N, total number of documents with field",
"details": []
}
]
},
{
"value": 0.45454544,
"description": "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
"details": [
{
"value": 1,
"description": "freq, occurrences of term within document",
"details": []
},
{
"value": 1.2,
"description": "k1, term saturation parameter",
"details": []
},
{
"value": 0,
"description": "b, length normalization parameter",
"details": []
},
{
"value": 4,
"description": "dl, length of field",
"details": []
},
{
"value": 2.6909916,
"description": "avgdl, average length of field",
"details": []
}
]
}
]
}
]
}
]
},
{
"value": 3.945307,
"description": "Saturation function on the _feature field for the etablissements feature, computed as w * S / (S + k) from:",
"details": [
{
"value": 4,
"description": "w, weight of this function",
"details": []
},
{
"value": 1.84375,
"description": "k, pivot feature value that would give a score contribution equal to w/2",
"details": []
},
{
"value": 133,
"description": "S, feature value",
"details": []
}
]
},
{
"value": 0.062262263,
"description": "Saturation function on the _feature field for the siretRank feature, computed as w * S / (S + k) from:",
"details": [
{
"value": 0.1,
"description": "w, weight of this function",
"details": []
},
{
"value": 51814485000000,
"description": "k, pivot feature value that would give a score contribution equal to w/2",
"details": []
},
{
"value": 85487029000000,
"description": "S, feature value",
"details": []
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "etatAdministratifUniteLegale:A",
"details": []
}
]
},
{
"value": 0,
"description": "match on required clause, product of:",
"details": [
{
"value": 0,
"description": "# clause",
"details": []
},
{
"value": 1,
"description": "etatAdministratifEtablissement:A",
"details": []
}
]
}
]
}
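To make the explain output easier to read, here is a short sketch that recomputes the total from the formulas and values quoted in it. The function and variable names are mine; the numbers are copied from the JSON above. Small differences in the last digits are expected, since this uses Python doubles.

```python
import math


def bm25_idf(n, N):
    # idf, computed as log(1 + (N - n + 0.5) / (n + 0.5))
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq, k1, b, dl, avgdl):
    # tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


def saturation(w, S, k):
    # Saturation function, computed as w * S / (S + k)
    return w * S / (S + k)


# Text matches: boost * idf * tf for each field.
naming_main = 2.2 * bm25_idf(1515, 32628324) * bm25_tf(1, 1.2, 0, 4, 1.7728646)
naming = 2.2 * bm25_idf(11722, 32639261) * bm25_tf(1, 1.2, 0, 4, 2.6909916)

# Feature contributions.
etablissements = saturation(4, 133, 1.84375)
siret_rank = saturation(0.1, 85487029000000, 51814485000000)

# The two required clauses (etatAdministratif...) contribute 0 to the score.
total = naming_main + naming + etablissements + siret_rank
print(naming_main, naming, etablissements, siret_rank, total)
```

With b = 0 and freq = 1, tf equals 1 / 2.2, which cancels the 2.2 boost, so each text match contributes exactly its idf. The score therefore grows with the rarity of the query terms and has no fixed upper bound, which is why it cannot be mapped onto [0,1] directly.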
Wouldn't there be a way to add, on the fly, a comparison score between each result found and the search string? Something like a Levenshtein or n-gram comparison. I think I did that in my wild youth, but it is a distant memory. If you want, I can dig further :)
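A minimal sketch of what such a comparison score could look like, assuming it is computed client-side on the result names after the search. The function names and the sample names are hypothetical; both similarities are normalized to [0, 1] and are independent of the search engine's score.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or keep)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """1.0 for identical strings, 0.0 when every character differs."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def ngrams(text, n=3):
    """Set of character n-grams of text (empty if text is shorter than n)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of a and b."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 1.0 if a == b else 0.0
    return len(grams_a & grams_b) / len(union)


# Hypothetical result names compared with the search string.
query = "michelin"
for name in ["MICHELIN", "MICHEL", "PNEUMATIQUES MICHELIN"]:
    candidate = name.lower()
    print(name, levenshtein_similarity(query, candidate), ngram_similarity(query, candidate))
```

Levenshtein penalizes long names that merely contain the query, whereas the n-gram overlap is more tolerant of extra words, so the choice depends on which behaviour is wanted.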
fix #196