pkoloveas closed 5 years ago
You can adjust the threshold using the pageclassifier.yml file. You just need to set the parameter relevance_threshold as follows:
type: smile
parameters:
  features_file: pageclassifier.features
  model_file: pageclassifier.model
  relevance_threshold: 0.65
In this example, relevance_threshold indicates that only pages with a score higher than 0.65 are considered relevant.
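To make the effect of the threshold concrete, here is a minimal Python sketch of the filtering rule described above. It is only an illustration, not ACHE's actual implementation: the function name `is_relevant`, the sample page names, and their scores are all hypothetical, and the strict "greater than" comparison is an assumption based on the wording "higher than 0.65".

```python
# Hypothetical illustration of a relevance threshold; not ACHE code.
RELEVANCE_THRESHOLD = 0.65  # same value as relevance_threshold in the example config


def is_relevant(score: float, threshold: float = RELEVANCE_THRESHOLD) -> bool:
    """Return True when the classifier score is strictly above the threshold.

    Assumption: the comparison is strict, following the phrase "higher than".
    """
    return score > threshold


# Made-up classifier scores for three pages.
pages = {"page_a": 0.40, "page_b": 0.65, "page_c": 0.80}

# Keep only the pages whose score passes the threshold.
kept = [name for name, score in pages.items() if is_relevant(score)]
print(kept)  # ['page_c']
```

Under this assumption, a page scoring 0.4 would no longer be stored as relevant, and a page scoring exactly 0.65 would also be dropped.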
I updated the documentation page to include this information: https://ache.readthedocs.io/en/latest/page-classifiers.html#pageclassifier-smile. I'm closing this issue, but please let us know or re-open it if there is still any issue.
Thank you, that's exactly what I needed. Also, thanks for updating the documentation.
Is there a way to change the accepted threshold for the page classification on the focused crawl or does it entirely depend on the Page Classifier model?
I use the SMILE classifier, and among my extracted pages I see some with a score of 0.4 classified as relevant. I would like to store only the pages that have a score equal to or higher than 0.6-0.65. Is there a way to do this through the ache.yml file?
The following are the only parameters that I've set in the ache.yml file regarding the link storage.