Open dgarijo opened 3 years ago
Two possible solutions: 1) Provide a confidence value based on the length of the header (longer headers get lower confidence). 2) If more than two categories are detected and they are far apart in meaning, lower the confidence.
Alternatively, we should explore using language models to capture the meaning of the header more accurately.
Right now the header analysis returns a confidence of 1 whenever a keyword is detected in the title of a header. This works well in general, but there are exceptions. Header analysis should instead return an estimate of how well the header fits the category. For example, "browser issues (FAQ)" will be tagged with the "issue" category, and that may be wrong. Long headers may also be less informative.
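
A minimal sketch of the length-based idea, to make the discussion concrete. Everything here is an assumption for illustration: the `KEYWORDS` map and the `header_confidence` function are hypothetical and are not the project's actual code or keyword lists. Confidence is taken as the fraction of header tokens that match a category keyword, so a one-word header scores 1 and longer headers score lower.

```python
import re
from typing import Dict

# Hypothetical keyword-to-category map (assumption: the real keyword
# lists used by the header analysis are not shown in this issue).
KEYWORDS = {
    "issue": "issue",
    "issues": "issue",
    "faq": "faq",
    "install": "installation",
    "installation": "installation",
}


def header_confidence(header: str) -> Dict[str, float]:
    """Return a confidence in [0, 1] for each category matched in a header.

    Confidence is the share of the header's tokens that are keywords of
    that category, so longer headers yield lower values.
    """
    tokens = re.findall(r"[a-z0-9]+", header.lower())
    if not tokens:
        return {}

    counts: Dict[str, int] = {}
    for token in tokens:
        category = KEYWORDS.get(token)
        if category is not None:
            counts[category] = counts.get(category, 0) + 1

    return {cat: n / len(tokens) for cat, n in counts.items()}


if __name__ == "__main__":
    for h in ["Issues", "browser issues (FAQ)"]:
        print(h, header_confidence(h))
```

With this scoring, "Issues" gets full confidence for "issue", while "browser issues (FAQ)" gets one third each for "issue" and "faq". This does not cover the second proposal (penalising categories that are far apart in meaning), which would need some measure of semantic distance between categories.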