Closed Joeran closed 10 years ago
select A.weighting_scheme, avg(S.computation_time), avg(U.execution_time) from recommendations_documents_set S
JOIN user_models U ON (S.user_model_id = U.id)
JOIN algorithms A ON (A.id = U.algorithm_id)
WHERE S.created > '2014-01-01' AND A.approach<>2
GROUP BY A.weighting_scheme
very strange: Lucene query time is much worse for TF-only user models when calculated over all data since 2013-08-01:
select A.weighting_scheme, round(avg(S.computation_time)) AS lucene_query_time, round(avg(U.execution_time)) AS model_creation_time from recommendations_documents_set S
JOIN user_models U ON (S.user_model_id = U.id)
JOIN algorithms A ON (A.id = U.algorithm_id)
WHERE S.created > '2013-08-01' AND A.approach<>2
GROUP BY A.weighting_scheme
for term-only recommendations, user model creation time differs more:
update: just a coincidence:
without feature boosting, for models of the same size, the time is about equal:
select A.weighting_scheme, round(avg(S.computation_time)) AS lucene_query_time, round(avg(U.execution_time)) AS model_creation_time from recommendations_documents_set S
JOIN user_models U ON (S.user_model_id = U.id)
JOIN algorithms A ON (A.id = U.algorithm_id)
WHERE S.created > '2013-08-01' AND A.approach<>2
AND A.data_element_type=1
AND A.data_element_type_weighting='1'
AND A.feature_weight_submission=0
AND U.feature_count_reduced_unique between 590 AND 600
AND (A.default_algorithm <> 1 OR A.default_algorithm IS NULL)
GROUP BY A.weighting_scheme
with boosting there is a huge difference:
:) :) the difference in Lucene query time between TF-generated and TF-IDF-generated terms is simply that terms favored by TF are generally found in many more documents than terms favored by TF-IDF
--> with TF, many more results need to be scored and merged by Lucene for every term
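A minimal sketch of the reasoning above, assuming a toy in-memory inverted index rather than Lucene itself. The class name `PostingsCost`, the helper names, and the sample documents are all hypothetical. It only illustrates that the work for an OR query grows with the sum of the document frequencies of its terms, so a query built from common (TF-favored) terms touches more postings than one built from rare (TF-IDF-favored) terms.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class PostingsCost {
    // Toy inverted index (not Lucene): term -> ids of the documents containing it.
    static Map<String, Set<Integer>> buildIndex(List<List<String>> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int docId = 0; docId < docs.size(); docId++) {
            for (String term : docs.get(docId)) {
                index.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
        return index;
    }

    // Postings an OR query has to visit and score:
    // the sum of the document frequencies of its terms.
    static int postingsVisited(Map<String, Set<Integer>> index, List<String> queryTerms) {
        int total = 0;
        for (String term : queryTerms) {
            total += index.getOrDefault(term, Collections.emptySet()).size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical corpus: "paper" and "research" are common, "pdf" and "lucene" are rare.
        List<List<String>> docs = List.of(
            List.of("paper", "research", "lucene"),
            List.of("paper", "research", "mindmap"),
            List.of("paper", "research", "citation"),
            List.of("paper", "pdf"));
        Map<String, Set<Integer>> index = buildIndex(docs);

        // A TF-style model tends to pick common terms, a TF-IDF-style model rare ones.
        System.out.println("common terms: " + postingsVisited(index, List.of("paper", "research")));
        System.out.println("rare terms:   " + postingsVisited(index, List.of("pdf", "lucene")));
    }
}
```

With the same number of query terms, the common-term query visits more postings, which matches the direction of the measured difference; the absolute numbers here say nothing about real Lucene timings.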
regarding 2nd question:
created recommendations for MM with TF (weighting_scheme 1) vs TF-IDF (weighting_scheme 2)
select A.weighting_scheme, count(*) AS count
from recommendations_documents_set S
JOIN user_models U ON (S.user_model_id = U.id)
JOIN algorithms A ON (U.algorithm_id = A.id)
WHERE S.created BETWEEN '2013-04-01' AND '2013-05-01'
AND A.data_element=1
AND A.default_algorithm IS NULL
GROUP BY A.weighting_scheme
(results for the periods Apr 2013, Aug 2013, Oct 2013, Feb 2014, Dec 2013)
I think we misunderstood each other. Try this:
select S.id, A.weighting_scheme, A.weight_idf, S.created
from recommendations_documents_set S
JOIN user_models U ON (S.user_model_id = U.id)
JOIN algorithms A ON (U.algorithm_id = A.id)
WHERE S.created BETWEEN '2013-02-01' AND '2013-12-01'
AND A.data_element_type=2
AND A.weighting_scheme=2
AND A.weight_idf=1
ORDER BY S.created
Since May 15 there have been no recommendations with these settings.
By the way, if you change "AND A.weight_idf=1" to "AND A.weight_idf=2", considerably more recommendations are shown. I am not sure whether it is correct that, even before May 15, considerably more recommendations were shown with weight_idf=2 than with weight_idf=1.
Re: "that, even before May 15, considerably more recommendations were shown with weight_idf=2 than with weight_idf=1." That fits: 1/3: TF, 1/3: TF-IDF on mind maps, 1/3: TF-IDF on texts
I am still checking the rest.
Regarding the query that has returned no results since May 15:
I explicitly set it up so that, for weighting scheme 2 and data_element_type != 1, weight_idf is also set to 2:
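A minimal sketch of the one-third split mentioned above, assuming the variant is chosen uniformly at random per recommendation. The class `WeightingSplit`, the enum labels, and the seed are hypothetical and are not taken from the actual code base.

```java
import java.util.Random;

public class WeightingSplit {
    // Hypothetical labels for the three variants named in the thread.
    enum Variant { TF, TF_IDF_MINDMAPS, TF_IDF_TEXTS }

    // Picks one of the three variants with equal probability.
    static Variant pick(Random r) {
        Variant[] all = Variant.values();
        return all[r.nextInt(all.length)];
    }

    public static void main(String[] args) {
        Random r = new Random(42); // fixed seed only to make the run repeatable
        int[] counts = new int[Variant.values().length];
        for (int i = 0; i < 30000; i++) {
            counts[pick(r).ordinal()]++;
        }
        // Each variant should end up with roughly a third of the draws.
        for (Variant v : Variant.values()) {
            System.out.println(v + ": " + counts[v.ordinal()]);
        }
    }
}
```

Under this assumption, two of the three variants use TF-IDF, so counts grouped by a single TF-IDF setting can differ noticeably from the TF count even when nothing is broken.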
//citations are used
if (alg.getDataElementType() != 1) {
    s += "," + (r.nextInt(1000)+1);
    // IDF for citations needs to be based on fulltexts
    if (alg.getWeightingScheme() == 2) {
        alg.setWeightIDF(2);
    }
}
alg.setDataElementTypeWeighting(s);
If that is not wanted, I can remove it. Should I?
Yes, please remove it again, at least provided it still works. Your comment "// IDF for citations needs to be based on fulltexts" sounds to me as if it necessarily has to be that way.
As discussed on the phone:
This chart shows CTRs and runtimes for weighting terms and citations with TF, TF-IDF (based on the PDF corpus), and TF-IDF (based on the user's mind-maps). All data is from August 2013 onward.