Open romichg opened 8 years ago
It seems that somewhere we are not counting all the articles. The new data I loaded shows 2128 articles, but there are actually 2155, which is the number it should show.
I traced it, and it looks like a problem with the count. If you go to http://localhost:12346/data/query_tree/query-6234f2d47eb7, the response begins with `{"id":"query-6234f2d47eb7","title":"Index","size":2128`, which means 2128 articles.
However, if you actually count the unique articles it returns, there are 2155. This is how I count them:
```sh
$ wget 'http://localhost:12346/data/query_tree/query-6234f2d47eb7'
$ cat query-6234f2d47eb7 | sed 's/}/\n/g' | sed 's/,/\n/g' | grep article- | awk -F- '{print $2}' | awk -F\" '{print $1}' | sort | uniq | wc -l
2155
```
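The same check can be sketched more compactly, with the reported `size` compared against the unique count in one step. This is only a sketch under a few assumptions: the response has already been saved by the `wget` command above, every article appears as a quoted string `"article-<id>"`, and the top-level `"size"` is the first one in the file with no space after the colon (as in the response above). The script name `compare_counts.sh` and both function names are hypothetical. Unlike the `awk -F-` step, it keeps the whole ID, so an ID that itself contains a hyphen is not truncated. It relies on `grep -o`, which GNU and BSD grep both support.

```sh
#!/bin/sh
# Hypothetical helper: compare the "size" reported by a saved query_tree
# response with the number of distinct article IDs it actually contains.
# Assumption: articles appear as quoted JSON strings of the form "article-<id>".

# Print the number of distinct "article-<id>" strings in the file given as $1.
count_unique_articles() {
  grep -o '"article-[^"]*"' "$1" | sort -u | wc -l | tr -d ' '
}

# Print the first "size" value in the file given as $1. Assumes the top-level
# node is serialized before its children and has no space after the colon.
reported_size() {
  grep -o '"size":[0-9]*' "$1" | head -n 1 | cut -d: -f2
}

# Usage (hypothetical script name): sh compare_counts.sh query-6234f2d47eb7
if [ -n "${1:-}" ]; then
  size=$(reported_size "$1")
  unique=$(count_unique_articles "$1")
  echo "reported size: $size, unique articles: $unique"
  if [ "$size" != "$unique" ]; then
    echo "MISMATCH"
  fi
fi
```

Run against the saved `query-6234f2d47eb7` file, this should report the same two numbers as above if the assumptions hold.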