cargomedia / cm

UNMAINTAINED - CM web application framework
MIT License

FB-897 Consistently-randomly select 10% of content per site for "verified manually" #2670

Closed by fauvel 7 years ago

fauvel commented 7 years ago

Something's not working right, at least in the CM box. The random scores seem to be in the range 0-2^24 (about 16.8M) instead of 0-1. Here are the raw results (with print_r) of the ElasticSearch query for the unit test CM_PagingSource_ElasticsearchTest::testSelectRandomSubset:

Array
(
    [took] => 2
    [timed_out] => 
    [_shards] => Array
        (
            [total] => 1
            [successful] => 1
            [failed] => 0
        )

    [hits] => Array
        (
            [total] => 3
            [max_score] => 15872786
            [hits] => Array
                (
                    [0] => Array
                        (
                            [_index] => test_index_1.1499762341
                            [_type] => index_1
                            [_id] => 1
                            [_score] => 15872786
                        )

                    [1] => Array
                        (
                            [_index] => test_index_1.1499762341
                            [_type] => index_1
                            [_id] => 2
                            [_score] => 14659161
                        )

                    [2] => Array
                        (
                            [_index] => test_index_1.1499762341
                            [_type] => index_1
                            [_id] => 3
                            [_score] => 5192267.5
                        )

                )

        )

)

That's why I'm using $this->_minScore = (float) (1 << 24) * (1 - $percentage / 100); in CM_Elasticsearch_Query::selectRandomSubset.

However, when I link this in sk, the random scores are in the range 0-1 as expected! In that case everything works fine with $this->_minScore = 1 - $percentage / 100;.

Any idea why it is like this in the CM box?
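
For reference, the two threshold formulas above differ only in the assumed upper bound of the random score. Here is a minimal sketch of that calculation, assuming the scores are uniformly distributed between 0 and that upper bound. The function name randomSubsetMinScore and its $scoreMax parameter are hypothetical and not part of CM:

```php
<?php

/**
 * Hypothetical helper (not part of CM): computes the minimum score that keeps
 * roughly $percentage percent of the documents, assuming the random scores
 * are uniformly distributed in the range 0 to $scoreMax.
 */
function randomSubsetMinScore(float $percentage, float $scoreMax = 1.0): float {
    if ($percentage < 0 || $percentage > 100) {
        throw new InvalidArgumentException('Percentage must be between 0 and 100');
    }
    // Keep the top $percentage percent of the score range.
    return $scoreMax * (1 - $percentage / 100);
}

// Scores in 0-1, as observed in sk:
$unitRange = randomSubsetMinScore(10);

// Scores in 0-2^24, as observed in the CM box:
$wideRange = randomSubsetMinScore(10, 1 << 24);

printf("%.1f %.1f\n", $unitRange, $wideRange);
```

With 10% this gives a threshold of 0.9 for the 0-1 range and 15099494.4 for the 0-2^24 range. Applied to the print_r output above, only the document with score 15872786 would pass the second threshold.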

fauvel commented 7 years ago

Interestingly, it also works as expected on Travis. Is it just me?!

alexispeter commented 7 years ago

Have you tried using https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-explain.html to reproduce the scores?

fauvel commented 7 years ago

Search results:

{
    "took": 33,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
    },
    "hits": {
        "total": 3,
        "max_score": 14802293,
        "hits": [
            {
                "_index": "test_index_1.1499767344",
                "_type": "index_1",
                "_id": "2",
                "_score": 14802293
            },
            {
                "_index": "test_index_1.1499767344",
                "_type": "index_1",
                "_id": "1",
                "_score": 13373635
            },
            {
                "_index": "test_index_1.1499767344",
                "_type": "index_1",
                "_id": "3",
                "_score": 10431651
            }
        ]
    }
}

Explain:

{
    "_index": "test_index_1.1499767249",
    "_type": "index_1",
    "_id": "1",
    "matched": true,
    "explanation": {
        "value": 0,
        "description": "function score, product of:",
        "details": [
            {
                "value": 0,
                "description": "Math.min of",
                "details": [
                    {
                        "value": 0,
                        "description": "random score function (seed: 4672437580970240501)",
                        "details": [
                            {
                                "value": 1,
                                "description": "ConstantScore(QueryWrapperFilter(ConstantScore(*:*))), product of:",
                                "details": [
                                    {
                                        "value": 1,
                                        "description": "boost"
                                    },
                                    {
                                        "value": 1,
                                        "description": "queryNorm"
                                    }
                                ]
                            }
                        ]
                    },
                    {
                        "value": 3.4028235e+38,
                        "description": "maxBoost"
                    }
                ]
            },
            {
                "value": 1,
                "description": "queryBoost"
            }
        ]
    }
}

alexispeter commented 7 years ago

I cannot reproduce it. Running locally on cm always gives scores between 0 and 1.

alexispeter commented 7 years ago

Try testing the query on production (also to check performance). If the scores are in range there, I'd say it's OK.

fauvel commented 7 years ago

Tagged 1.256.10