PHP-Science / PageRank

:elephant: PageRank implementation in PHP with extendable features (PHP 7.4)
https://php.science/pagerank/
MIT License

How to use this package to rank web pages? #1

Open chegmarco1989 opened 2 years ago

chegmarco1989 commented 2 years ago

Hi.

We want to know:

1 - How do we use it to rank web pages? What should we put into the $datasource variable to rank our web pages successfully? Should we just pass the list of URLs as the data source?

2 - Can this package handle a large or very large dataset, on the order of millions or even billions of records?

Thank you in advance for your help.

DavidBelicza commented 2 years ago

Hi @chegmarco1989

The data source is a nested array of integers, or a graph if you like. These integers represent the IDs of the entities that are being ranked.

Btw, PageRank is not only for ranking web pages. It can rank any kind of entity as long as the relationships between the entities are known. It is called "PageRank" because the name of its inventor is Larry Page.

The functional tests show the usage: https://github.com/PHP-Science/PageRank/blob/master/tests/functional/Service/PageRankAlgorithmTest.php#L86

The array contains a list of entities with their IDs. And it also contains the incoming and outgoing connections.
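To make the idea of a graph of integer IDs concrete, here is a minimal sketch of how a list of URL-to-URL links could be turned into such a nested array. The function name `buildGraph` and the key names `id`, `in`, and `out` are assumptions made for illustration only. They are not confirmed to match the structure this package expects, so check the linked functional test for the real shape.

```php
<?php
declare(strict_types=1);

/**
 * Build a nested-array graph from a list of [fromUrl, toUrl] links.
 * The array shape (keys 'id', 'in', 'out') is hypothetical and only
 * illustrates "entities with IDs plus incoming and outgoing connections".
 *
 * @param array<int, array{0: string, 1: string}> $links
 * @return array{ids: array<string, int>, graph: array<int, array{id: int, in: int[], out: int[]}>}
 */
function buildGraph(array $links): array
{
    $ids = [];
    $graph = [];

    // Give every distinct URL a sequential integer ID on first sight.
    $idOf = function (string $url) use (&$ids, &$graph): int {
        if (!isset($ids[$url])) {
            $id = count($ids) + 1;
            $ids[$url] = $id;
            $graph[$id] = ['id' => $id, 'in' => [], 'out' => []];
        }
        return $ids[$url];
    };

    foreach ($links as [$from, $to]) {
        $f = $idOf($from);
        $t = $idOf($to);
        $graph[$f]['out'][] = $t; // outgoing connection of the source page
        $graph[$t]['in'][] = $f;  // incoming connection of the target page
    }

    return ['ids' => $ids, 'graph' => $graph];
}

$result = buildGraph([
    ['https://example.com/a', 'https://example.com/b'],
    ['https://example.com/a', 'https://example.com/c'],
    ['https://example.com/b', 'https://example.com/c'],
    ['https://example.com/c', 'https://example.com/a'],
]);
```

So a plain list of URLs is not enough on its own: the links between the pages are what the ranking is computed from, and the URLs only need to be mapped to IDs.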

This method shows how to build up the object and where to put the data source: createPageRankAlgorithm. The method testRun shows the usage.

The more entities there are and the higher the iteration count, the longer the algorithm takes to run. I believe that, in the real world, a PageRank algorithm runs in parallel on smaller topics, sometimes for weeks. (Also the optimised search algorithms weren't as efficient as the PHP builtin search algorithms.)
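To show where the time goes, here is a minimal textbook PageRank loop in plain PHP. It is a sketch, not this package's implementation: the function name `pageRank`, its parameters, and the input shape (node ID mapped to the list of IDs it links to) are all assumptions for illustration. It also assumes every link target appears as a key in the input.

```php
<?php
declare(strict_types=1);

/**
 * Textbook power-iteration PageRank (illustrative sketch only).
 *
 * @param array<int, int[]> $out node ID => IDs it links to; every target must be a key
 * @return array<int, float> node ID => rank, summing to about 1.0
 */
function pageRank(
    array $out,
    float $damping = 0.85,
    int $maxIterations = 100,
    float $tolerance = 1.0e-9
): array {
    $n = count($out);
    if ($n === 0) {
        return [];
    }
    $nodeIds = array_keys($out);
    $ranks = array_fill_keys($nodeIds, 1.0 / $n);

    for ($i = 0; $i < $maxIterations; $i++) {
        // Rank held by nodes without outgoing links is spread evenly.
        $dangling = 0.0;
        foreach ($out as $id => $targets) {
            if ($targets === []) {
                $dangling += $ranks[$id];
            }
        }
        $base = (1.0 - $damping) / $n + $damping * $dangling / $n;
        $next = array_fill_keys($nodeIds, $base);

        // One pass over every edge: this is the cost of a single iteration.
        foreach ($out as $id => $targets) {
            if ($targets === []) {
                continue;
            }
            $share = $damping * $ranks[$id] / count($targets);
            foreach ($targets as $target) {
                $next[$target] += $share;
            }
        }

        // Stop early once the ranks have effectively stopped changing.
        $delta = 0.0;
        foreach ($next as $id => $rank) {
            $delta += abs($rank - $ranks[$id]);
        }
        $ranks = $next;
        if ($delta < $tolerance) {
            break;
        }
    }

    return $ranks;
}

$ranks = pageRank([1 => [2, 3], 2 => [3], 3 => [1]]);
arsort($ranks); // highest rank first
```

Each iteration touches every node and every edge once, so the running time grows roughly with iterations × (nodes + edges), and the whole graph has to fit in memory as PHP arrays. That is likely to be the practical limit long before millions or billions of entities in a single process.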

chegmarco1989 commented 2 years ago

Thank you very much @DavidBelicza for your answer.

But what do you mean by "Also the optimised search algorithms weren't as efficient as the PHP builtin search algorithms"?

Can you give us some examples of native PHP algorithms that can deliver results more efficiently than "PageRank"?

Thank you in advance for your reply.