Open Benja1972 opened 3 years ago
Hi Sergei,
Thanks a lot for your question! So the best option would be to run the following:
bunzip2 filename.links.bz2
python3 -m danker filename.links 0.85 40 0.1 -i | sed "s/\(.*\)/Q\1/" > output.rank
sort -k 2,2nr -T . -S 50% -o output.rank output.rank
You would need to make sure that the machine has enough main memory available this time.
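For anyone wondering what the `sed` and final `sort` steps do, here is a minimal sketch on toy data. It does not call danker. The file names `toy.rank` and `toy.q.rank` are made up for illustration, and the "numeric ID, tab, score" layout is an assumption about danker's output with `-i`.

```shell
# Toy stand-in for the ranking output: "<numeric id><TAB><score>" per line
# (assumed layout; file names are hypothetical).
printf '42\t0.5\n7\t3.25\n1000\t1.75\n' > toy.rank

# Prefix every line with "Q" so that numeric IDs become Q-IDs (42 -> Q42).
sed "s/\(.*\)/Q\1/" toy.rank > toy.q.rank

# Sort in place by the second field, numerically, highest score first.
sort -k 2,2nr -o toy.q.rank toy.q.rank

cat toy.q.rank
```

The result lists Q7 first, then Q1000, then Q42. In the real commands, `-T .` tells `sort` to put its temporary files in the current directory and `-S 50%` lets GNU `sort` use up to half of main memory as its buffer; both are left out here because the toy file is tiny.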
An alternative would be the following:
bunzip2 filename.links.bz2
sort -k 2,2n -T . -S 50% -o filename.links.right filename.links
python3 -m danker filename.links -r filename.links.right 0.85 40 0.1 -i | sed "s/\(.*\)/Q\1/" > output.rank
sort -k 2,2nr -T . -S 50% -o output.rank output.rank
This takes a bit longer but the memory footprint should be less than 8GB.
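The extra `sort -k 2,2n` step in this variant produces a second copy of the link file ordered by its right-hand column, which is then passed to danker via `-r`. A minimal sketch of just that sorting step on toy data follows. The file names `toy.links` and `toy.links.right` are hypothetical, and the "source ID, tab, target ID" layout is an assumption about the link file.

```shell
# Toy link file: "<source id><TAB><target id>" per line
# (assumed layout; file names are hypothetical).
printf '3\t20\n1\t5\n2\t100\n4\t5\n' > toy.links

# Write a copy ordered numerically by the second field (the right-hand column).
# With plain text ordering, "100" would sort before "20" and "5".
sort -k 2,2n -o toy.links.right toy.links

cat toy.links.right
```

After this, the two lines with target 5 come first, followed by target 20 and then target 100, while `toy.links` itself is left unchanged.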
Let me know which option worked for you!
Hi Andreas, thank you for your answer. I will try it out. It would be nice to have a predefined bash script that does the same for any output of the link collector, just in case.
Best Sergei
Hmm, let me think about the best way to separate this out from the workflow script... https://github.com/athalhammer/danker/blob/20cc2b7b1fe5d937ea5204d214a074baf3400c93/script/dank.sh#L106
Thank you @athalhammer! It works for me. I have tested the lines of code you provided earlier. Sergei
So I thought about it and came to the conclusion that wrapping these three lines in a dedicated script would be overkill.
Hello, thank you for the nice tool. I have one question: how do I run danker on a links file that has already been downloaded and processed by danker?
I ran "./danker.sh ALL --bigmem" and after a few hours it crashed with a memory issue, but the bzipped links file had been created. How can I reuse this file to calculate only the PageRank?
Thank you! Sergei