Open Sandr0x00 opened 7 years ago
One of the most commonly taught algorithms is HITS, which gives each node an "authority" value. It's not strictly a "trustworthiness" value, but it might be an indicator for it.
Maybe "an indicator for it" is the best we can get. How big would our focused subgraph then be for the HITS algorithm? E.g.: Pretend we want to score the following relationship only: (Beethoven - inspired by - Mozart) -> the sources stored at 'Beethoven' and 'Mozart' by Project A could be the limited root set. However, when we judge a source as a whole, are all sources in the database our root set? Consequently, we would then need to scrape 1 level deeper for the base set?
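To make the root set / base set question concrete, here is a minimal sketch of the usual HITS expansion step: start from the root set and go one link level out in both directions. Everything in it is an assumption for illustration: the function name `build_base_set`, the `max_in_per_node` cap, and the source names are hypothetical and not taken from our database.

```python
def build_base_set(root_set, out_links, in_links, max_in_per_node=50):
    """Expand a root set by one link level in both directions.

    out_links / in_links map a source to the sources it links to /
    is linked from. The cap on in-links keeps very popular pages
    from blowing up the subgraph.
    """
    base = set(root_set)
    for node in root_set:
        # Everything the root source points to.
        base.update(out_links.get(node, ()))
        # A bounded number of sources pointing at the root source.
        base.update(sorted(in_links.get(node, ()))[:max_in_per_node])
    return base


# Hypothetical sources attached to the 'Beethoven' and 'Mozart' entities.
root = {"src_beethoven", "src_mozart"}
out_links = {"src_beethoven": ["wiki_mozart"], "src_mozart": ["wiki_vienna"]}
in_links = {"src_beethoven": ["blog_a"], "src_mozart": ["blog_a", "blog_b"]}

base = build_base_set(root, out_links, in_links)
print(sorted(base))
```

If we judged sources as a whole, `root` would be every source in the database, and the base set would be whatever one extra level of scraping adds on top of that.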
If we're using an optimized library (NetworkX is pretty good, but that's Python…), with all that power iteration eigenvector calculation magic, we can probably go pretty large with the focused subgraph, maybe even just use all relevant sources. But it's hard to tell without any measurements.
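For reference, the "power iteration magic" is small enough to sketch without any library. The following is a plain-Python illustration of the HITS update rule, not the NetworkX or graphology-hits implementation; the function names, the fixed iteration count, and the example edges are assumptions made up for this sketch.

```python
import math


def normalize(scores):
    """Scale scores to unit Euclidean length (no-op if all are zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {node: v / norm for node, v in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of (src, dst) links."""
    nodes = {node for edge in edges for node in edge}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of everything linking to the node.
        new_auth = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub: sum of the (fresh) authority scores of everything it links to.
        new_hub = {node: 0.0 for node in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical link graph: two blogs linking to two music databases.
edges = [("blog_a", "db_x"), ("blog_b", "db_x"), ("blog_b", "db_y")]
hub, auth = hits(edges)
best_authority = max(auth, key=auth.get)
print(best_authority)
```

Each iteration is one pass over the edge list, so the cost grows linearly with the number of links. That is a hint that a large focused subgraph may be feasible, but it does not replace actual measurements.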
@sacdallago maybe you know a JavaScript library similar to NetworkX?
@pfent I unfortunately don't :( But some NPM digging might make pretty things surface. One week ago I found two groups attempting to write CNNs in JS, so I'm fairly sure there's a package for everything :D :D
Our current idea:
Regarding the ranking, see also my comment at https://github.com/MusicConnectionMachine/RelationshipsG3/issues/5#issuecomment-284272220. This would be a cool thing to try out, but it seems to me that the right time to approach it is once we see that we really need this refinement, and we have to get to that point first.
@simonzachau Spawning child processes and assigning them jobs in other languages is always a bit of overkill! Avoid that as much as possible, and only do it if there is no other way.
Maybe we can use one of these: https://github.com/graphology/graphology-hits, https://www.npmjs.com/package/ngraph.hits
https://www.npmjs.com/package/graphology-hits was last published 2 weeks ago. What that tells me is that someone is trying to do something similar and hasn't found an existing solution either, and that the package is being maintained (as opposed to the year-old one).
@sacdallago thank you for reviewing our findings! That's why we also opted to try graphology-hits rather than the unmaintained ngraph.hits.
What are trustworthy links? Why do we trust some links more than others? Who says that musicbrainz.org is more trusted than mymusicblog.wordpress.com? Do we say that? (For thousands of links?)