MusicConnectionMachine / Relationships

GNU Affero General Public License v3.0

Classify / Filter data from algorithms #9

Closed Sandr0x00 closed 7 years ago

Sandr0x00 commented 7 years ago

The whole filtering and classifying after the algorithm call.

simonzachau commented 7 years ago

update #32: getting verbs and stemming them for better classification.

Next up is semantic similarity (e.g. teach and learn should be classified as the same type).

RBirkeland commented 7 years ago

See #40

Sandr0x00 commented 7 years ago

Semilar runs on Azure, which we no longer have access to. Therefore this issue is more or less on hold.

simonzachau commented 7 years ago

#61 stores the default types and descriptions against which we match our requests using Semilar. These defaults are currently in our config file.

simonzachau commented 7 years ago

The basics should work!

Classifying relationships as predefined types works as described above. If we are not able to match the relationship to a predefined type (i.e. Semilar reports a similarity ≤ 0.5), the database attribute will be null, so such relationships can be filtered out.

Still to do: implement a "direction" for a relationship, i.e. know whether A was taught by B or A taught B. Currently, both may end up as the same relationship, and the inverse attribute is always null.
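As a rough illustration of the threshold rule, here is a minimal Python sketch. It is not our actual code: `classify`, `default_types` and the `similarity` callable are hypothetical names, `similarity` merely stands in for a Semilar request, and picking the highest-scoring type is an assumption.

```python
from typing import Callable, Dict, Optional

THRESHOLD = 0.5  # per the comment above: a similarity <= 0.5 counts as "no match"


def classify(
    relationship: str,
    default_types: Dict[str, str],
    similarity: Callable[[str, str], float],
) -> Optional[str]:
    """Return the best-matching predefined type, or None if no score exceeds the threshold.

    `default_types` maps a type name to its description (hypothetical layout).
    `similarity` is a placeholder for whatever performs the Semilar request.
    """
    best_type: Optional[str] = None
    best_score = THRESHOLD
    for type_name, description in default_types.items():
        score = similarity(relationship, description)
        if score > best_score:  # strictly above the threshold, and better than earlier candidates
            best_type, best_score = type_name, score
    return best_type
```

A `None` result corresponds to the null database attribute, so unmatched relationships can be filtered out afterwards.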

simonzachau commented 7 years ago

Regarding the "direction" of a relationship: Semilar can actually be smart enough to detect this.

Example:

semilar("A inspired by B", "A influences B") = 0.27 (wrong, and also below the threshold)
semilar("A inspired by B", "B influences A") = 0.64 (correct)

Counterexample:

semilar("A taught by B", "B learnt from A") = 1.0 (wrong)
semilar("A taught by B", "A learnt from B") = 1.0 (correct)

The question is: would it be worth doubling the number of Semilar requests in order to find out the direction in some cases? Are there other ways to find out the direction?
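To make the two-request idea concrete, here is a hedged Python sketch. `detect_direction`, the `margin` parameter and the `similarity` callable are assumptions for illustration only; `similarity` stands in for a Semilar request and is not Semilar's real API.

```python
from typing import Callable, Optional


def detect_direction(
    relation: str,
    type_phrase: str,
    similarity: Callable[[str, str], float],
    threshold: float = 0.5,
    margin: float = 0.1,  # hypothetical: minimum gap needed to trust the result
) -> Optional[bool]:
    """Compare both entity orders of the predefined type against the relation.

    Returns True if the inverted order (B ... A) matches better,
    False if the same order (A ... B) matches better,
    and None if neither score exceeds the threshold or the two are too close.
    """
    source = f"A {relation} B"
    forward = similarity(source, f"A {type_phrase} B")
    inverse = similarity(source, f"B {type_phrase} A")
    if max(forward, inverse) <= threshold or abs(forward - inverse) < margin:
        return None
    return inverse > forward
```

With the scores from the examples above, "inspired by" against "influences" would come out as inverted (0.64 vs. 0.27), while "taught by" against "learnt from" would stay undecided because both orders score 1.0.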

What do you think @FelsyWaschbaer @ansjin @vviro @kordianbruck @sacdallago ?

simonzachau commented 7 years ago

@MusicConnectionMachine/group-5 & @MusicConnectionMachine/group-6 do you need/want the feature of "directed" relationships, or is it good enough to just know the type of the relation (e.g. inspired) and the original relationship text (was inspired by, in that case)?

syncall commented 7 years ago

Not sure what you mean by direction. The data should be reasonable. There is a big difference between "Leopold is the father of Amadeus Mozart" and "Amadeus is the father of Leopold" (Leopold is the real father).

simonzachau commented 7 years ago

It depends on whether you want to know who was the father of whom, or whether the father relation itself is the only thing that matters, because we still have the original relationship text (e.g. was the father of).

simonzachau commented 7 years ago

In terms of use cases: if you want to show a graph of the family tree, the computer needs to know who's the father of whom. But if you just want to get all family relationships, we are already able to give you the relevant entities with the original string and the knowledge that it is a father relationship. TL;DR: it depends on what you want to do with the data.

syncall commented 7 years ago

Ok, then this is definitely more for group 5. They do the widget to really explore the relations. But it's quite interesting what you are able to do with these relations! The most we have thought of is http://webpageg6.8ed630ce.svc.dockerapp.io and right now the only thing we hope to do is to actually include your data somehow.

martomi commented 7 years ago

Group 5 here. You can have a look at https://github.com/MusicConnectionMachine/VisualizationG5/issues/56 to get a feeling for the relations widget. Taking the above example: if we receive a relation defined as Mozart - father of - Leopold, we would show it in our UI as is, and it would be wrong. We're currently assuming that the relation is directed from the first to the second entity. We should receive Mozart - son of - Leopold or Leopold - father of - Mozart. Optimally we would get both, but that's probably an unrealistic requirement for now. So I think it's quite important to get the direction right.

simonzachau commented 7 years ago

@syncall Ok, but for the relationship rectangles that appear at the top after entering a search, knowledge about the direction is not required in my view. This gives us some more time for the API before the initial release. I share your "hope": please keep in close contact with us (via the API group) and mention the exact features that you're missing from us.

simonzachau commented 7 years ago

@martomi sounds good!

"father of" is the relationship_type; "is the father of" is the relationship_string as it originally appeared in the source. By taking the relationship_string together with term1 and term2, you automatically get the correct meaning as it was in the source.

Regarding receiving the same relationship several times in variations: yes, this is realistic and is our current implementation! Mozart - is the son of - Leopold and Leopold - is the father of - Mozart will both have the "father of" relationship_type. If you want to get both, just filter for this type.
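Filtering by type could look roughly like this on the consumer side. This is only a sketch in Python: the record layout (plain dicts using the attribute names term1, term2, relationship_string and relationship_type) and the `by_type` helper are assumptions, not the real API response.

```python
from typing import Dict, List, Optional

Relationship = Dict[str, Optional[str]]

# Hypothetical records, mirroring the Mozart example from this thread.
relationships: List[Relationship] = [
    {"term1": "Mozart", "relationship_string": "is the son of",
     "term2": "Leopold", "relationship_type": "father of"},
    {"term1": "Leopold", "relationship_string": "is the father of",
     "term2": "Mozart", "relationship_type": "father of"},
    # An unmatched relationship keeps a null (None) type and is skipped by the filter.
    {"term1": "Mozart", "relationship_string": "met",
     "term2": "Haydn", "relationship_type": None},
]


def by_type(rows: List[Relationship], relationship_type: str) -> List[Relationship]:
    """Keep only the rows whose relationship_type equals the requested type."""
    return [row for row in rows if row["relationship_type"] == relationship_type]


for row in by_type(relationships, "father of"):
    # The original string plus term order preserves the meaning from the source.
    print(row["term1"], "-", row["relationship_string"], "-", row["term2"])
```

Both variations come back for the "father of" type, and displaying term1, relationship_string and term2 in that order reads correctly in either case.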

Our discussion is shifting in a very API-specific direction now... but to conclude: in my view, implementing a relationship direction is not necessary for now if you just display strings. Therefore, this issue can be closed.

gyachdav commented 7 years ago

(sorry for hijacking this discussion).

Hey @martomi, is the current version of the widget hosted somewhere? I couldn't find a demo. If not, that's totally fine; please don't bother making a demo work now. Most important is getting the widget to work.

martomi commented 7 years ago

@gyachdav It's not hosted yet, but it should be by the end of the week! Will let you know as soon as it is!

kordianbruck commented 7 years ago

@martomi thanks! We are eager to see it in action :clapper: