MusicConnectionMachine / api

In this project, the API to interface the Postgres database is exposed.
GNU Affero General Public License v3.0

Events #22

Closed chaoran-chen closed 7 years ago

chaoran-chen commented 7 years ago

For the timeline component that we recently created, we would like to be able to fetch the event data of an artist.

GET /artists/:artist_id/events
Output:

200 --> Array.<Event>

where Event is defined as follows:

/**
 * @typedef {Object} Event
 * @property {string} date - the date of an event
 * @property {string} title - the title should be short enough to be displayed in the small box inside of the timeline
 * [@property {string} [description] - a longer description that will be displayed in a tooltip] - not available for the moment -
 * [@property {string} [icon] - the URL to an icon] - can not be provided by the API -
 * @property {string} [link] - the URL to a page that provides more information
 */

If other attributes, such as a category, can be provided, we can certainly use them as well. Some ideas for events are listed here.
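
A minimal sketch of how a client could check a response from GET /artists/:artist_id/events against the typedef above. The helper names isTimelineEvent and validEvents are hypothetical and not part of the API; only date, title, and link are taken from the typedef.

```javascript
// Checks whether a value matches the Event typedef from this issue.
// `isTimelineEvent` is a hypothetical helper name, not part of the API.
function isTimelineEvent(value) {
  if (value === null || typeof value !== "object") return false;
  // `date` and `title` are the required properties in the typedef.
  if (typeof value.date !== "string" || typeof value.title !== "string") return false;
  // `link` is optional, but must be a string when present.
  if (value.link !== undefined && typeof value.link !== "string") return false;
  return true;
}

// Keeps only the well-formed events of an already parsed response body.
// Anything that is not an array yields an empty list.
function validEvents(body) {
  return Array.isArray(body) ? body.filter(isTimelineEvent) : [];
}
```

The timeline could call validEvents on the parsed JSON body before rendering, so that malformed entries are skipped instead of breaking the component.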

chaoran-chen commented 7 years ago

@MusicConnectionMachine/group-3, @MusicConnectionMachine/group-4 Please assign this issue to one of you and tell us by when you can provide the API. Thanks!

Sandr0x00 commented 7 years ago

Since this also involves @MusicConnectionMachine/group-1 and @MusicConnectionMachine/group-2, I added them to this conversation. If I recall correctly, @MusicConnectionMachine/group-3 and @MusicConnectionMachine/group-4 will only provide the data for title, start, and link (that's where we get our data from, initially given by G1/G2). We can't provide anything for:

description: we only have relationship triplets, no description.
icon: we work on text, not pictures.
linkType: please explain that in more detail, @chaoran-chen.

chaoran-chen commented 7 years ago

There are two possible values for linkType:

Would it make sense to use the complete sentence or paragraph that is the source of the relationship as the description?

ansjin commented 7 years ago

For the timeline component, we can only provide the data in the form of triplets:

E1: Subject
E2: Date for the event
Relationship: Relationship / a short description that is mentioned as part of the sentence.

If you want it the other way around, like:

Entity: E1
Date: Some Date
Description: ...

Date: Some other Date
Description: ...

And continuing like this...

Then it's a completely different approach from extracting relationships from the text. I can't guarantee that we can provide data like this. For us, this would mean extracting events from the text, not relationships 🤐
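
To make the two shapes concrete, here is a small sketch that regroups triplets into the per-entity form. It only covers the mechanical part and assumes that E2 already holds a date; deciding which sentences describe events in the first place is the hard part and is not shown. The field names e1, e2, and relationship and the function name groupTripletsBySubject are assumptions made for illustration.

```javascript
// Regroups relationship triplets by subject.
// Assumption: every triplet's `e2` is already a date string.
function groupTripletsBySubject(triplets) {
  const byEntity = new Map();
  for (const { e1, relationship, e2 } of triplets) {
    if (!byEntity.has(e1)) byEntity.set(e1, []);
    // The relationship text doubles as the description of the event.
    byEntity.get(e1).push({ date: e2, description: relationship });
  }
  // A Map keeps insertion order, so entities appear in first-seen order.
  return Array.from(byEntity, ([entity, events]) => ({ entity, events }));
}
```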

And regarding the other things:

Sandr0x00 commented 7 years ago

As @ansjin said for his group, G4 also has no information about IMSLP. We can only give you the URL from G1/2. To clarify (since my earlier wording was a bit too imprecise): we also do not know start or end, we only know "event | happened at | time"

chaoran-chen commented 7 years ago

Thanks for your answers, I've changed the initial post accordingly.

Can you provide any information about the type of an event? E.g., personal (birth, death, marriage, illness), "career" (composed something, ...), other people speaking about him/her, ... ?

Sandr0x00 commented 7 years ago

Sorry, I can't answer that yet, since the extraction of dates and times only came up last week and we (G4) have just started on it. This issue was set to low priority in our project (https://github.com/MusicConnectionMachine/RelationshipsG4/issues/47). I will let you know when we continue working on it.

ansjin commented 7 years ago

@chaoran-chen I was able to extract the attached data for the timeline. I think the data looks good for creating a timeline, and it is in the order in which the events happened in the entity's life.

You will find the attached input and output JSON files for Bach and Mozart.

It's in this format:


{
    "start": "1763",
    "end": "1766",
    "event": "In the years 1763 - 1766, Mozart, along with his father Leopold, a composer and musician, and sister Nannerl, also a musically talented child, toured London, Paris, and other parts of Europe, giving many successful concerts and performing before royalty."
},
{
    "start": "November 1766",
    "event": "The Mozart family returned to Salzburg in November 1766."
},
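
A sketch of how a record in this format could be mapped onto the Event typedef from the initial post. Joining start and end into a single date string and cutting the sentence down to a short title are assumptions, not agreed behaviour; toTimelineEvent and maxTitleLength are hypothetical names.

```javascript
// Maps an extracted record ({start, end?, event}) to {date, title}.
// Assumption: a range is shown as "start - end" in one date string.
function toTimelineEvent(record, maxTitleLength = 80) {
  const date = record.end ? `${record.start} - ${record.end}` : record.start;
  // The typedef asks for a short title, so long sentences are truncated
  // and marked with an ellipsis.
  const title = record.event.length > maxTitleLength
    ? record.event.slice(0, maxTitleLength - 1).trimEnd() + "…"
    : record.event;
  return { date, title };
}
```

With the default limit, the second record above would keep its full sentence as the title, while the first one would be truncated.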

Note:

Please check this data and see whether we can finalize it and prepare the API for it.

bach_output.txt mozart.txt mozart_output.txt bach.txt

chaoran-chen commented 7 years ago

Thanks, @ansjin, it looks great! I've three further questions:

ansjin commented 7 years ago

Thanks!

chaoran-chen commented 7 years ago

Okay, that's fine! Thanks for your effort, I'm looking forward to seeing the API online :)

chaoran-chen commented 7 years ago

Hey. Might I ask if you've already started with this task? When can we expect to have it ready?

chaoran-chen commented 7 years ago

Please give us an estimate of when it can be implemented. If you can't finish it in the next few days, maybe you could provide a real and roughly complete dataset for Mozart (or Bach) so that we can get a feeling for how many events there will be and how long the texts are.

ansjin commented 7 years ago

@chaoran-chen and @vviro

Currently, we take the WET file given to us by group 2 (later on, we will get it from the DB) -> pass it to the algorithms -> get the output events/relationships -> store them in the DB (currently a local DB).
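
The flow can be sketched roughly as follows. The extraction step here is only a stand-in that keeps sentences mentioning a four-digit year; it is not the groups' actual algorithm, and extractEvents, runPipeline, and the array used as a store are hypothetical.

```javascript
// Stand-in for the real extraction algorithms (an assumption for
// illustration): keeps sentences that mention a four-digit year and uses
// the first such year as `start`.
function extractEvents(text) {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const events = [];
  for (const sentence of sentences) {
    const match = sentence.match(/\b(1\d{3}|20\d{2})\b/);
    if (match) events.push({ start: match[1], event: sentence.trim() });
  }
  return events;
}

// Plain text in -> extraction -> store. `store` is any object with a
// push method, standing in for the local DB.
function runPipeline(text, store) {
  const events = extractEvents(text);
  for (const event of events) store.push(event);
  return events.length;
}
```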

But the problem is that either we don't have the data to run our algorithms on, or the data is so bad that our algorithms don't produce meaningful results. See https://github.com/MusicConnectionMachine/Relationships/issues/27

Only if @MusicConnectionMachine/group-2 can provide us with some meaningful data (like the scraped Wikipedia page) can we give you the results.

chaoran-chen commented 7 years ago

As far as I have seen, you have already received a set of scraped Wikipedia pages. Are they suitable for extracting event data?

And one more thing: please provide a unique id for every event.

Sandr0x00 commented 7 years ago

The data has a unique id from the db, so we can ✔️ that. We ran our algorithm over the data and got some results. Currently the data is only in the local dbs and is saved in the db like this:

id: b85fbef0-2cf5-4a43-a437-65e6058ee2ce
start: 1100
end:
event: Monophonic chant, also called plainsong or Gregorian chant, was the dominant form until about 1100.[36] Polyphonic (multi-voiced) music developed from monophonic chant throughout the late Middle Ages and into the Renaissance, including the more complex voicings of motets.

I can give you a CSV file from my local db with data extracted from the wiki file, but so far we have no data in the db on Azure. To be honest, I don't know the status of the API (to give you the real request/response data). Maybe @Henni has some overview here. Edit: The API seems to be live on Azure: https://github.com/MusicConnectionMachine/api/issues/88#issuecomment-292565213

So, for now, do not expect too much from our date extraction: it is still only a side job next to our relationship work and therefore still low priority, as @kordianbruck noted here (https://github.com/MusicConnectionMachine/RelationshipsG4/issues/47#event-1018353841)

On the other hand, the code for the extraction is written.

At the moment, we just have no way of linking an event to a certain person, because:

1: we have no data from @MusicConnectionMachine/group-1 in our db,
2: we have no link from the data of G1 to the blob of G2, and
3: we don't know exactly which person/music piece/instrument/animal/thing the event belongs to. We can only assume it belongs to the entity we are currently processing.

That can lead to an event like

start: 27 January 1756
event: Wolfgang Amadeus Mozart was born on 27 January 1756 to Leopold Mozart

being linked to Beethoven, if Beethoven is the entity we are currently processing. We can't link Wolfgang Amadeus Mozart to the WAM already in the DB from G1, because we simply do not know that it is the same person.

Maybe @MusicConnectionMachine/group-1 has some "events" from the structured sources.

kordianbruck commented 7 years ago

@Sandr00 update on this?

sacdallago commented 7 years ago

Super needed

Sandr0x00 commented 7 years ago

We still have no connection between G1's entities and our entities, because it was low priority until yesterday. Since this is now high priority, we will start on that. I do not have much time at the moment, but maybe someone else from @MusicConnectionMachine/group-3 or @MusicConnectionMachine/group-4 can do it. Otherwise I can still do it at the hackathon, but maybe this will be too late.

sacdallago commented 7 years ago

The hackathon is too late, unfortunately: if things go wrong, there won't be enough time to fix them and still run everything, considering it's a Saturday and we are supposed to release on Sunday. So could someone else please try this before then, i.e. tonight, tomorrow, or Thursday?

Sandr0x00 commented 7 years ago

@ansjin said in the chat that he has a bit more time from 20.4. Maybe he's your man 😉

ansjin commented 7 years ago

As @Sandr00 mentioned, yes, I will be free from the 20th, so I will look into this after that!

chaoran-chen commented 7 years ago

What's the current status here? I've just taken a look into the events table: there is still no entityId.

ansjin commented 7 years ago

Please check the DB, there is already an entityId associated with events.

This issue can be closed now!

chaoran-chen commented 7 years ago

Thank you very much, @ansjin! I found entityId but it's only in mcmprod and not in mcm. Do you know if the API is already using mcmprod? (or maybe @sacdallago, @kordianbruck?)

ansjin commented 7 years ago

@chaoran-chen You're welcome. I am not sure about the API. Also, there are currently around 25K relations and 125K events already stored in the DB; maybe you can try to use this data and give us feedback on it!

Henni commented 7 years ago

@chaoran-chen I just switched the API to mcmprod. But there are still a few schema validation errors, thanks to how swagger handles null values.