Closed by chaoran-chen.
@MusicConnectionMachine/group-3, @MusicConnectionMachine/group-4 Please assign this issue to one of you and tell us by when you can provide the API. Thanks!
Since this involves @MusicConnectionMachine/group-1 and @MusicConnectionMachine/group-2 too, I added them to this conversation.
If I recall correctly, @MusicConnectionMachine/group-3 and @MusicConnectionMachine/group-4 will only provide the data for title and start, and link (that's where we get our data from, initially given by G1/G2).
We can't provide anything for:
description: we only have relationship triplets, no description,
icon: we work on text, not pictures,
linkType: please explain that in more detail, @chaoran-chen.
There are two possible values for linkType:
"internal": The link points to a page on IMSLP. It can be used for events such as "First piece published"; the event should then contain a link to the IMSLP page of that piece.
"external": The link points to another external site that contains more information about the event.
Would it make sense to use the complete sentence or paragraph that is the source of the relationship as the description?
As far as the timeline component is concerned, we can only provide the data in the form of triplets:
E1: Subject
E2: Date of the event
Relationship: the relationship, i.e. a short description that is mentioned as part of the sentence.
If you want it the other way, like:
Entity: E1
Date: Some Date
Description: ...
Date: Some other Date
Description: ...
And continuing like this....
Then it's a completely different approach from extracting relationships from the text, and I can't guarantee whether we could provide data like this. For us, this would mean extracting events from the text, not relationships 🤐
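For illustration, here is a minimal TypeScript sketch of how triplets that have already been extracted could be regrouped into the per-entity layout above. The field names (subject, date, relationship) and groupBySubject are hypothetical, not G3's or G4's actual schema. Each description here is just the short relationship phrase, so the sketch only reshapes data; it does not solve the event-extraction problem described above.

```typescript
// Hypothetical shape of one relationship triplet (E1, E2, Relationship).
interface Triplet {
  subject: string;      // E1: the entity
  date: string;         // E2: date of the event, as written in the text
  relationship: string; // short description taken from the sentence
}

// Per-entity layout: one entity, then a list of (date, description) pairs.
interface EntityTimeline {
  entity: string;
  items: { date: string; description: string }[];
}

// Groups triplets by subject, keeping the order in which they were extracted.
// A linear search is enough for a handful of subjects; a Map would scale better.
function groupBySubject(triplets: Triplet[]): EntityTimeline[] {
  const timelines: EntityTimeline[] = [];
  for (const t of triplets) {
    let timeline: EntityTimeline | undefined;
    for (const existing of timelines) {
      if (existing.entity === t.subject) {
        timeline = existing;
        break;
      }
    }
    if (timeline === undefined) {
      timeline = { entity: t.subject, items: [] };
      timelines.push(timeline);
    }
    timeline.items.push({ date: t.date, description: t.relationship });
  }
  return timelines;
}
```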
And regarding the other things:
start: We cannot tell you whether it's a start date or an end date. For example: "Entity 1 was born on 5th March 1786. Entity 1 died in 1830." Here, it will always be a single date, not an interval, so it would be better to have date rather than start as the parameter.
title, description: For us, title and description will be the same thing (the relationship). It's up to you how you want to use and display them.
icon: How can we provide this? It's totally unrelated to what we do.
link: We can reference the URL from which we extract relationships in our relationship table, so you can get the link from there.
linkType='internal': We can't provide this information. We will reference the URL table; you can ask group 1 or group 2 to store this there as an extra parameter.
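A minimal TypeScript sketch of the field mapping proposed above, with hypothetical names (Triplet, TimelineEntry, toTimelineEntry): one date instead of start, title and description both taken from the relationship, no icon, a link only when the source URL is known, and linkType left unset for G1/G2 to supply.

```typescript
// Hypothetical triplet shape, as in the format described earlier.
interface Triplet {
  subject: string;
  date: string;
  relationship: string;
}

// Timeline entry following the answers above (hypothetical field names).
interface TimelineEntry {
  date: string;                        // single date, not a start/end interval
  title: string;                       // same text as description
  description: string;
  link?: string;                       // source URL from the relationship table
  linkType?: "internal" | "external";  // not known to G3/G4, left unset
}

// sourceUrl is assumed to come from a lookup in the URL/relationship table.
function toTimelineEntry(t: Triplet, sourceUrl?: string): TimelineEntry {
  const entry: TimelineEntry = {
    date: t.date,
    title: t.relationship,
    description: t.relationship,
  };
  if (sourceUrl !== undefined) {
    entry.link = sourceUrl;
  }
  return entry;
}
```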
As @ansjin said for his group, G4 also has no information about IMSLP. We can only give you the URL from G1/G2. To clarify (since my earlier comment was a bit too brief): we also do not know start or end; we just know "event | happened at | time".
Thanks for your answers, I've changed the initial post accordingly.
Can you provide any information about the type of an event? E.g., personal (birth, death, marriage, illness), "career" (composed something, ...), other people speaking about him/her, ...?
Sorry, I can't answer that yet: the extraction of dates and times only came up last week, and we (G4) have just started on it. This issue was set to low priority in our project (https://github.com/MusicConnectionMachine/RelationshipsG4/issues/47). I will let you know when we continue working on it.
@chaoran-chen I was able to extract the attached data for the timeline. I think the data looks good for creating the timeline, and it is in the order in which the events happened in the entity's life.
You will find the attached input and output JSON files for Bach and Mozart.
It's in this format:
[
  {
    "start": "1763",
    "end": "1766",
    "event": "In the years 1763 - 1766, Mozart, along with his father Leopold, a composer and musician, and sister Nannerl, also a musically talented child, toured London, Paris, and other parts of Europe, giving many successful concerts and performing before royalty."
  },
  {
    "start": "November 1766",
    "event": "The Mozart family returned to Salzburg in November 1766."
  }
]
Note:
For some objects the end date might not exist, as no interval is mentioned.
Also, my current logic only handles up to two dates per sentence; if a single sentence contains more than two dates, it won't detect them.
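A minimal TypeScript sketch of a consumer for this format, assuming the objects arrive as a JSON array as in the sample. ExtractedEvent and parseExtractedEvents are hypothetical names; end is optional, matching the note above.

```typescript
// One extracted event in the JSON format shown above.
interface ExtractedEvent {
  start: string;
  end?: string; // absent when the sentence mentions no interval
  event: string;
}

// Parses the JSON array and checks each object's shape,
// throwing if a required field is missing or has the wrong type.
function parseExtractedEvents(json: string): ExtractedEvent[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) {
    throw new Error("expected a JSON array of events");
  }
  return data.map((item: unknown, i: number): ExtractedEvent => {
    if (typeof item !== "object" || item === null) {
      throw new Error(`event ${i}: not an object`);
    }
    const obj = item as { [key: string]: unknown };
    const start = obj["start"];
    const end = obj["end"];
    const event = obj["event"];
    if (typeof start !== "string" || typeof event !== "string") {
      throw new Error(`event ${i}: "start" and "event" must be strings`);
    }
    if (end !== undefined && typeof end !== "string") {
      throw new Error(`event ${i}: "end" must be a string when present`);
    }
    const parsed: ExtractedEvent = { start, event };
    if (typeof end === "string") {
      parsed.end = end;
    }
    return parsed;
  });
}
```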
Please check this data and see whether we can finalize it and prepare the API for it.
Thanks, @ansjin, it looks great! I've three further questions:
Could you provide the dates in a structured form (e.g. {day: undefined, month: 1, year: 1762} instead of "January, 1762")? (If it is too much work and you don't have time right now, I can also do it in the front-end.)
Could you provide the type of an event, as I asked three posts ago?
Thanks!
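Since the front-end was mentioned as a fallback, here is a minimal TypeScript sketch of such date parsing. It assumes English month names and only the patterns that appear in this thread's samples; PartialDate and parseDate are hypothetical names, and anything unrecognized yields undefined rather than a guess.

```typescript
// Date split into parts, as in the example above; unknown parts stay undefined.
interface PartialDate {
  day: number | undefined;
  month: number | undefined; // 1 = January
  year: number;
}

const MONTH_NAMES = [
  "january", "february", "march", "april", "may", "june",
  "july", "august", "september", "october", "november", "december",
];

// Handles simple English patterns such as "1763", "November 1766",
// "January, 1762", "27 January 1756" or "5th March 1786".
// Any other token makes the whole string unparseable (returns undefined).
function parseDate(text: string): PartialDate | undefined {
  const tokens = text.toLowerCase().replace(/,/g, " ").split(/\s+/).filter((t) => t.length > 0);
  let day: number | undefined;
  let month: number | undefined;
  let year: number | undefined;
  for (const token of tokens) {
    const monthIndex = MONTH_NAMES.indexOf(token);
    if (monthIndex >= 0) {
      month = monthIndex + 1;
    } else if (/^\d{1,2}(st|nd|rd|th)?$/.test(token)) {
      day = parseInt(token, 10); // parseInt stops before "st"/"nd"/"rd"/"th"
      if (day < 1 || day > 31) return undefined;
    } else if (/^\d{3,4}$/.test(token)) {
      year = parseInt(token, 10);
    } else {
      return undefined;
    }
  }
  if (year === undefined) return undefined;
  return { day, month, year };
}
```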
I have to check this; maybe I will add it at a later point, if that's not a problem for you?
The source link will be added when we create the API. Since I extract these relationships from a URL, or from text linked to a URL, the source can easily be referenced from there.
Regarding the event type, I think it will take time: I have to parse the text and extract its meaning, so I can't promise anything on this.
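For comparison, a crude keyword baseline for event types is sketched below in TypeScript. This is a swapped-in heuristic, not the parsing-based approach described above; the categories follow the earlier suggestion, and the keyword lists and classifyEvent are hypothetical.

```typescript
type EventCategory = "personal" | "career" | "other";

// Hypothetical keyword lists, loosely based on the suggested categories
// (birth, death, marriage, illness vs. composing, publishing, touring, ...).
const CATEGORY_KEYWORDS: { category: EventCategory; words: string[] }[] = [
  { category: "personal", words: ["born", "died", "married", "marriage", "illness"] },
  { category: "career", words: ["composed", "published", "premiered", "toured", "performed"] },
];

// Returns the first category whose keyword appears as a whole word in the
// sentence, or "other". It ignores who the sentence is about and any negation.
function classifyEvent(sentence: string): EventCategory {
  const words = sentence.toLowerCase().match(/[a-z]+/g) ?? [];
  for (const rule of CATEGORY_KEYWORDS) {
    if (rule.words.some((k) => words.indexOf(k) !== -1)) {
      return rule.category;
    }
  }
  return "other";
}
```

A baseline like this misses anything phrased differently ("came into the world", "wrote a symphony"), which is why real event typing needs the text analysis mentioned above.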
Okay, that's fine! Thanks for your effort, I'm looking forward to see the API online :)
Hey. Might I ask if you've already started with this task? When can we expect to have it ready?
Please give us an estimate of when it can be implemented. If you can't finish it in the next few days, maybe you could provide a real and roughly complete dataset for Mozart (or Bach), so that we can get a feeling for how many events there will be and how long the texts are.
@chaoran-chen and @vviro
Currently, we take the WET file given to us by group 2 (later we will get it from the DB) -> pass it to our algorithms -> get the output events/relationships -> store them in the DB (currently a local DB).
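The flow above can be sketched in TypeScript with each stage passed in as a function. The stage names are hypothetical placeholders for G2's WET data, the extraction algorithms and the (currently local) DB; only the control flow is shown.

```typescript
// Minimal event shape (same fields as the JSON samples in this thread).
interface ExtractedEvent {
  start: string;
  end?: string;
  event: string;
}

// Hypothetical stages: where the WET records come from, how events are
// extracted from one record, and where the results are written.
interface PipelineStages {
  loadWetRecords: () => string[];
  extractEvents: (text: string) => ExtractedEvent[];
  storeEvents: (events: ExtractedEvent[]) => void;
}

// Runs every WET record through extraction and stores whatever comes out.
// Returns the number of stored events, so runs that yield nothing are easy to spot.
function runPipeline(stages: PipelineStages): number {
  let stored = 0;
  for (const record of stages.loadWetRecords()) {
    const events = stages.extractEvents(record);
    if (events.length > 0) {
      stages.storeEvents(events);
      stored += events.length;
    }
  }
  return stored;
}
```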
But the problem is that either we don't have the data to run our algorithms on, or the data is so poor that our algorithms don't produce meaningful results. See here: https://github.com/MusicConnectionMachine/Relationships/issues/27
Only if @MusicConnectionMachine/group-2 can provide us with meaningful data (like the scraped Wikipedia pages) can we give you results.
As far as I can see, you have already received a set of scraped Wikipedia pages. Are they suitable for extracting event data?
And one more thing: please provide a unique id for every event.
The data has a unique id from the DB, so we can ✔️ that.
We ran our algorithm over the data and got some results. Currently, the data is only in our local DBs and is saved like this:
id: b85fbef0-2cf5-4a43-a437-65e6058ee2ce
start: 1100
end:
event: Monophonic chant, also called plainsong or Gregorian chant, was the dominant form until about 1100.[36] Polyphonic (multi-voiced) music developed from monophonic chant throughout the late Middle Ages and into the Renaissance, including the more complex voicings of motets.
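The end field in the record above is empty. A minimal TypeScript sketch, assuming the column arrives as an empty string or null, of turning such a row into a timeline event that simply omits end; EventRow, TimelineEvent and rowToEvent are hypothetical names.

```typescript
// Row as shown above; an empty "end" column is assumed to be "" or null.
interface EventRow {
  id: string;
  start: string;
  end: string | null;
  event: string;
}

// Event as the timeline might consume it: "end" is present only when known.
interface TimelineEvent {
  id: string;
  start: string;
  end?: string;
  event: string;
}

function rowToEvent(row: EventRow): TimelineEvent {
  const ev: TimelineEvent = { id: row.id, start: row.start, event: row.event };
  const end = row.end === null ? "" : row.end.trim();
  if (end.length > 0) {
    ev.end = end; // keep only non-empty end dates
  }
  return ev;
}
```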
I can give you a CSV file from my local DB with data extracted from the Wikipedia file, but so far we have no data in the DB on Azure. To be honest, I don't know the status of the API (for giving you real request/response data). Maybe @Henni has an overview here. Edit: The API seems to be live on Azure: https://github.com/MusicConnectionMachine/api/issues/88#issuecomment-292565213
So, for the moment, don't expect too much from our date extraction: it is still only a side job alongside our relationship work and therefore still low priority, as @kordianbruck noted here (https://github.com/MusicConnectionMachine/RelationshipsG4/issues/47#event-1018353841).
On the other hand, the code for the extraction is already written.
However, at the moment we have no way of linking an event to a particular person, because 1) we have no data from @MusicConnectionMachine/group-1 in our DB, 2) we have no link from G1's data to G2's blob, and 3) we don't know exactly which person/music piece/instrument/animal/thing the event is about; we can only assume it belongs to the entity we are currently processing. That can lead to
start: 27 January 1756
event: Wolfgang Amadeus Mozart was born on 27 January 1756 to Leopold Mozart
being linked to Beethoven if Beethoven is the entity we are currently processing, because we can't link "Wolfgang Amadeus Mozart" to the WAM entity already in the DB from G1: we simply do not know that they are the same person.
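Once G1's entities are in the DB, one naive way to avoid the Mozart/Beethoven mix-up is to attribute a sentence to the known entity it names, and fall back to the current entity only when no known name appears. A TypeScript sketch with hypothetical types and names (KnownEntity, attributeEvent), using plain substring matching:

```typescript
// An entity as it might come from G1's data; id and names are hypothetical fields.
interface KnownEntity {
  id: string;
  names: string[]; // full name plus any aliases
}

// Attributes a sentence to the known entity whose name it mentions,
// preferring the longest matching name. Falls back to the entity currently
// being processed only when no known name appears in the sentence.
function attributeEvent(sentence: string, current: KnownEntity, known: KnownEntity[]): KnownEntity {
  const text = sentence.toLowerCase();
  let best: KnownEntity | undefined;
  let bestLength = 0;
  for (const entity of known) {
    for (const name of entity.names) {
      const n = name.toLowerCase();
      if (n.length > bestLength && text.indexOf(n) !== -1) {
        best = entity;
        bestLength = n.length;
      }
    }
  }
  return best ?? current;
}
```

Substring matching is crude: if Wolfgang Amadeus Mozart were only known as "Mozart", the longer name "Leopold Mozart" would win in the example sentence, and a sentence naming several people still needs the relationship information to decide whom the event is about.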
Maybe @MusicConnectionMachine/group-1 has some "events" from the structured sources.
@Sandr00 update on this?
Super needed
We still have no connection between G1's entities and our entities, because it was low priority until yesterday. Since this is now high priority, we will start on it. I don't have much time at the moment, but maybe someone else from @MusicConnectionMachine/group-3 or @MusicConnectionMachine/group-4 can do it. Otherwise, I can still do it at the hackathon, but that may be too late.
The hackathon is too late, unfortunately: if things go wrong, there won't be enough time to fix them and still run everything, considering it's on a Saturday and we are supposed to release on Sunday. So could someone else please try this out before then, i.e. tonight, tomorrow or Thursday?
@ansjin said in the chat that he has a bit more time from 20.4. Maybe he's your man 😉
As @Sandr00 mentioned, yes, I will be free from the 20th, so I will look into this after that!
What's the current status here? I've just taken a look at the events table: there is still no entityId.
Please check the DB; there is already an entityId associated with the events.
This issue can be closed now!
Thank you very much, @ansjin! I found entityId but it's only in mcmprod and not in mcm. Do you know if the API is already using mcmprod? (or maybe @sacdallago, @kordianbruck?)
@chaoran-chen You're welcome. I am not sure about the API. Also, there are currently around 25K relations and 125K events stored in the DB; maybe you can try using this data and give us feedback on it!
@chaoran-chen I just switched the API to mcmprod. But there are still a few schema validation errors, thanks to how swagger handles null values.
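On the null issue: Swagger 2.0 (OpenAPI 2.0) has no standard way to declare a property nullable (OpenAPI 3.0 added nullable: true), so one common workaround is to leave null-valued fields out of responses. A minimal TypeScript sketch, assuming the validation errors come from optional fields returned as null; Json and omitNulls are hypothetical names:

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Recursively drops object properties whose value is null, so optional
// fields are absent instead of null. Nulls inside arrays are kept as-is.
function omitNulls(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map(omitNulls);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      const v = value[key];
      if (v !== null) {
        out[key] = omitNulls(v);
      }
    }
    return out;
  }
  return value;
}
```

This only helps for optional properties; a property the schema marks as required must still be present with a valid value.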
For the timeline component that we have recently created, we would like to be able to fetch the event data of an artist.
where Event is defined as follows:
If other attributes such as category can be provided, we can certainly use them as well. Some ideas for events are listed here.