DOREMUS-ANR / knowledge-base

Repository containing controlled vocabularies and data published by DOREMUS
http://data.doremus.org/
Apache License 2.0

Links from Isni to MusicBrainz #50

Closed kgtodorov closed 6 years ago

kgtodorov commented 6 years ago

@TheMarmottin Here are 4319 links to MusicBrainz that should be added to DOREMUS artists (identified by Jérôme Roy through the VIAF identifiers, restricted to DOREMUS IDs with no MusicBrainz link):

https://loujine.github.io/musicbrainz-dataviz/doremus.html

source page - jupyter notebook with the process (html version)
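For readers who do not open the notebook, the matching idea described above (join on the VIAF identifier, keep only DOREMUS artists that have no MusicBrainz link yet) can be sketched roughly as follows. This is not the notebook's code: the function name, the dictionary keys, and the sample data are all made up for illustration.

```python
def find_new_links(doremus_artists, musicbrainz_by_viaf):
    """Return (doremus_id, mbid) pairs for artists that lack a MusicBrainz link.

    Both arguments are hypothetical structures, not the real DOREMUS data model:
    - doremus_artists: list of dicts with "id", "viaf" and "mbid" keys
    - musicbrainz_by_viaf: dict mapping a VIAF identifier to a MusicBrainz ID
    """
    links = []
    for artist in doremus_artists:
        if artist.get("mbid"):
            # Already linked to MusicBrainz, so it is out of scope.
            continue
        mbid = musicbrainz_by_viaf.get(artist.get("viaf"))
        if mbid is not None:
            links.append((artist["id"], mbid))
    return links


# Made-up sample data, for illustration only.
doremus_artists = [
    {"id": "artist-a", "viaf": "111", "mbid": None},
    {"id": "artist-b", "viaf": "222", "mbid": "mb-existing"},
    {"id": "artist-c", "viaf": "333", "mbid": None},
]
musicbrainz_by_viaf = {"111": "mb-1", "222": "mb-2"}

new_links = find_new_links(doremus_artists, musicbrainz_by_viaf)
print(new_links)
```

Only `artist-a` is returned here: `artist-b` already has a link, and `artist-c` has a VIAF identifier that MusicBrainz does not know in this sample.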

pasqLisena commented 6 years ago

It would be worth converting that table (does a CSV version exist?) to RDF, so that it can be uploaded to the triplestore.

TheMarmottin commented 6 years ago

Well, there is a CSV version now. I'm going to work on converting it to RDF; it shouldn't take long.

TheMarmottin commented 6 years ago

There it is. I only wrote the IDs, not the names; let me know if they are needed. I'll get working on modifying the actual artists' files.
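A minimal sketch of what an ID-only CSV-to-RDF conversion could look like, emitting one owl:sameAs triple per row in N-Triples syntax. The column names (`doremus_id`, `mbid`), the two URI patterns, and the MusicBrainz ID in the sample are assumptions for illustration; the actual file may have been produced differently.

```python
import csv
import io

# Assumed namespaces; check them against the real data before reuse.
DOREMUS_ARTIST = "http://data.doremus.org/artist/"
MUSICBRAINZ_ARTIST = "https://musicbrainz.org/artist/"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def csv_to_ntriples(csv_text):
    """Turn rows of (doremus_id, mbid) into owl:sameAs N-Triples lines.

    The column names are hypothetical; only IDs are written, not names.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        subject = DOREMUS_ARTIST + row["doremus_id"].strip()
        target = MUSICBRAINZ_ARTIST + row["mbid"].strip()
        lines.append(f"<{subject}> <{OWL_SAMEAS}> <{target}> .")
    return "\n".join(lines)


# The MusicBrainz ID below is a made-up placeholder.
sample = (
    "doremus_id,mbid\n"
    "00272365-4985-3346-a47c-6f7be65190b5,11111111-2222-3333-4444-555555555555\n"
)
ntriples = csv_to_ntriples(sample)
print(ntriples)
```

Keeping the output as a standalone file of sameAs triples means it can be loaded into the triplestore without touching any existing artist description.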

pasqLisena commented 6 years ago

Great :) Names are not required, they are already in the triplestore.

> I'll get working on modifying the actual artists' files.

What do you mean?

TheMarmottin commented 6 years ago

I meant editing the 49 artists_x.ttl files to add an owl:sameAs property pointing to the corresponding MusicBrainz ID (for the ~4k artists concerned).

TheMarmottin commented 6 years ago

Oh, by the way, I noticed that quite a lot of these artists have their VIAF ID defined twice, such as this one:

doremus_artist:00272365-4985-3346-a47c-6f7be65190b5 owl:sameAs viaf:46477062;
      owl:sameAs <http://viaf.org/viaf/46477062>.

Is that intended? If not, I can erase one of the two. Since I'm going to parse all the artists anyway, I might as well do it in one run.
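To illustrate why the two statements are redundant: assuming the `viaf:` prefix is bound to `http://viaf.org/viaf/` (the prefix declaration is not shown in the snippet), both objects expand to the same IRI, so they denote the same triple. The helper names below are hypothetical, and this is a simplified sketch, not a Turtle parser.

```python
# Assumed prefix binding; the real files declare their own @prefix lines.
PREFIXES = {"viaf": "http://viaf.org/viaf/"}


def expand(term, prefixes=PREFIXES):
    """Expand a prefixed name or <IRI> to a plain IRI string (simplified)."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return term


def dedupe(objects):
    """Keep the first occurrence of each IRI once terms are expanded."""
    seen = set()
    unique = []
    for term in objects:
        iri = expand(term)
        if iri not in seen:
            seen.add(iri)
            unique.append(iri)
    return unique


same_as = ["viaf:46477062", "<http://viaf.org/viaf/46477062>"]
unique_targets = dedupe(same_as)
print(unique_targets)
```

Under that assumption the duplication is harmless once loaded, since an RDF graph is a set of triples and the repeated statement collapses into one.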

rtroncy commented 6 years ago

Don't do this, @TheMarmottin; it is unnecessary. It is better to keep the original files close to the source data and to maintain a separate sameAs file.

TheMarmottin commented 6 years ago

Very well then :)

pasqLisena commented 6 years ago

So is this task complete? Can we close the issue?

rtroncy commented 6 years ago

Almost. @pasqLisena Can you remind us how the artists_x files in the folder are generated? And why do some descriptions contain two identical sameAs links to the same VIAF URI?

pasqLisena commented 6 years ago

> Can you remind us how the artists_x files in the folder are generated?

They were generated by a script that queries ISNI. In any case, they will no longer be necessary (because of the new algorithm).

> And why some descriptions contain two equal sameAs links to the same VIAF URI?

The VIAF ID can sometimes appear in two different parts of the ISNI record. If the two values are consistent, I would not consider this a problem.

rtroncy commented 6 years ago

Indeed, this folder will not be loaded into the triplestore, since the new strategy will look directly into ISNI to generate the DOREMUS artist URIs. The sameAs file created by @TheMarmottin is useful and should be loaded to add more links to MusicBrainz. @pasqLisena Could you clean up the repo and delete those ISNI files?