Closed Yueqiao12Zhang closed 5 months ago
The entities that are not reconciled well:
In the events and sessions CSVs, three columns describe the address: Country, Area, and Town. Country and Area reconcile well, but as the address gets more specific, the Town column produces many duplicates, because many towns share the same name within a single country. In this case, I don't think the reconciliation procedure can easily be made fully automatic. The reconciled data will need human inspection.
- I'm wondering how I can get The Session links for the artists.
You mean something like this? https://thesession.org/recordings/artists/319
Yes, but the CSV only contains the artist's name.
For sessions and events, since many towns and areas share the same name across different countries, we should inspect the reconciliation data carefully. My procedure maximizes automation, but some data will still need inspection. In these CSVs, the names, addresses, and event venues have a very low reconciliation rate against Wikidata. Should I still reconcile them?
No need to reconcile the location if there are any ambiguities. Do make a note of this in the documentation of the import process for each database.
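As a rough illustration of how ambiguous locations could be flagged for human inspection before reconciliation, here is a minimal Python sketch. It assumes the CSV has columns named exactly `Country`, `Area`, and `Town` (the names mentioned above); the function name `find_ambiguous_towns` and the sample rows are made up for the example and are not part of the actual pipeline.

```python
import csv
import io
from collections import defaultdict


def find_ambiguous_towns(rows):
    """Map each town name to its (country, area) pairs when there is more than one."""
    places = defaultdict(set)
    for row in rows:
        town = row["Town"].strip()
        if town:  # skip rows without a town
            places[town].add((row["Country"].strip(), row["Area"].strip()))
    # A town is ambiguous when the same name occurs in several distinct places.
    return {town: sorted(found) for town, found in places.items() if len(found) > 1}


# Illustrative sample only; the real input would be the events or sessions CSV.
SAMPLE = """Country,Area,Town
Ireland,Clare,Ennis
USA,Texas,Ennis
Ireland,Dublin,Dublin
Ireland,Dublin,Dublin
"""

ambiguous = find_ambiguous_towns(csv.DictReader(io.StringIO(SAMPLE)))
for town, locations in ambiguous.items():
    print(town, locations)
```

Towns reported here would be left unreconciled and noted in the import documentation, while the remaining rows could go through automatic reconciliation.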
Generating the RDF for Virtuoso is the last step for this issue.
Whenever OpenRefine cannot automatically reconcile an item (i.e., cannot assign a URI), you can leave the item either as a string literal, in which case do add a language tag (e.g., @en), or as data (e.g., a number or a date), in which case do add the datatype (e.g., http://www.w3.org/2001/XMLSchema#date).
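To make the literal rule concrete, here is a small Python sketch that formats an unreconciled cell value as an N-Triples-style literal. It is only an assumption about how this could be done outside OpenRefine: the helper names (`to_literal`, `is_iso_date`, `escape`) are hypothetical, the default language tag `en` is assumed, and only ISO dates (`YYYY-MM-DD`) and plain integers are detected as typed data.

```python
import re
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"
ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
INTEGER = re.compile(r"[+-]?[0-9]+")


def escape(text):
    """Escape backslashes, quotes, and line breaks for a quoted literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def is_iso_date(text):
    """True only for a valid calendar date written as YYYY-MM-DD."""
    if not ISO_DATE.fullmatch(text):
        return False
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def to_literal(value, lang="en"):
    """Typed literal for dates and integers, language-tagged string otherwise."""
    value = value.strip()
    if is_iso_date(value):
        return f'"{value}"^^<{XSD}date>'
    if INTEGER.fullmatch(value):
        return f'"{value}"^^<{XSD}integer>'
    return f'"{escape(value)}"@{lang}'


print(to_literal("Ennis"))
print(to_literal("2019-07-14"))
print(to_literal("42"))
```

A value that looks like a date but is not a valid one (for example `2019-13-40`) falls through to the language-tagged string branch, so it still ends up as a literal rather than being dropped.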
Different approaches to processing the raw CSVs from The Session:
Comparison: The joining process in option 2 takes a lot of effort. Van's code expands all the CSVs horizontally, making one row per tune, and merges all the CSVs by tune_id. Although this leaves far fewer rows, it makes the header extremely long. With option 2, it's almost impossible to reconcile using OpenRefine. Since there is no reconciliation in The Session and there are thousands of rows, we would have to reconcile them one by one. In option 1, we reconcile the raw CSVs directly. Since the data is still vertical, there are only a small number of rows, which makes it easy to reconcile. Then we rename each id to {entity_type}_id and go to csv2rdf directly, which can merge all the CSVs into RDF in one operation. I think this is much more convenient.
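The renaming step in option 1 could look roughly like the following Python sketch. It assumes each raw CSV has a header column literally named `id`; the function name `rename_id_column` and the sample data are hypothetical, and the later csv2rdf step is not shown.

```python
import csv
import io


def rename_id_column(csv_text, entity_type):
    """Return the CSV text with the header column `id` renamed to `{entity_type}_id`."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if rows:
        # Only the header row changes; data rows are copied through untouched.
        rows[0] = [f"{entity_type}_id" if name == "id" else name for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Illustrative sample only; the real input would be one of the raw CSVs.
SAMPLE = "id,name\n1,Example Tune\n"
print(rename_id_column(SAMPLE, "tune"))
```

Giving every file a distinct `{entity_type}_id` header keeps the identifier columns unambiguous once the CSVs are mapped to RDF together.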