MTG / dunya

The Dunya music browser
http://dunya.compmusic.upf.edu
GNU Affero General Public License v3.0

Key/Value metadata editor #204

Closed by alastair 8 years ago

alastair commented 9 years ago

Similar to the recent artist biography editor, we need a new generic data editor for key/value data.

The process should work something like this:

  1. A user uploads a json document. This document contains a list of a single type of items. Each item has a list of keys and values
  2. The json document is imported into the database. For now we can assume all keys and values are strings.
  3. A form editor lets a user correct a value if it is incorrect.
  4. If a value has been corrected, a later json upload must never overwrite the corrected value
  5. A value can have the never-overwrite setting disabled (with a check box, maybe)
  6. A new json document can be uploaded at any time, which may add new items, new keys, or update values in existing keys
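
The six steps above can be sketched in plain Python, without the database or the form. This is only an illustration of the merge rule, not the Dunya implementation: the `StoredValue` class, the `locked` flag, the function names, and the use of an `"id"` key to recognise an item across uploads are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class StoredValue:
    """One key/value pair of an item, plus the never-overwrite flag."""
    value: str
    locked: bool = False  # hypothetical flag, set when a value is corrected


def import_items(store, uploaded):
    """Merge an uploaded list of items into store (item id -> key -> StoredValue).

    New items and new keys are added. Existing values are updated
    unless they are locked.
    """
    for item in uploaded:
        fields = store.setdefault(item["id"], {})
        for key, value in item.items():
            if key == "id":
                continue
            current = fields.get(key)
            if current is None:
                fields[key] = StoredValue(value)
            elif not current.locked:
                current.value = value
    return store


def correct_value(store, item_id, key, value):
    """Record a manual correction and protect it from later uploads."""
    store[item_id][key] = StoredValue(value, locked=True)


# First upload, a manual correction, then a second upload of the same item.
store = import_items({}, [{"id": "r1", "name": "Aarabhi", "thaat": "Bilawal"}])
correct_value(store, "r1", "name", "Arabhi")
import_items(store, [{"id": "r1", "name": "Aarabhi", "thaat": "Kalyan", "vaadi": "S"}])
```

After the second upload the corrected name is kept, the unlocked value is replaced, and the new key is added. Step 5 would amount to setting `locked` back to `False` for that value.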

This system can be developed as a new module as part of dunya. It should only be accessible by staff members.

alastair commented 9 years ago

Some sample json:

raaga

{
 "name": "Aarabhi", 
 "samvaadi": "m", 
 "jaati": "Audav - Sampoorna", 
 "thaat": "Bilawal", 
 "aaroha": "S - R - m - P - D - S'", 
 "prahar": "6 (9pm - 12am)", 
 "avroha": "S' - N - D - P - m - G - R -  S", 
 "vaadi": "S", 
 "pakad": "'D - S - R - m - G - R - S - 'N - 'D - S", 
 "description": "This Raga can be simply explained as Raga Durga with all seven notes in Avaroha (discending) or Raga Jhinjhoti with Shudhha Nishad (instead of the Komal Nishad). It has a tint of Tilak Kamod in Poorvang and has a sweet nature, pleasant mood.It is Poorvang-Pradhan Raga and hence mostly sung in the lower & middle octaves. Besides notes S and m (vadi & samvadi notes), one can rest on D, P and litlle bit on G. N is relatively weak note in this Raga.Originally a Raga from Carnatak music system, Arabhi has resemblance with a folk melody from Rajasthan called as 'Asa-Maand'. There was a popular song in Stage music of Maharashtra (Marathi Natya Sangeet), 'Chandrika hi Janu' which is based on Raga Arabhi. Some Hindustani musicians such as Pt. Jasraj call this Raga as 'Asa-Maand', but many musicians like Ust. Abdul Halim Jaffer Khan has preferred to present it with it's original Carnatak name 'Arabhi'."
}

artist

{
 "name": "A T Kanan", 
 "gharana": "Khayal - Kirana", 
 "image": "https://www.swarganga.org/images/artists/146.jpg", 
 "guru": "Girija Shankar Chakrabarty", 
 "instrument": "Vocal", 
 "born": "18/06/1929", 
 "bio": "Pandit A T Kanan was born in Madras on June 18, 1929.He was interested in music as well as in cricket in youth and he joined Railway service. In his 20's, while visiting Mumbai for a cricket match, he visited All India Radio, gave the audition and also got a chance to perform on AIR!He learnt music under Pt. Lahanu Babu Rao (Hyderabad) and Pt. Girija Shankar Chakraborty (Kolkata). He also had a great impact of Ust Amir Khan Sahib of Indore. He gave his debut performance in All Bengal Music Conference (1943).Kanan performed in all important music conferences, AIR National programs. He also worked as playback singer for many films, such as Meghe Dhaka Tara, Basant Bahar, Jadu Bhatta and Megh Malhar, etc.In 1950s, along with other musicians, he founded 'Calcutta Music Circle\u00c3\u00a2'. With his wife and famous vocalist Malabika Kanan, he joined ITC-SRA as a Guru from its establishment and also contributed a lot in its evolution. This couple has contributed in shaping many talented singers in Kolkata.Kanan was awarded ITC Award (1994) and Sangeet Natak Academy award (1995).Pandit A T Kanan died at the AMRI Apollo Hospitals, Kolkata on September 12, 2004.", 
 "more information at :": "", 
 "died": "12/09/2004"
}

andrebola commented 9 years ago

I have two questions:

I don't understand how the json indicates which type of item it corresponds to. (Maybe it is indicated manually by the user?)

At step 2, I'm not sure whether the new elements are stored in a new table (only for this purpose) or in the corresponding item's table. In this example, if you're updating the artist "A T Kanan", that would be the Artists table. In that case, if the key is not an existing attribute of Artist, should it be added in some structure?

alastair commented 9 years ago

The type of item should be indicated by the user.

We should store all things in new tables. The reason for this is that some information that we are collecting in these json files doesn't have a place to be stored in the existing database structure (for example, much of the data in the raaga file). When this data has been imported and corrected we will move it into the main database with another script.

alastair commented 9 years ago

@gopalkoduri What will the primary identifier be for each item? How will we know if the json is the same the second time it is uploaded? I'm not sure using the name is a good idea, since especially artist names could be different...

gopalkoduri commented 9 years ago

Yes, true. For those artists/raagas/taalas that are already in Dunya, I will use their MBID/UUID by performing a fuzzy match. For newer ones, we either create an MBID (on MusicBrainz, of course) or a UUID (like we are already doing for raagas now).

alastair commented 9 years ago

For others can you identify them using the id of the site that you're scraping from? As long as it's consistent, it doesn't matter what the id is.
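
One way to get a consistent id from a scraped site is sketched below, assuming the site exposes some stable id of its own. The namespace, the `item_id` helper, and the idea of deriving a UUID with `uuid.uuid5` are assumptions for illustration; nothing in this thread says the scraper works this way.

```python
import uuid

# Hypothetical namespace for ids derived from one scraped site.
# Any fixed UUID works, as long as it never changes between uploads.
SCRAPE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://www.swarganga.org/")


def item_id(existing_uuid, site_id):
    """Return the known MBID/UUID if there is one, else a stable id built from the site's id."""
    if existing_uuid:
        # uuid.UUID validates the string and normalises it to lowercase.
        return str(uuid.UUID(existing_uuid))
    # uuid5 is deterministic: the same site id always gives the same UUID.
    return str(uuid.uuid5(SCRAPE_NAMESPACE, str(site_id)))
```

Because the derived value is deterministic, uploading the same scraped item twice gives the same identifier, and every item still ends up with a UUID-shaped id.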

gopalkoduri commented 9 years ago

Sure! I'll do that for now.

alastair commented 9 years ago

@gopalkoduri If a field is removed from the json, should it be removed from this editor? If the removed field has been edited should it still be removed?

gopalkoduri commented 9 years ago

The fields will be common for all entities in a given entity type, say Carnatic raagas. All raagas have the field irrespective of whether it is applicable to it or not. So a field cannot be removed from a single json - it can only be removed from the entity type. If it is removed from the entity type:

  1. Case where there are no previously edited entities for that field: It should be removed from the editor for all the entities.
  2. Case where there are some edited entities for that field: We should show in the editor the past values (edited or not) but without the ability to edit them further.

This leads to another question though: Do we want to differentiate between fields that are incomplete and fields that are inapplicable for a given entity? I think we should. For example, a raaga which is a mela can never be janya_to another raaga, and vice versa. So, for some raagas, mela is an appropriate field, and for some janya_to is. I can use Null for inappropriate fields and an empty string for appropriate but incomplete fields; please tell me if that is ok.
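
The Null versus empty string proposal could be read by an importer roughly as follows. This is only a sketch of the proposal: the `field_state` helper and its four labels are made up for illustration, and the sample item is not real data.

```python
import json


def field_state(item, key):
    """Classify a key as 'absent', 'inapplicable', 'incomplete' or 'filled'."""
    if key not in item:
        return "absent"
    value = item[key]
    if value is None:
        # json null: the field does not apply to this entity
        return "inapplicable"
    if value == "":
        # empty string: the field applies but has not been filled in yet
        return "incomplete"
    return "filled"


# Illustrative item only: "mela" does not apply, "janya_to" is still missing.
raaga = json.loads('{"name": "example raaga", "mela": null, "janya_to": ""}')
states = {key: field_state(raaga, key) for key in ("name", "mela", "janya_to", "thaat")}
```

Note that this conflicts with the assumption in the first comment that all values are strings, so the importer would have to accept null as well.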

alastair commented 9 years ago

We don't need every key to be present in every item in a list.

alastair commented 9 years ago

for @andrebola, the next things we need are:

field deletion

Delete entities, as above. For now it will be easier if we do not make them read-only

item uploading

This is automatic uploading of Items to Dunya's docserver module when they are confirmed in the editor

  1. Add a model field Item.verified
  2. Link the model Category to docserver.SourceFileType. This can be optionally set at upload time.
  3. The Item edit page has another select box called "Verified". If this is checked and we have a SourceFileType set, perform the upload
  4. Upload code is currently at https://github.com/MTG/dunya/blob/master/docserver/views.py#L87, but only through a django-rest-framework method. I am currently working on refactoring this. Use docserver.util.docserver_add_sourcefile and I will finish the implementation.
  5. If any Field is changed (and we set Field.modified), upload the entire Item again to the docserver
  6. If a new file is uploaded and items are changed then Item.verified is unset and waits for a person to re-verify it.
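
The verify-and-upload rules in steps 3, 5 and 6 can be sketched without Django. The `Item` class here is a plain-Python stand-in, not the real model, and `save_item`, `apply_import` and the `upload` callback are hypothetical names. The callback stands in for `docserver.util.docserver_add_sourcefile`, whose signature is not given in this thread.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """Plain-Python stand-in for the Item model; not the real Django model."""
    values: dict = field(default_factory=dict)
    verified: bool = False
    # Stands in for the Category -> docserver.SourceFileType link.
    source_file_type: Optional[str] = None


def save_item(item, changes, verified, upload):
    """Apply edits from the Item edit page and upload if verified (steps 3 and 5)."""
    item.values.update(changes)
    item.verified = verified
    if item.verified and item.source_file_type is not None:
        upload(item)  # the whole Item is sent, not only the changed fields
        return True
    return False


def apply_import(item, new_values):
    """Merge values from a newly uploaded file; a real change unsets verified (step 6)."""
    changed = {k: v for k, v in new_values.items() if item.values.get(k) != v}
    if changed:
        item.values.update(changed)
        item.verified = False
    return bool(changed)
```

For brevity `apply_import` ignores the never-overwrite setting; a real import would skip protected values before comparing.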

If @gopalkoduri thinks it is useful, we could add another field to mark items in 6 so that people know they have been changed and need to be re-verified.

@gopalkoduri for your information, I chose to add the verified flag straight away because it is not much more work to add while we are doing the upload part.

gopalkoduri commented 9 years ago

> If @gopalkoduri thinks it is useful, we could add another field to mark items in 6 so that people know they have been changed and need to be re-verified.

Yes, please do.

alastair commented 8 years ago

We need to make one more change to allow you to specify which collection this data is a part of (so for example you can separate carnatic artists from hindustani artists).

We need to clarify one more item: how you want to download this data once it has been corrected. The easy way is to have a link on this editor page which lets you download a file, e.g. "all carnatic artists". However, this is not useful in the long term if we want to automatically update the data from the app. We need to talk about the API format that this might follow.

alastair commented 8 years ago

We should restrict all items to having an id which is a uuid.

There should be an option to add a collection to each category. This is used to create a new document and join it to the collection. We upload a SourceFile of this document with the specified SourceFileType.

alastair commented 8 years ago

Fixed in https://github.com/MTG/dunya/commit/19e5956efd71af5229f3527c99bfb5c9bcfc0968