Closed clapierre closed 4 years ago
@clapierre is it valid for a file to not have a license?
No, we will need a license for each file, but if no license is provided in the JSON file we should assume the most basic license, maybe just GNU? Ultimately we will need to get the licenses from the 3rd-party source of the files. I'm just wondering whether we should leave the license out, which would flag us to find the real license before making a resource file public.
Which gets me wondering whether we need a boolean at either the Resource or Resource File level indicating whether this resource/file is publicly available.
@clapierre I've noticed that in the Master file, language has as value a shortcode like "en". In the original MVP doc the taxonomy for language was a list of full names: "Braille", "German", etc. Should I be mapping these shortcodes or can these be explicit in the generated JSON?
I can default to a license, sure. By default a resource file is published as draft anyway.
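A minimal sketch of the default-license idea discussed above, in Python purely for illustration (the plugin itself is not Python). The field names `license` and `license_assumed`, and the choice of `GNU-GPL` as the placeholder, are assumptions and not part of the actual import format. The flag is there so an assumed license can be found and replaced before a file leaves draft.

```python
# Hypothetical placeholder default; the thread only suggests "maybe just GNU".
DEFAULT_LICENSE = "GNU-GPL"


def apply_license_default(resource_file: dict) -> dict:
    """Return a copy of the entry with a license filled in.

    Entries that had no license get the placeholder and are flagged,
    so the real license can be looked up before publishing.
    """
    result = dict(resource_file)
    license_value = (result.get("license") or "").strip()
    if license_value:
        result["license"] = license_value
        result["license_assumed"] = False
    else:
        result["license"] = DEFAULT_LICENSE
        result["license_assumed"] = True
    return result
```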
Right, Sina pointed out that Braille isn't a language but a script, and it would still be English. If we go with the normal language codes, we can display them any way we want in the UI. So this is why I thought of "en", "fr", "de", "es", etc.
Ok, in that case it's probably better if I change the taxonomy on my end.
@jkva do you have a list of the most common licenses as a start? We will add other licenses which may be proprietary in the case where we point to a 3rd party library as will be the case with the "DCMP Membership" license that we will have for all those videos.
That still needs us to account for the "All languages" case, though.
Yes
For the Language codes we will use ISO 639-1 Language Codes
For the Licenses this is what we started with but I know there must be a more complete list somewhere.
o CC BY 4.0
o CC:BY
o CC:BY-NC
o CC: BY-NC-ND
o CC: BY-NC-SA
o CC: BY-ND
o CC: BY-SA
o DCMP Membership
o GNU-GPL
o OER
Others we need to include? maybe CC BY 3.0, etc.
@clapierre and "All languages" being "all", then?
Sure I think that will be fine.
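To illustrate the point about storing codes and choosing the display text in the UI, here is a small sketch in Python. The table and the `display_language` helper are hypothetical; only the four codes mentioned in this thread plus the agreed `all` value are listed.

```python
# ISO 639-1 codes mentioned in the thread, plus the agreed "all" value.
LANGUAGE_NAMES = {
    "en": "English",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "all": "All languages",
}


def display_language(code: str) -> str:
    """Map a stored language code to the label shown in the UI."""
    key = code.strip().lower()
    if key not in LANGUAGE_NAMES:
        raise ValueError(f"Unknown language code: {code!r}")
    return LANGUAGE_NAMES[key]
```

Keeping the codes in the data and the names in one lookup table means the labels can change without touching the manifest.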
It would be nice to map Accessibility Accommodations to an array as well. Otherwise I'll map on the plugin side.
I think that makes sense: Tags and Accessibility Accommodations are arrays in the JSON manifest. I assume you agree, Sina.
Yes to tags and accommodations being arrays. Re licenses, Creative Commons (CC) seems worthwhile to track, e.g. CC0 etc.
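A sketch of how a multi-value spreadsheet cell could become an array when the manifest is generated. It assumes the values are comma-separated in one cell, which the thread does not confirm; `split_cell` is a hypothetical helper name.

```python
def split_cell(cell, separator=","):
    """Split one spreadsheet cell into a list of trimmed, non-empty values.

    Assumption: multiple tags or accommodations share a single cell,
    separated by `separator`.
    """
    if cell is None:
        return []
    return [part.strip() for part in str(cell).split(separator) if part.strip()]
```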
Just a heads up, I’m the Program Manager for Imageshare, not the Product Manager. Charles is the Product Manager and he can keep that job. Amaya <cleaned up :)>
Sounds good 😊.
Quick thing, @TheLadyMay, if you’re going to respond from email, which I tend to do as well, please erase everything in the email and then type your response, because all of it gets shoved into the web interface if you don’t.
Good point @sinabahram I have at times forgotten to do that and yeah definitely makes a messy GitHub issue thread.
I have just checked in an updated version of the excel/txt and json file including a new entry for a DCMP video with two languages.
In the process I found some invalid characters that we will need to figure out: special start and end quotes and en-dashes seem to cause rev. ?'s to appear in the txt file, and the JSON file gets \udc97, \udc93 and \udc94. I'm not sure there will be an easy way to go through the Excel spreadsheets to convert all of these, strip spaces, etc.
This is a good catch, Charles, and quite important. Let’s please avoid any Unicode in the spreadsheet.
Those quotes aren’t actually quotes but high Unicode characters, so they should be replaced with regular quotes, same for all other things. It sounds like it was copied from some other source with all that going on. You may wish to first paste it into a regular text editor and then copy paste it back.
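As a rough sketch of an automated cleanup, the Python below replaces typographic quotes and dashes with plain ASCII. It also handles code points like U+DC93, which is what Python produces when a raw byte such as 0x93 is decoded with the `surrogateescape` error handler. Treating those bytes as Windows-1252 is an assumption about where the spreadsheet text came from, and the function names are hypothetical.

```python
# Typographic punctuation mapped to plain ASCII equivalents.
PUNCTUATION_TO_ASCII = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # single curly quotes
    "\u201c": '"', "\u201d": '"',   # double curly quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
})


def recover_escaped_bytes(text: str) -> str:
    """Turn surrogate-escaped bytes (U+DC80..U+DCFF) back into characters.

    Assumption: the original bytes were Windows-1252.
    """
    def fix(ch: str) -> str:
        cp = ord(ch)
        if 0xDC80 <= cp <= 0xDCFF:
            return bytes([cp - 0xDC00]).decode("cp1252", errors="replace")
        return ch
    return "".join(fix(ch) for ch in text)


def clean_cell(text: str) -> str:
    """Recover escaped bytes, then normalise punctuation and trim spaces."""
    return recover_escaped_bytes(text).translate(PUNCTUATION_TO_ASCII).strip()
```

Legitimate accented letters pass through unchanged, so this only touches the quote and dash characters listed in the table.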
Currently the input files are being validated against https://github.com/benetech/Imageshare/blob/Development/wp-content/plugins/imageshare/assets/import.schema.json . That's pretty strict; but I'd prefer not to do any data marshalling plugin-side if I can avoid it.
Sorry, @jkva is there an ask here? e.g. is it not validating currently? Or, are you just letting us know?
@sinabahram Sorry - it's meant to be informative as I added it today.
Hi @jkva I see the following for subjects in the validator "subject": { "enum": ["Biology", "Chemistry", "Physics", "Environment", "Earth", "Astronomy", "Algebra 1", "Algebra 2", "Calculus", "Statistics", "Engineering", "Circuits", "Computer Programming"] },
So if there is a subject not in there, what's the process for adding new subjects?
Same question for License and Accommodations.
I should add that the current resource file does not validate against the schema, not without me making some modifications here and there. To me that suggests that the existing taxonomies are not entirely fleshed out yet, or not properly synchronised.
@clapierre I've currently got them hardcoded to illustrate how it would work. Eventually I would like the schema to be partially generated dynamically from the taxonomy scheme that generates the internal WordPress taxonomies.
That would then ensure that the existing WordPress GUI can be used to add subjects, licenses, accommodations, et cetera.
@clapierre I'll likely convert it into a Twig template as I'm already using Twig, and it would become the output of some ImportValidatorHelper or some such.
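The idea of generating the `enum` lists from the taxonomies can be sketched as follows. This is Python for illustration only and not the plugin's Twig/PHP implementation; `build_schema_fragment` and `invalid_fields` are hypothetical names, and the term lists would in practice come from WordPress.

```python
def build_schema_fragment(taxonomies: dict) -> dict:
    """Build {"field": {"enum": [...]}} entries from taxonomy term lists.

    Duplicate terms are dropped while keeping the original order.
    """
    return {
        name: {"enum": list(dict.fromkeys(terms))}
        for name, terms in taxonomies.items()
    }


def invalid_fields(record: dict, fragment: dict) -> list:
    """List the fields whose value is not one of the allowed terms."""
    return [
        name
        for name, rule in fragment.items()
        if name in record and record[name] not in rule["enum"]
    ]
```

With this shape, adding a subject in the admin interface would extend the allowed values without a schema edit.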
@clapierre To clarify, it would be better to have the languages be fully-formed ISO 639-1 names, "English", "French", etc., if possible. This means I don't have to do any mapping on the plugin side, and new languages could be added via the WordPress admin interface.
If not, I can still make it work, but it's not ideal.
You prefer “English” over “en-us”?
I thought we would use the ISO 639-1 language codes, which have "en" as generic "English"; as Sina points out, there is also the option of "en-us" or "en-uk", etc. The current txt file does have the ISO 639-1 code "en", which I would think is what we want, right?
Is this issue ready to be closed out?
I think so. I believe you may have wanted to remove my script that does the conversion from UTF-16 to UTF-8 and have that done in your Python script? We can leave this as is if you would rather, @sinabahram.
I think it's fine for now. That's a pretty small optimization if anything, so let's see if any problems arise with the current workflow.
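For reference, if the conversion were ever folded into the Python script, it could look roughly like this. The function name and file paths are hypothetical, and the sketch assumes the exported txt file is UTF-16 with a byte-order mark.

```python
from pathlib import Path


def utf16_to_utf8(source: Path, target: Path) -> None:
    """Re-encode a UTF-16 text file as UTF-8, leaving line endings untouched."""
    # newline="" disables newline translation on both read and write.
    with open(source, "r", encoding="utf-16", newline="") as src:
        text = src.read()
    with open(target, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```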
I am currently creating a master CSV file that will be used to generate the master JSON Manifest #71. I will attach it to this ticket once complete.