elifesciences / schematron-wiki

This contains the markdown from gitbook for schematron.
MIT License

Create Glencoe page #157

Closed fred-atherden closed 3 years ago

fred-atherden commented 3 years ago

Definition of done

fred-atherden commented 3 years ago

@Melissa37, @naushinthomson, @bcollins14, @JGilbert-eLife, please review this page - https://elifesciences.gitbook.io/productionhowto/-M1eY9ikxECYR-0OcnGt/pages-in-progress/glencoe.

Melissa37 commented 3 years ago

Videos are uploaded to the Glencoe FTP once during the production process at pre-editing by Exeter. If there are any changes to required to the videos during proofing, then this is done at post-author validation.

I think the first "to" should be removed:

Videos are uploaded to the Glencoe FTP once during the production process at pre-editing by Exeter. If there are any changes required to the videos during proofing, then this is done at post-author validation.

Melissa37 commented 3 years ago

Exeter supply videos and metadata to the Glencoe FTP. This is an automated process, except in cases where the videos for an article are cumulatively large enough in file size, that they need to be manually uploaded.

Do we know what this size is?

Melissa37 commented 3 years ago

Typically we don't receive notifications from Glencoe when video processing has failed (although in some cases we may). This is simply because the metadata or videos themselves have been provided in such a way that they cannot be processed to begin with. If Exeter are unsure of the reason why videos have failed processing, they will contact the eLife production team.

I am not sure I really understand. Is the point that we rarely get failure messages generated by Glencoe, but we do get a fair few failures? And that Exeter alert us to these because the Glencoe system does not even pick them up, as they happen so early?

Melissa37 commented 3 years ago

If there is a folder in the zip, this will be shown in the output, for example:

Can you explain a bit more about what the image is showing? Is this wrong or right? It's a bit confusing, as the bit underneath seems to be referring to something else, the XML file. Is the idea that the user looks at the XML file, but that's not part of the image above?

Melissa37 commented 3 years ago

Is there a schematron that can be run on the XML file to check the issues you mention?

Melissa37 commented 3 years ago

The videos can then be interacted with using Glencoe's API. For example https://movie-usa.glencoesoftware.com/metadata/{doi} (where {doi} is replaced with an actual doi) will return a JSON response containing information relating to all the videos uploaded for that article. Similarly individual videos can be found using the following convention http://movie-usa.glencoesoftware.com/video/{doi}/{video-id} (where {video-id} is the value of the id attribute for the media element in the XML for the corresponding video).

Can you provide some real examples? For instance, the DOI could be the full https string or just the number, and people may not know what the video-id is.
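As a rough illustration of the two URL conventions quoted above, here is a minimal Python sketch that only builds the URLs (it makes no network request). The DOI and video id values are hypothetical placeholders, and the bare `10.7554/eLife.NNNNN` DOI form is an assumption, since the thread does not confirm which form the API expects. The helper names `metadata_url` and `video_url` are likewise made up for this example.

```python
# URL templates copied from the quoted page text. Note that the text gives
# the metadata endpoint with https and the video endpoint with http.
METADATA_TEMPLATE = "https://movie-usa.glencoesoftware.com/metadata/{doi}"
VIDEO_TEMPLATE = "http://movie-usa.glencoesoftware.com/video/{doi}/{video_id}"


def metadata_url(doi: str) -> str:
    """Build the URL that should return JSON for all videos of one article."""
    return METADATA_TEMPLATE.format(doi=doi)


def video_url(doi: str, video_id: str) -> str:
    """Build the URL for a single video.

    video_id is the value of the id attribute on the media element in the
    article XML for the corresponding video.
    """
    return VIDEO_TEMPLATE.format(doi=doi, video_id=video_id)


# Hypothetical values, for illustration only (assumed formats).
EXAMPLE_DOI = "10.7554/eLife.00000"
EXAMPLE_VIDEO_ID = "video1"

if __name__ == "__main__":
    print(metadata_url(EXAMPLE_DOI))
    print(video_url(EXAMPLE_DOI, EXAMPLE_VIDEO_ID))
```

If the API turns out to expect a different DOI form, only the value passed in needs to change; the templates themselves come straight from the page text.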

fred-atherden commented 3 years ago

Exeter supply videos and metadata to the Glencoe FTP. This is an automated process, except in cases where the videos for an article are cumulatively large enough in file size, that they need to be manually uploaded.

Do we know what this size is?

No, I can ask Exeter to confirm when we send it over.

fred-atherden commented 3 years ago

Typically we don't receive notifications from Glencoe when video processing has failed (although in some cases we may). This is simply because the metadata or videos themselves have been provided in such a way that they cannot be processed to begin with. If Exeter are unsure of the reason why videos have failed processing, they will contact the eLife production team.

I am not sure I really understand. Is the point that we rarely get failure messages generated by Glencoe, but we do get a fair few failures? And that Exeter alert us to these because the Glencoe system does not even pick them up, as they happen so early?

Exeter provide them in such a way that the videos cannot be transcoded, and I think Glencoe only provide transcoding failure messages. I have added further clarification.

fred-atherden commented 3 years ago

If there is a folder in the zip, this will be shown in the output, for example:

Can you explain a bit more about what the image is showing? Is this wrong or right? It's a bit confusing, as the bit underneath seems to be referring to something else, the XML file. Is the idea that the user looks at the XML file, but that's not part of the image above?

I have tried to clarify. Let me know if this is still unclear.

naushinthomson commented 3 years ago

path-to-zipfile should be replaced with an actual path the the zip file (including the zip filename - for example unzip -l /Users/fredatherden/Desktop/elife_Nov_16.video.zip).

Should this be

path-to-zipfile should be replaced with an actual path to the zip file (including the zip filename - for example unzip -l /Users/fredatherden/Desktop/elife_Nov_16.video.zip). ?
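For readers without the `unzip` command to hand, a similar listing can be sketched with Python's standard `zipfile` module. This is only an illustrative alternative, not the method described on the page. The function name `list_zip_entries` and the file names in the in-memory example are hypothetical, and the sketch only reports whether an entry sits inside a folder; it does not decide whether that is acceptable to Glencoe.

```python
import io
import zipfile


def list_zip_entries(zip_source):
    """Return (entry name, in_folder) pairs for every entry in a zip.

    zip_source may be a path to a zip file or a file-like object.
    in_folder is True for directory entries and for files whose name
    contains a folder component.
    """
    with zipfile.ZipFile(zip_source) as archive:
        return [
            (info.filename, info.is_dir() or "/" in info.filename)
            for info in archive.infolist()
        ]


def build_example_zip():
    """Create a small in-memory zip with hypothetical file names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("elife-00000.xml", "<article/>")
        archive.writestr("videos/elife-00000-video1.mp4", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, in_folder in list_zip_entries(build_example_zip()):
        marker = "(inside a folder)" if in_folder else ""
        print(name, marker)
```

To inspect a real file, a path such as the one in the `unzip -l` example can be passed to `list_zip_entries` instead of the in-memory buffer.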

naushinthomson commented 3 years ago

Providing publication dates at the start of the workflow resolves this race condition, but it does come with a caveat - if the number videos in an article are changed

Should be

Providing publication dates at the start of the workflow resolves this race condition, but it does come with a caveat - if the number of videos in an article are changed

naushinthomson commented 3 years ago

Troubleshooting video upload failures

What is the expected action if any of the checklist items fail? Can production address these or do they need to go back to Exeter?

fred-atherden commented 3 years ago

Is there a schematron that can be run on the XML file to check the issues you mention?

There is no Schematron, but there is an XQuery (included on that page) that does this. The XQuery is preferable, since it pulls all the data out of the zips without the user having to unzip them. At some point I will update the BaseX page (and link to it from here), so that anyone can use this XQuery with ease.

I may be able to make it work in oXygen, if that's desirable, but it would take some more time.

JGilbert-eLife commented 3 years ago

Maybe change the description to "Videos in eLife articles are hosted by Glencoe Software Inc."?

JGilbert-eLife commented 3 years ago

Would it be possible to add screenshots of an example file structure on the FTP site to "How are videos supplied"? Just to visualise what's being described.

JGilbert-eLife commented 3 years ago

I think this should be "Once the videos have been processed by Glencoe, Exeter then embeds the links in Kriya, so that the videos are displayed for proofing."

JGilbert-eLife commented 3 years ago

I may have misunderstood but is it possible to clarify what happens in "Why pub dates are included in the XML" if an article does not yet have a publication date? E.g. what is this pub-date if not the article pub-date? (I assume it's just today's date, but that's not stated outright)

JGilbert-eLife commented 3 years ago

This is really useful, thanks Fred!

fred-atherden commented 3 years ago

I may have misunderstood but is it possible to clarify what happens in "Why pub dates are included in the XML" if an article does not yet have a publication date? E.g. what is this pub-date if not the article pub-date? (I assume it's just today's date, but that's not stated outright)

I've tried to clarify the text - the publication date in the XML supplied to Glencoe is the date of upload, and no longer has anything to do with the article publication date.

bcollins14 commented 3 years ago

Inside the folder two zips are placed (this isn't actually a requirement from Glencoe, it could just be one zip, this is simply how Exeter have implemented it).

Just out of curiosity, do we know why they did it this way?

fred-atherden commented 3 years ago

Inside the folder two zips are placed (this isn't actually a requirement from Glencoe, it could just be one zip, this is simply how Exeter have implemented it).

Just out of curiosity, do we know why they did it this way?

Beats me :shrug:.

bcollins14 commented 3 years ago

I cannot remember which article this relates to, but we received a processing email in which, despite all the URLs being in the success section, the ends (I think) were plain text, so this was an error. Would this be worth mentioning, or was it too rare to be worth it?

bcollins14 commented 3 years ago

Otherwise, I have nothing further to add. Thanks Fred, I certainly learnt some things here 👍

fred-atherden commented 3 years ago

I cannot remember which article this relates to, but we received a processing email in which, despite all the URLs being in the success section, the ends (I think) were plain text, so this was an error. Would this be worth mentioning, or was it too rare to be worth it?

Hmm, I don't remember this or what the cause of the problem was. If we can find the article, then I can look into why it caused issues and add it as an example.

fred-atherden commented 3 years ago

Inside the folder two zips are placed (this isn't actually a requirement from Glencoe, it could just be one zip, this is simply how Exeter have implemented it).

Just out of curiosity, do we know why they did it this way?

I just worked out why it was done this way. It's for legacy reasons relating to the old workflow: the XML was re-uploaded with the actual publication date of the article (the first deposit had no pub date). Doing it with separate zips meant that the videos did not have to be re-uploaded to the FTP. I will update the page with this info.