dcmi / dctap

DC Tabular Application Profile
https://dcmi.github.io/dctap/

Making files downloadable #96

Open kcoyle opened 1 year ago

kcoyle commented 1 year ago

I would like to make the template files downloadable. After experimenting, this seems to be the best way (although results may vary by browser, and I haven't tested that). It necessarily uses HTML, but AFAIK we can embed HTML in the Markdown files that are used to create the DC web site. Grab this and try it out:

```html
<html>
<body>

<ul>
<li>a <a href="https://raw.githubusercontent.com/dcmi/dctap/main/TAPtemplate.csv" download>comma separated value template</a></li>
<li>an <a href="https://raw.githubusercontent.com/dcmi/dctap/main/TAPtemplate.xlsx" download>MS Excel file</a></li>
<li>an <a href="https://raw.githubusercontent.com/dcmi/dctap/main/TAPtemplate.ods" download>OpenOffice file</a></li>
</ul>

</body>
</html>
```

lagbolt commented 1 year ago

But these very posts are markdown, so you can include the "a" element right here, like so:

MS Excel file

And it works! I also tested it in a github repo and it works there as well.

Note that I included only the 'a' element and not the entire html.

kcoyle commented 1 year ago

I forgot to say that I am referring to the links in the Primer, which is on the DC site. The DC site is in HTML, although it is fed with Markdown documents. I don't know what the transformation looks like because it happens somewhere I can't see, so I don't know how "raw" Markdown links get interpreted. I am hoping that we can force all of the files to download, without sending people to GitHub and without having to move everything to the DC site. Downloads work for the Excel and Open Office files with or without the download attribute, but I can't get it to work for the CSV files (at least not in Firefox): the CSV opens in a browser tab instead. That could just be a browser setting.

I would also like to have an "Open in Google Sheets" option, but that would require some JavaScript, and the web manager is reluctant to add anything that needs special maintenance.

Third, I'm working on a mock-up of how we could have download links for each of the tables in the Primer: a table with the Primer values filled in that could be used as a basis for creating a TAP or for experimenting. I'll try to post it tomorrow. That's a bit more work, but not a huge amount. I assume we'd want those files to reside in GitHub rather than managing them on the DC site.

kcoyle commented 1 year ago

I would also like to provide downloadable tables for each of the tables displayed in the Primer. These would have all 12 columns, with the cells from the Primer example filled in. Here's a table for the example covering valueNodeType:

| shapeID | shapeLabel | propertyID | propertyLabel | mandatory | repeatable | valueNodeType | valueDataType | valueConstraint | valueConstraintType | valueShape | note |
|---|---|---|---|---|---|---|---|---|---|---|---|
|  |  | dct:title | Title |  |  | literal | xsd:string |  |  |  |  |
|  |  | dct:creator | Author |  |  | IRI |  |  |  |  |  |
|  |  | dct:date | Publication date |  |  | literal | xsd:date |  |  |  |  |
|  |  | dct:extent | Pages |  |  | literal | xsd:decimal |  |  |  |  |
|  |  | sdo:isbn | ISBN |  |  | literal | xsd:string |  |  |  |  |
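As a rough illustration of the proposed 12-column downloads, the sketch below writes the valueNodeType example as a CSV with Python's standard `csv` module. The function name `example_csv` is hypothetical; cells the Primer example leaves blank (shapeID, mandatory, and so on) are written as empty fields, and producing the XLSX and ODS versions would need a separate spreadsheet library, which is not shown.

```python
import csv
import io

# The 12 DCTAP column headers, in the order used in the Primer table.
COLUMNS = ["shapeID", "shapeLabel", "propertyID", "propertyLabel",
           "mandatory", "repeatable", "valueNodeType", "valueDataType",
           "valueConstraint", "valueConstraintType", "valueShape", "note"]

# Only the cells filled in by the valueNodeType example; the rest stay empty.
ROWS = [
    {"propertyID": "dct:title", "propertyLabel": "Title",
     "valueNodeType": "literal", "valueDataType": "xsd:string"},
    {"propertyID": "dct:creator", "propertyLabel": "Author",
     "valueNodeType": "IRI"},
    {"propertyID": "dct:date", "propertyLabel": "Publication date",
     "valueNodeType": "literal", "valueDataType": "xsd:date"},
    {"propertyID": "dct:extent", "propertyLabel": "Pages",
     "valueNodeType": "literal", "valueDataType": "xsd:decimal"},
    {"propertyID": "sdo:isbn", "propertyLabel": "ISBN",
     "valueNodeType": "literal", "valueDataType": "xsd:string"},
]


def example_csv(rows) -> str:
    """Serialise example rows as a full 12-column CSV (hypothetical helper)."""
    buf = io.StringIO()
    # restval="" writes an empty cell for any column a row does not mention.
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(example_csv(ROWS))
```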

For each table in the Primer there would be three files for download: CSV, XLSX, and ODS. This would make the Primer a source of starter tables. We could also create similar downloadable tables for any of our other examples. To make them easy to download, though, we need an HTML file on the DC web site that points to the files in GitHub. The goal is to make the examples easier to find and easier to use for folks who are not comfortable with GitHub.
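The HTML page of links could be generated rather than written by hand. This is a minimal sketch that assumes a hypothetical `examples/` directory under https://dcmi.github.io/dctap/ holding one file per table and format; it uses labels similar to the template links above and leaves out the download attribute, which browsers only honour for same-origin URLs.

```python
from html import escape

# Hypothetical location for per-table example files; the thread does not
# settle where they will live.
BASE_URL = "https://dcmi.github.io/dctap/examples/"

# File suffix and link label for each of the three proposed formats.
FORMATS = [("csv", "CSV file"),
           ("xlsx", "MS Excel file"),
           ("ods", "OpenOffice file")]


def download_links(table_name: str) -> str:
    """Return an HTML <ul> linking the CSV, XLSX and ODS versions of one table."""
    items = []
    for ext, label in FORMATS:
        href = f"{BASE_URL}{table_name}.{ext}"
        # escape() guards against table names containing quotes or '&'.
        items.append(f'<li><a href="{escape(href)}">{escape(label)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(download_links("valueNodeType"))
```

Because the output is plain HTML, it could be pasted into the Markdown source of the Primer, assuming the DC site passes embedded HTML through unchanged.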

nishad commented 1 year ago

The download attribute on anchor elements works only for same-origin [1] URLs, so this is expected behaviour.

[1] https://developer.mozilla.org/en-US/docs/Web/Security/Same-origin_policy
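To make the same-origin rule concrete, here is a simplified Python check: two URLs share an origin only if scheme, host, and port all match. It uses the test page kcoyle posted at http://kcoyle.net/temp/primertest/ as the linking page; the helper names are hypothetical, and browsers apply further details that this sketch ignores.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Ports browsers assume when a URL does not specify one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> Tuple[str, str, Optional[int]]:
    """Return the (scheme, host, port) triple that makes up a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return scheme, host, port


def same_origin(page_url: str, link_url: str) -> bool:
    """Simplified check: True only if scheme, host and port all match."""
    return origin(page_url) == origin(link_url)


page = "http://kcoyle.net/temp/primertest/"
link = "https://raw.githubusercontent.com/dcmi/dctap/main/TAPtemplate.csv"
# Scheme and host differ, so a browser would ignore `download` on this link.
print(same_origin(page, link))  # False
```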

lagbolt commented 1 year ago

It seems you need to know what happens to HTML embedded in Markdown when DC converts the Markdown to HTML. I showed above that it works fine for GitHub, but that doesn't prove anything about DC.

I'd bet it works just fine, but you seem skeptical. Is there any way you could run an experiment, or just ask?

kcoyle commented 1 year ago

@nishad Thanks. That explains why I got different results in different environments. For the moment I will skip the idea of a true "download", but I am still interested in providing links to TAP files that folks can grab from the Primer, and eventually in setting up a web document that links to other examples. This expands on the ideas that John and I had: rather than using something like Google Sheets, it would connect directly to the DC web site documents, pulling in the raw GitHub files.

kcoyle commented 1 year ago

Here's what I wanted to see:

http://kcoyle.net/temp/primertest/

But I can live without it ;-)

nishad commented 1 year ago

I may not be the most articulate in explaining this, but the example provided illustrates a multitude of issues.

  1. Browsers decide how to handle a URL based on the MIME type^1 in the HTTP response's Content-Type header, not the file extension. If the MIME type is one the browser cannot render, it presents the content as a downloadable file instead of displaying it. If the content type isn't explicitly set, modern browsers try to determine it and then choose either to display or download the content. Browsers vary in their rendering capabilities; for instance, iOS Safari may render Excel files, while Safari on the Mac may force a download instead.

  2. A file will be downloaded if the server sets the Content-Disposition header^2 to attachment, regardless of the MIME type. An anchor element can also force a download with the download attribute, but only if the URL is same-origin^3.

  3. raw.githubusercontent.com does not set accurate, file-specific MIME types, nor does it function like a conventional HTTP server, given its special purpose. Therefore, I always recommend using GitHub Pages URLs (for this repo, https://dcmi.github.io/dctap) to serve unstable documents or content. I haven't had the chance to read through all the discussions here, but I believe you have specific reasons for opting to use the raw.githubusercontent.com domain.

The outcomes (both success and failure) of the tests you mentioned above are influenced by a combination of these factors.
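The three points above can be summarised as a small decision model. This is a hypothetical simplification, not how any particular browser is implemented: the set of displayable types is an assumption, MIME sniffing is not modelled, and the example headers assume raw.githubusercontent.com serves the CSV as text/plain and the spreadsheets as application/octet-stream, which would be consistent with the behaviour reported in this thread.

```python
from typing import Optional

# Hypothetical set of MIME types a browser displays in the tab. Real browsers
# differ, and sniffing of missing or vague types is not modelled here.
DISPLAYABLE = {"text/html", "text/plain", "image/png"}


def _main_value(header: Optional[str]) -> str:
    """Header value without parameters: 'text/plain; charset=utf-8' -> 'text/plain'."""
    return (header or "").split(";")[0].strip().lower()


def browser_action(content_type: Optional[str],
                   content_disposition: Optional[str] = None,
                   download_attr: bool = False,
                   same_origin: bool = False) -> str:
    """Return 'download' or 'display' for a followed link (simplified model)."""
    # Point 2: Content-Disposition: attachment forces a download.
    if _main_value(content_disposition) == "attachment":
        return "download"
    # Point 2: the download attribute only counts for same-origin URLs.
    if download_attr and same_origin:
        return "download"
    # Point 1: otherwise the MIME type, not the file extension, decides.
    return "display" if _main_value(content_type) in DISPLAYABLE else "download"


# Assumed headers: a CSV served as text/plain, a spreadsheet as octet-stream.
print(browser_action("text/plain; charset=utf-8", download_attr=True))  # display
print(browser_action("application/octet-stream"))                        # download
```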

As I detailed in one of the DCTAP calls: for publishing "UNSTABLE" documents or files, use the dcmi.github.io domain. "STABLE" documents and smaller files, on the other hand, we typically host on the DCMI website, provided they are compatible with the existing website infrastructure and workflow.

kcoyle commented 1 year ago

Thanks again, @nishad. Here's some background: right now there are links to our three "template" files in the TAP elements file on the DC site. I wanted to copy those into the Primer file on GitHub, eventually to update the Primer on the DC site. The first thing I discovered is that I can't type "xlsx" correctly, so the link for that file in the DC site document returns a 404. I'll fix that in GitHub. The link for the Open Office version takes you to an unhelpful GitHub page, as will the corrected Excel file link, because I should have used the "raw" URL. (The github.io link also goes to an unhelpful GitHub page.)

Using the "raw" URLs, the Excel and Open Office files trigger a download without overwriting the document in the open tab. That is a helpful result. However, at least in Firefox, the CSV replaces the "calling" document, displays the CSV, and doesn't download anything. I think this might be confusing for some users, in particular those we are aiming for with DCTAP: it requires them to save the contents of the Firefox tab and then return to the original document.

All I know about Content-Disposition is what I have just quickly scanned from the link you provided. ;-) Unless you explain otherwise, I will assume it isn't something we can easily add to the documents on the DC site. This does, however, look like it would implement something like my desired outcome:

```http
Content-Type: text/html; charset=utf-8
Content-Disposition: attachment; filename="cool.html"
Content-Length: 21

<HTML>Save me!</HTML>
```
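For reference, the Content-Length of 21 in that example is simply the byte length of the body. The hypothetical helper below assembles such a response in Python to show where the header goes. On the DC site this would be a matter of server configuration rather than anything in the Markdown documents, so treat it as an illustration only.

```python
def attachment_response(body: bytes, content_type: str, filename: str) -> bytes:
    """Build a raw HTTP/1.1 response asking the browser to save `body`.

    Hypothetical helper; `filename` must be plain ASCII in this sketch.
    """
    header_lines = [
        "HTTP/1.1 200 OK",
        f"Content-Type: {content_type}",
        # 'attachment' means "download", whatever the Content-Type says.
        f'Content-Disposition: attachment; filename="{filename}"',
        # Content-Length is the body size in bytes.
        f"Content-Length: {len(body)}",
    ]
    # Each header line ends in CRLF; an empty line separates headers from body.
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + body


resp = attachment_response(b"<HTML>Save me!</HTML>",
                           "text/html; charset=utf-8", "cool.html")
print(resp.decode("ascii"))
```

For a CSV template, the same call with "text/csv; charset=utf-8" and "TAPtemplate.csv" would describe the download behaviour wanted here, assuming the server hosting the file can be configured to send that header.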

I want to add links throughout the Primer (if no one objects) so that readers can download the individual examples in it. That will mean a lot more of these small files, but we can just make a directory in GitHub for them.

I'd be 90% happy if we could also get the CSV to work as a download, and am open to ideas. I would be 100% happy if we could have an "Open in Google Sheets" link, but I fear that would go beyond what we do on the DC site. I suppose one option for the CSV would be to bundle up all of the CSV template and example files into a ZIP file, but I really would like readers to be able to zero in easily on the specific table they are interested in.
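The ZIP fallback could be produced with Python's standard `zipfile` module. A minimal sketch follows; the file names and abbreviated contents are placeholders, and it assumes the archive is served with a type such as application/zip that browsers download rather than display. The trade-off mentioned above still applies: readers get every table at once instead of just the one they want.

```python
import io
import zipfile
from typing import Dict


def bundle_csvs(files: Dict[str, str]) -> bytes:
    """Pack {filename: CSV text} into an in-memory ZIP archive and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Sorted so the archive lists files in a predictable order.
        for name, text in sorted(files.items()):
            zf.writestr(name, text.encode("utf-8"))
    return buf.getvalue()


# Placeholder file names with abbreviated contents.
archive = bundle_csvs({
    "TAPtemplate.csv": "shapeID,shapeLabel,propertyID\n",
    "valueNodeType.csv": "propertyID,valueNodeType\ndct:title,literal\n",
})
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(zf.namelist())  # ['TAPtemplate.csv', 'valueNodeType.csv']
```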

For now I'm going to go with "raw", which at least gets a good result for 2 out of the 3 file types. All other ideas are very welcome, but we mustn't complicate things uncomfortably. Good enough is good enough.

nishad commented 1 year ago

@kcoyle I apologize for not fully grasping the entire issue, and I recognize that I might not convey it effectively.

To address this, I've taken the following steps:

  1. Assumed that all template files are designated as STABLE.
  2. Merged all STABLE files onto the DCMI website.
  3. Updated the template links on both the DCTAP specification^1 and elements^2 pages.

nishad commented 1 year ago

@kcoyle,

Please refrain from using raw.githubusercontent.com for DCMI public webpages unless there's a compelling reason. Beyond the MIME-type issues, there is a security concern: I'm able to set up cross-origin permissions for dcmi.github.io, but extending these permissions to raw.githubusercontent.com would introduce a security vulnerability. raw.githubusercontent.com can potentially serve files from any repository, including malicious ones, whereas dcmi.github.io strictly serves files from repositories within the DCMI organization, ensuring a higher level of security.

For internal linking and discussions, such as referencing a markdown file, feel free to use raw.githubusercontent.com. Remember, dcmi.github.io/dctap will render markdown files as HTML.
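If existing raw.githubusercontent.com links need converting, a small helper can map them to dcmi.github.io. This sketch assumes that files on the main branch of dcmi/dctap appear at the same path under https://dcmi.github.io/dctap/, which should be checked for each file before relying on it; the function name is hypothetical.

```python
from urllib.parse import urlsplit

# Assumption: main-branch files of dcmi/dctap are published at the same
# relative path under the repository's GitHub Pages site.
RAW_HOST = "raw.githubusercontent.com"
RAW_PREFIX = "/dcmi/dctap/main/"
PAGES_BASE = "https://dcmi.github.io/dctap/"


def to_pages_url(raw_url: str) -> str:
    """Rewrite a dcmi/dctap raw.githubusercontent.com link to its dcmi.github.io URL."""
    parts = urlsplit(raw_url)
    if parts.hostname != RAW_HOST or not parts.path.startswith(RAW_PREFIX):
        # Refuse anything outside the DCMI repo rather than guess.
        raise ValueError(f"not a dcmi/dctap main-branch raw URL: {raw_url}")
    return PAGES_BASE + parts.path[len(RAW_PREFIX):]


print(to_pages_url(
    "https://raw.githubusercontent.com/dcmi/dctap/main/TAPtemplate.csv"))
# https://dcmi.github.io/dctap/TAPtemplate.csv
```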

I've confirmed the proper functioning of the dcmi.github.io/dctap links:

It's worth noting that many browsers handle https://dcmi.github.io links according to their MIME type. As a result, most files that are not text/html will prompt a download.

kcoyle commented 1 year ago

@nishad Thank you for all of your work on this. I will use github.io for the links.

I'm unsure what I can do with the "stable" template files. Can those be used with my proposed HTML link with the download attribute?

Here's my proposal:

A future task:

p.s. The Primer document has other edits that would be included in the web site update

nishad commented 1 year ago

@kcoyle When we refer to files as "Stable," it means they're no longer directly editable. You won't be able to modify them on GitHub as you've been doing. To update these files, you'll need to coordinate with the web manager to replace them.

I believe points 1 and 2 have already been addressed, as I highlighted in https://github.com/dcmi/dctap/issues/96#issuecomment-1676782175. Please review those links; the files are set up for download. If this isn't what you had in mind, then I might have misunderstood the core issue.

Regarding point 3: You can handle this directly on GitHub. Once the document reaches its "Stable" status, we can transfer it to the main website. I'll also relocate the linked files, as demonstrated in https://github.com/dcmi/dctap/issues/96#issuecomment-1676782175.

Please make any necessary edits and provide an updated version if you wish to make changes to already published documents.