Closed. hexylena closed this 8 years ago.
+1
Where does the pulling of the checksum from the end of the URL and validating it happen in the Galaxy tool stack? Or does that need to be added to the Tool Shed installation framework?
The given download_url can have an extension like #md5# or #sha256#. This indicates a checksum which will be checked after download. If the checksum does not match, an exception is thrown.
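To make the scheme concrete, here is a minimal sketch of how a trailing #md5#/#sha256# marker could be split off a download URL and verified against the downloaded bytes. The helper names (`split_checksum`, `verify`) are hypothetical and for illustration only; Galaxy's actual implementation may differ.

```python
import hashlib

def split_checksum(download_url):
    """Split a trailing '#md5#<hex>' or '#sha256#<hex>' marker off a URL.

    Hypothetical helper illustrating the scheme described above.
    Returns (clean_url, algorithm_name, expected_digest) or
    (url, None, None) when no marker is present.
    """
    for algo in ("sha256", "md5"):
        marker = "#%s#" % algo
        if marker in download_url:
            url, checksum = download_url.split(marker, 1)
            return url, algo, checksum.lower()
    return download_url, None, None

def verify(data, algo, expected):
    """Raise ValueError if the downloaded bytes don't match the digest."""
    digest = hashlib.new(algo, data).hexdigest()
    if digest != expected:
        raise ValueError("checksum mismatch: %s != %s" % (digest, expected))

# Example: a URL carrying the sha256 of the (stand-in) payload bytes.
payload = b"payload"
url, algo, expected = split_checksum(
    "https://example.org/pkg-1.0.tar.gz#sha256#"
    + hashlib.sha256(payload).hexdigest())
verify(payload, algo, expected)  # silently passes; wrong bytes would raise
```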
Obviously as we write the standard, we'd make sha256 a hard requirement, and possibly add this to the linting process.
Clearly from a security point of view, sha256 is preferred, but it might be best to allow MD5 for cases where the upstream tool is released with just a list of MD5 checksums?
IMO: If upstream releases with md5, someone should:
It shouldn't just be preferred; it should be a hard requirement. I'll publish an md5 collision generator to the TS someday soon :P
:+1:
Unfortunately, for normal packages I don't think all download tags from the tool_dependency.xml files support checksums.
True, I imagine things like setup_perl_environment's downloads don't; we'll have to add support for that.
This is a very important point and I'm really glad you have brought it forward. I have a small remark about the implementation: adding the checksum to the URL fragment is not in accordance with the HTML specification: http://www.w3.org/TR/WD-html40-970708/htmlweb.html#h-4.1.1
Of course, it won't do harm either, but I think it would make more sense to add the sums to the XML files, like:
<action type="download_by_url" md5sum="hexhexhex">https://u.r.l/clean-get-url-without-a-hashtag</action>
This would theoretically also allow two sums:
<action type="download_by_url" md5sum="hexhexhex" sha256sum="hexhexhex">https://u.r.l/clean-get-url-without-a-hashtag</action>
In this case the md5sum becomes part of the action (downloading and confirming its validity), rather than part of the URL (whose purpose is to point to a location within an HTML document).
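A consumer of the attribute form proposed above could be sketched as follows. The attribute names (md5sum, sha256sum) follow the example XML but are not part of any published schema; the payload bytes stand in for a real download.

```python
import hashlib
import xml.etree.ElementTree as ET

# A well-formed <action> element using the proposed checksum attributes.
# (The attribute names mirror the proposal above; this is not an
# established Galaxy schema.)
payload = b"payload"
action_xml = (
    '<action type="download_by_url" sha256sum="%s">'
    'https://u.r.l/pkg.tar.gz</action>'
    % hashlib.sha256(payload).hexdigest()
)

action = ET.fromstring(action_xml)
url = action.text  # clean URL, no '#...' fragment needed
# Collect whichever checksum attributes are present.
checks = {algo: action.get(algo + "sum")
          for algo in ("md5", "sha256") if action.get(algo + "sum")}

# Verify each declared checksum against the downloaded bytes.
for algo, expected in checks.items():
    assert hashlib.new(algo, payload).hexdigest() == expected
```

One nicety of this form is that declaring both md5sum and sha256sum is just two attributes on the same element, with no URL-grammar ambiguity.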
In addition to this problem I think it would be great to also check all urls available in the toolshed and find those that do not use https, check whether they are available via https, and replace them if possible.
This was added to the best practices as #17 - see http://galaxy-iuc-standards.readthedocs.org/en/latest/best_practices/package_xml.html
However, the wiki https://wiki.galaxyproject.org/ToolDependenciesTagSets does not describe how to specify an expected checksum, and should probably link to the best practices document at http://galaxy-iuc-standards.readthedocs.org/en/latest/best_practices/package_xml.html (I tried editing the wiki but the table layout was a bit too tricky).
See also verifying these as part of planemo lint commands, e.g. PR https://github.com/galaxyproject/planemo/pull/266
Also this needs to be expanded to list which other download actions support this:

_The checksums take the form of sha256sums attached as attributes to <action type="download_by_url"> and other elements, ..._
Also the best practices should spell out the expectation that the Galaxy packager checks any author-provided MD5 and then computes and uses a SHA256 for the package XML (which is how I understand the discussion above).
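That packager workflow can be sketched in a few lines: confirm the author-provided MD5 first, and only then compute the SHA256 that gets recorded in the package XML. The helper name is hypothetical; the tarball bytes stand in for a real downloaded release.

```python
import hashlib

def sha256_for_package_xml(tarball_bytes, upstream_md5):
    """Sketch of the packager workflow described above (hypothetical
    helper name): verify the author-provided MD5 before trusting the
    download, then return the SHA256 to record in the package XML.
    """
    if hashlib.md5(tarball_bytes).hexdigest() != upstream_md5:
        raise ValueError("upstream MD5 mismatch; do not publish")
    return hashlib.sha256(tarball_bytes).hexdigest()

# Example with stand-in release bytes and a matching upstream MD5.
release = b"release contents"
sha256_value = sha256_for_package_xml(
    release, hashlib.md5(release).hexdigest())
```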
Hi @peterjc, sorry I've been delinquent in finishing galaxyproject/planemo#266.
Also this needs to be expanded to list which other download actions support this
right now it's essentially just download_by_url, I believe (download_file?).
I've updated the wiki to reflect this.
Thanks. I suppose automatically populating the SHA256 for existing packages managed by the devteam and IUC wouldn't add much security, so we do need to manually check things like the upstream-provided MD5.
Hopefully https://github.com/peterjc/galaxy_blast/commit/175df67aaa3987b8faafd3b76a49dbaa070ac93a is more or less what you have in mind?
@peterjc sure, but...
That said, I'm on board with conda, which means sha256sums are no longer as interesting to me. That really looks like the future, happily negating much of this work.
Some more history @peterjc: peterjc/galaxy_blast@175df67 was essentially what I had in mind until... a download_by_proxy action which would let Galaxy handle the cargo port interactions + sha256sum verification..., and I stopped pushing for cargo port URLs on PRs to tools-iuc and associated hashes.

OK, so I won't rush to update my existing packages, but may take the time to include the sha256sum on any new or updated tool_dependencies.xml files I work on.
Assuming for now that just the download_by_url action supports checksums, is there anything else we need to do for this issue now that you've updated the wiki?
I'd say no, not until closer to GCC when we have more conda stuff done, and we can completely rewrite docs to reflect the new work.
Feel free to re-open if you have more stuff.
Was doing some late night reading and stumbled across this rant, because I enjoy reading articles by people with wildly different opinions from me so I'm not living in the "Docker docker docker" echo-chamber.
I agree with the author's core point: we should be authenticating (to some extent) the binaries we download from the internet. Reading the code, we can add #sha256#hash... to all of our download URLs, and that will require checksums on them.

This is increasingly an important issue given certain government activities in the US and abroad.