galaxy-iuc / standards

Documentation for standards and best practices from the Galaxy IUC
http://galaxy-iuc-standards.readthedocs.io/en/latest/
6 stars 16 forks source link

Require checksums on, ALL downloads #14

Closed hexylena closed 8 years ago

hexylena commented 9 years ago

Was doing some late night reading and stumbled across this rant, because I enjoy reading articles by people with wildly different opinions from me so I'm not living in the "Docker docker docker" echo-chamber.

I agree with the core point of the author, that we should be authenticating (to some extent) the binaries we download from the internet. Reading the code, we can add #sha256#hash.... to all of our download urls, and that'll require checksums on them.

This is increasingly an important issue given certain government activities in the US, and abroad.

natefoo commented 9 years ago

+1

peterjc commented 9 years ago

Where does the puling the checksum from the end of the URL and validating it happen in the Galaxy tool stack? Or does that need to be added to the Tool Shed installation framework?

hexylena commented 9 years ago

@peterjc https://github.com/galaxyproject/galaxy/blob/01f2ac19cafc3b41a718e6510a42b8158cb59d98/lib/tool_shed/galaxy_install/tool_dependencies/recipe/step_handler.py#L153

hexylena commented 9 years ago

The given download_url can have an extension like #md5# or #sha256#. This indicates a checksum which will be chekced after download. If the checksum does not match an exception is thrown.

hexylena commented 9 years ago

Obviously as we write the standard, we'd make a hard requirement on sha256. Possibly would add to the linting process.

peterjc commented 9 years ago

Clearly from a security point of view, sha256 is preferred, but it might be best to allow MD5 for cases where the upstream tool is released with just a list of MD5 checksums?

hexylena commented 9 years ago

IMO: If upstream releases with md5, someone should:

hexylena commented 9 years ago

It shouldn't just be preferred, it should be a hard requirement. I'll publish a md5 collision generator to the TS someday soon :P

bgruening commented 9 years ago

:+1:
Unfortunately, for normal packages I don't think all download tags from the tool_dependency.xml files support checksums.

hexylena commented 9 years ago

True, I imagine things like setup_perl_environment's downloads don't, we'll have to add support for that.

yhoogstrate commented 9 years ago

This is a very important point and I'm really glad you have brought it forward. I have a small remark about the implementation. Adding it to the URL is not in concordance with the HTML specification: http://www.w3.org/TR/WD-html40-970708/htmlweb.html#h-4.1.1

Of course, it won't do harm either, but I think it will make more sense to add the sums to the XML files, like:

<action type="download_by_url" md5sum="hexhexhex">https://u.r.l/clean-get-url-without-a-hashtag<action>

This would theoretically also allow two sums:

<action type="download_by_url" md5sum="hexhexhex" sha256sum="hexhexhex">https://u.r.l/clean-get-url-without-a-hashtag<action>

In this case the md5sum becomes part of the action (downloading and confirming its validity), rather than part of the url (with the purpose to point to a location within a HTML document).

In addition to this problem I think it would be great to also check all urls available in the toolshed and find those that do not use https, check whether they are available via https, and replace them if possible.

peterjc commented 8 years ago

This was added to the best practises as #17 - see http://galaxy-iuc-standards.readthedocs.org/en/latest/best_practices/package_xml.html

However, the wiki https://wiki.galaxyproject.org/ToolDependenciesTagSets does not describe how to specify an expected checksum, and should probably link to https://wiki.galaxyproject.org/ToolDependenciesTagSets (I tried editing the wiki but the table layout was a bit too tricky).

See also verifying these as part of planemo lint commands, e.g. PR https://github.com/galaxyproject/planemo/pull/266

peterjc commented 8 years ago

Also this needs to be expanded to list which other download actions support this:

_The checksums take the form of sha256sums attached as attributes to <action type="download_by_url"> and other elements, ..._

Also the best practices should spell out the expectation that the Galaxy packager check any dependency author provided MD5 and then compute and use a SHA256 for the package XML (which is how I understand the discussion above).

hexylena commented 8 years ago

Hi @peterjc, sorry I've been delinquent in finishing galaxyproject/planemo#266.

Also this needs to be expanded to list which other download actions support this

right now it's essentially just download_by_url I believe, (download_file?).

I've updated the wiki to reflect this.

peterjc commented 8 years ago

Thanks. I suppose automating populating the SHA256 for existing packages managed by the devteam and IUC wouldn't add much security, so we do need to manually check things like the upstream provided MD5.

Hopefully https://github.com/peterjc/galaxy_blast/commit/175df67aaa3987b8faafd3b76a49dbaa070ac93a is more or less what you have in mind?

hexylena commented 8 years ago

@peterjc sure, but...

That said, I'm on board with conda, which means sha256sums are no longer as interesting to me. That really looks like the future, happily negating much of this work.

hexylena commented 8 years ago

Some more history @peterjc peterjc/galaxy_blast@175df67 was essentially what I had in mind until...

peterjc commented 8 years ago

OK, so I won't rush to update my existing packages but may take the time to include the sha256sum on any new or updated tool_dependencies.xml files I work on.

Assuming for now that just the download_by_url action supports checksums, is there anything else we need to do for this issue now that you've updated the wiki?

hexylena commented 8 years ago

I'd say no, not until closer to GCC when we have more conda stuff done, and we can completely rewrite docs to reflect the new work.

hexylena commented 8 years ago

Feel free to re-open if you have more stuff.