Closed asfimport closed 12 years ago
Chris M. Hostetter (@hossman) (migrated from JIRA)
#2
i realize that the maintainers of maven repos say "all files are immutable" and that this potential risk of malicious or accidental file changes exists for all maven users – but that's the choise of all maven users to accept that as a way of life. I'm raising this issue only to point out a discrepancy between the "confidence" we use to be able to give people who download src releases, vs what we have currently with ivy.
Chris M. Hostetter (@hossman) (migrated from JIRA)
Here's a strawman suggestion for adding a SHA1 check to our custom LicenseCheckTask, it was inspired by some of the code in ant's <checksum/> task, but it's not as configurable (only supports one SHA1 and only the simplest file format)
I put this code in LicenseCheckTask because:
If people want to pursue this, We can always refactor – either to generalize the name/description of of "LicenseCheckTask" or to refactor this out into it's own custom task (and do an extra jar crawl).
(NOTE: current patch doesn't actually include the checksums themselves, so as is this will fail the build. If we trust what "ant resolve" is pulling down right now, the files would be trivial for anyone to generate, but i would suggest being a little more diligent and and generating the SHA1s against what was committed in svn up until last week – except obviously where we've actually changed which jar/version we use because of the ivy work)
Comments?
Robert Muir (@rmuir) (migrated from JIRA)
+1, we really need to do this. it can also detect ivy cache corrumption.
its really unrelated to actually how we get the jars, thats an impl detail. we should be checking that its the jar we tested against, or fail hard.
Robert Muir (@rmuir) (migrated from JIRA)
If people want to pursue this, We can always refactor – either to generalize the name/description of of "LicenseCheckTask" or to refactor this out into it's own custom task (and do an extra jar crawl).
I don't think we should do an extra crawl, we can just name it DependencyCheckTask. Dependencies need to have licensing information and sha checksums, and 'ant validate' fails if they don't.
Dawid Weiss (@dweiss) (migrated from JIRA)
reader = new BufferedReader(new FileReader(f));
Isn't this locale-sensitive? I think it should be explicit UTF-8 (or US-ASCII for that matter).
+ String hexStr = Integer.toHexString(CHECKSUM_BYTE_MASK & digest[i]);
+ if (hexStr.length() < 2) {
+ checksum.append("0");
+ }
+ checksum.append(hexStr);
Isn't any of these simpler?
checksum.append(String.format(Locale.ENGLISH, "%02x", CHECKSUM_BYTE_MASK & digest[i]));
or
char [] HEX = "0123456789abcdef".toCharArray();
int v = digest[i];
checksum.append(HEX[(v >>> 4) & 0x0F]).append(HEX[v & 0x0F]);
Dawid Weiss (@dweiss) (migrated from JIRA)
Btw. you can also avoid a recrawl by passing a refid of the same fileset to two tasks rather than constructing a new one in each. I don't mind renaming the class either.
Steven Rowe (@sarowe) (migrated from JIRA)
it can also detect ivy cache corrumption.
I have encountered cache corrumption: AFAICT a truncated jar file (tika-parsers-1.0.jar); killing the cache fixed the problem. We could add a clean-ivy-cache
target calling the target Ivy provides.
Chris M. Hostetter (@hossman) (migrated from JIRA)
Isn't any of these simpler?
FWIW that code was basically verbatim from ant's Checksum.java, but yeah: your way is better
Isn't this locale-sensitive? I think it should be explicit UTF-8
Also verbatim, but it actually caught my eye and I thought it was intentional to deal with line ending differences in diff OSes, but your comment makes me realize thta makes no sense – and BufferedReader chomps all possible endings equally regardless of locale/encoding.
we can just name it DependencyCheckTask.
I think in the long run we should do that, and rename the taskdef, and rename the macrodef, ... but for 3.6 it might be better to keep the changes to a minimum and just add the logic to the existing class and add the SHA1 files but not rename anything.
...updated patch with those changes ... going to start compiling the list of all SHA1s files on trunk
Chris M. Hostetter (@hossman) (migrated from JIRA)
bah ... previous patches were broken in that they wouldn't actaully fail the build when checksum failures were detected.
Robert Muir (@rmuir) (migrated from JIRA)
Haven't tested but looks good: I also think its a good idea to keep the old name for now, we can rename later.
unrelated but couldnt help but notice, this file has no license header. This is because the current rat tasks ignore tools/....grrrrr :)
Chris M. Hostetter (@hossman) (migrated from JIRA)
previous patch with the addition of SHA1 files for all jars currently used on trunk.
SHA files were generated using...
ant clean resolve
find -name \*.jar | xargs -L 1 sha1sum | awk '{ system("echo " $1 ">> " $2 ".SHA1") }'
find -name \*.SHA1 | xargs svn add
all tests pass these jars, but one thing i notice is that the classpath logic in most places seems to be based on exclusion instead of inclusion (ie: 'excludes=".txt,.template"' instead of 'includes="*.jar"') so with these SHA1 files added, we now get a metric shit ton of "ZipException: error in opening zip file" warnints logged by junit because it's trying to parse these files as jars
should we fix all these classpaths, or rename the files "*.SHA1.txt" ?
(note: i haven't even looked at the packaging yet to verify if the SHA1 files are being included in the src builds and excluded from the bin builds)
Robert Muir (@rmuir) (migrated from JIRA)
well ideally i think it would just be .sha1 (lowercase), this matches the extension used for signing our release artifacts...
The reason for exclusion versus inclusion is in the case the jars don't actually exist yet, its important!
but in lucene/common-build.xml i think we should just set up a single libexcludes ant property or something:
**/*.txt,**/*.template,**/*.sha1
and just use that for these filesets
fileset dir="foo" excludes="${libexcludes}"
or whatever. and yeah libexcludes is a crappy name, its just the idea.
Robert Muir (@rmuir) (migrated from JIRA)
and yeah, we need to test and likely modify the binary dist patterns as you hinted at.
Chris M. Hostetter (@hossman) (migrated from JIRA)
revised patch, switches the file names to sha1, and adds a "common.classpath.excludes" property thta i started using everywhere i could find that made sense.
I've verified that this keeps those sha1 files out of the solr war, and so far it looks good for all the tests ... would be helpful if someone with a faster computer could sanity check that (Note: test won't fail if the sha1s are in the classpath – you'll just get a ZipException in the console that you have to grep for)
Chris M. Hostetter (@hossman) (migrated from JIRA)
1) fixes a few places ha1 files were still getting used in solr contrib test classpaths 2) fixes the binary releases to exlcude the sha1 files 3) verified that the src releases should work fine (they do an svn export so once the files are commited we should be good) 4) package-local-src-tgz currently includes the sha1 files (but it also currently includes the jars if you don't do clean-jar first ... not really something i'm going to worry about here)
I think we're good to go?
Robert Muir (@rmuir) (migrated from JIRA)
+1
I opened #5025 and we can later look at an other similar minor improvements related to that on other issues (like compile-tools, called before validate, likely tries to use the ant.jar)...
some of that we might just have to fix in a later release, at least it wont silently work wrong (worst case you get a zip error).
Chris M. Hostetter (@hossman) (migrated from JIRA)
Committed revision 1309503. - trunk
rmuir said on irc that he'd work on backporting to 3x for me (going to grab some lunch soon and then get on a plane)
Conversation with rmuir last night got me thinking about the fact that one thing we lose by using ivy is confidence that every user of a release is compiling against (and likely using at run time) the same dependencies as every other user.
Up to 3.5, users of src and binary releases could be confident that the jars included in the release were the same jars the lucene devs vetted and tested against when voting on the release candidate, but with ivy there is now the possibility that after the source release is published, the owner of a domain where these dependencies are hosted might change the jars in some way w/o anyone knowing. Likewise: we as developers could commit an ivy.xml file pointing to a specific URL which we then use for and test for months, and just prior to a release, the contents of the remote URL could change such that a JAR included in the binary artifacts might not match the ones we've vetted and tested leading up to that RC.
So i propose that we include checksum files in svn and in our source releases that can be used by users to verify that the jars they get from ivy match the jars we tested against.
Migrated from LUCENE-3945 by Chris M. Hostetter (@hossman), resolved Apr 04 2012 Attachments: LUCENE-3945_trunk_jar_sha1.patch (versions: 3), LUCENE-3945.patch (versions: 3)