browserify / browserify

browser-side require() the node.js way
http://browserify.org/
MIT License
14.61k stars 1.19k forks source link

Bundle hash changes depending on directory with --require #517

Open bnkr opened 11 years ago

bnkr commented 11 years ago

I am building a browserify bundle which exports some modules to be used from code in the browser. The bundle has a version-specific name derived from the hash of the bundle.

This doesn't work when using -r because the absolute name of the input file is hashed to make an export. Since the absolute name will include each developer's username in the path, it means that the build has to be done exactly the same by every user on every machine in order to produce a file which hashes the same.

If I apply replace(process.cwd(), '') to the parameter of hash on the following lines then the problem is worked around:

This is a bit hacky so I didn't make a pull request. I think that making the hashed paths relative to the basedir would be the best way. I'm not sure I understand enough how to do this but if you could give me some pointers or maybe test cases I could have a go at it.

This issue means I can't use already existing grunt modules (example) to build the bundles, however writing it in rake would make the whole thing much easier anyway.

Additionally #355 might make what I want to do impossible (although it appears to works fine for me).

The workaround I will use is to generate the version-specific name from a hash of the input files instead (from --deps) which is a bit less neat but would achieve more or less the same thing.

Therefore I would say this isn't particularly critical to solve, but if you did decide to make repeatable builds a feature then I think solving this would be necessary in addition to #355.

ghost commented 10 years ago

We could hash the source content instead of hashing the file names.

bnkr commented 10 years ago

That would fix my problem, though hypothetically you could end up with modules with the same hash.

I had a go at implementing this but there are several places where the filename is hashed and the source is often not available at that point. I tended to break the circular.js unit test one way or another. I had a really frustrating time understanding how the many mappings of filename and id relate to each other -- in particular that row.id is sometimes a filename and sometimes as hash -- so I think my javascript skills are probably not up to solving this problem.

connrs commented 10 years ago

I have experienced this a number of times when switching between developing on my laptop (Ubuntu,) my desktop (Mac) and when working with a colleague on a project.

@bnkr, your solution is quite effective. In applying your change, it doesn't cause any tests to fail. I think that I would definitely benefit from something like this making its way in to browserify

thlorenz commented 10 years ago

@substack I'd be a bit careful with that since at this point some modules may depend on the full-path being used as an id. For lack of a better example here is a test util that would break if the above PR would be merged.

connrs commented 10 years ago

Hi @thlorenz

This is why I only made a change to the hash function itself and not to the actual path at a higher level. I'm not sure that this change could have an impact on the fullPath variable mentioned in that test as it's the MD5+Base64 hash that is affected here. Unless that test is doing something magical with the hash that I haven't seen. Happy to be corrected.

Thanks, Paul.

thlorenz commented 10 years ago

@connrs I'm not entirely sure myself about what this PR would affect. I was under the impression that it changes the ids under which each module gets hashed in the bundle. If that is not the case please ignore.

bnkr commented 10 years ago

When I was messing with this, it seems that the hash function is used for several things so it wouldn't be correct to do the replacement there. For example in the function "hasher" where it is used to hash row.source which is the contents of the source file itself. Of course the easy answer is to use two different hash functions (hashForModuleId vs hashForContent or something) but I wonder if the whole thing wouldn't be a lot more clear with some more abstraction, e.g. an object for Module so that there's only one place to store and compute this data and Bundle so there's only one place to query and alter the list of Modules. (Though I hesitate to start prescribing Big Refactorings when I'm not all that familiar with node style JS...)

connrs commented 10 years ago

I think I agree with you @bnkr, it is a little odd to have the hash function perform a replace on CWD when the content may not contain it. I think I'll start again with a little test and a separate function to describe the process a little better.

connrs commented 10 years ago

I've rewritten the change (this time starting with some simple tests to make sure that it's all above board) with a new function (idHash) to describe it a little better.

In addition @thlorenz, I ran exposify's tests using my feature branch and tests passed. I think I'm feeling comfortable enough now that I'll progress this to a real pull request.