evanplaice / node-ftpsync

Intelligent file synchronization over FTP
http://evanplaice.github.io/node-ftpsync
MIT License
74 stars 21 forks

Timestamp comparison logic #21

Closed ismay closed 7 years ago

ismay commented 9 years ago

I saw this in the readme:

> The core functionality to push files to a remote server is now fully implemented. Until the timestamp comparison logic is worked out, updates (ie overwriting files) are determined solely by comparing file sizes

I was wondering whether this feature is still under active development, or whether you've run into any problems with it. I'm asking because the package works very well, except when updating files whose content has changed but whose size has not. That case is very laborious to check for manually.

evanplaice commented 9 years ago

I've thought about this a lot and I don't think timestamp is a consistent and/or reliable enough mechanism to rely on for comparison.

  1. The time on the server may not be accurate
  2. When a new file is uploaded to the server, the timestamp on the local files would need to be updated to reflect that both the local/remote files are in sync. This creates additional and unnecessary work on the client.
  3. In a one-to-many scenario where multiple clients are pushing to a single server, when one client makes an update it will invalidate the timestamps on every other client instance.
  4. The time-sync itself is hacky. Basically, it would involve uploading a temporary file, checking the timestamp on the uploaded file using an MDTM request (which isn't supported on all FTP servers), and calculating the difference after normalizing timezone differences.

Note: The difficulties are inherent to the FTP protocol in general.
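To make the time-sync step in point 4 concrete, here is a minimal sketch of the offset calculation. It assumes the server follows RFC 3659 and answers MDTM with `213 YYYYMMDDHHMMSS` in UTC (some servers report local time instead, which is exactly the normalization problem mentioned above). `parseMdtm` and `estimateClockOffset` are hypothetical helpers, not part of ftpsync.

```javascript
// Sketch only: parse an MDTM reply and estimate the server's clock offset.
// Both helpers are hypothetical and not part of ftpsync.

// RFC 3659 replies look like "213 YYYYMMDDHHMMSS" (optionally ".sss"), in UTC.
function parseMdtm(reply) {
  const m = /^213 (\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:\.(\d+))?$/.exec(reply.trim());
  if (!m) throw new Error('Unexpected MDTM reply: ' + reply);
  const [year, month, day, hour, minute, second] = m.slice(1, 7).map(Number);
  const ms = m[7] ? Math.round(Number('0.' + m[7]) * 1000) : 0;
  return new Date(Date.UTC(year, month - 1, day, hour, minute, second, ms));
}

// Offset in milliseconds: positive when the server clock runs ahead of ours.
// uploadedAt is the local time at which the temporary file was written.
function estimateClockOffset(mdtmReply, uploadedAt) {
  return parseMdtm(mdtmReply).getTime() - uploadedAt.getTime();
}

const uploadedAt = new Date(Date.UTC(2016, 0, 15, 12, 0, 0));
const offsetMs = estimateClockOffset('213 20160115120230', uploadedAt);
console.log(offsetMs); // 150000 (server is 2.5 minutes ahead)
```

The estimate also includes the upload latency, so it is only as accurate as the round trip is short.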

If we were doing the sync over SSH, it would be possible to calculate the MD5 for both the local and remote to compare. Unfortunately, granting full SSH access opens a whole other can of worms in terms of security.

The only reasonable solution I can think of is to generate a digest listing that maps files to their calculated MD5 hash. Such an approach would be non-trivial to implement.
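A rough sketch of what such a digest listing could look like, assuming the manifest is a plain object mapping relative paths to MD5 hex digests. `buildManifest` and `changedPaths` are hypothetical names, and where the remote manifest is stored (for example as a file uploaded alongside the content) is left open here.

```javascript
const crypto = require('crypto');

// Hypothetical helper: `files` maps a relative path to its content
// (string or Buffer) and the result maps each path to an MD5 hex digest.
function buildManifest(files) {
  const manifest = {};
  for (const [path, content] of Object.entries(files)) {
    manifest[path] = crypto.createHash('md5').update(content).digest('hex');
  }
  return manifest;
}

// Paths whose hash is missing from, or different in, the remote manifest.
function changedPaths(localManifest, remoteManifest) {
  return Object.keys(localManifest)
    .filter((path) => localManifest[path] !== remoteManifest[path])
    .sort();
}

// Same sizes on both sides, but a.txt differs in content.
const remoteManifest = buildManifest({ 'a.txt': 'hello', 'b.txt': 'world' });
const localManifest = buildManifest({ 'a.txt': 'hellO', 'b.txt': 'world' });
console.log(changedPaths(localManifest, remoteManifest)); // [ 'a.txt' ]
```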


The gist is, syncing files should be safe 99.99% of the time using size comparison alone. Unless you manage to make changes to a file in a way that doesn't change the size of the file, this shouldn't be an issue.

ismay commented 9 years ago

Ok yeah, that makes sense. The problem I'm having, though, is that for my website I'm appending a hash to asset filenames to invalidate the cache (for example `styles-139shw.css`). This hash doesn't change in length, so any HTML file referencing a changed asset keeps the same file size even though the hash it references is different. That means the change won't be detected by ftpsync.
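The failure mode is easy to reproduce in isolation. In this small illustration the second hash (`84kd0q`) is made up; the point is only that both strings have the same byte length while their content differs.

```javascript
// Two builds of the same page; only the asset hash differs.
const before = '<link rel="stylesheet" href="styles-139shw.css">';
const after = '<link rel="stylesheet" href="styles-84kd0q.css">';

const sameSize = Buffer.byteLength(before) === Buffer.byteLength(after);
const sameContent = before === after;

console.log(sameSize);    // true  -> a size-only comparison sees no change
console.log(sameContent); // false -> but the file needs to be re-uploaded
```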

I'm solving this manually by just removing all .html files from the server before an upload, but it feels like there should be a better, less manual solution. Maybe even something as simple as a syncAll boolean, which would update files even when their size hasn't changed?

evanplaice commented 9 years ago

Oh, damn... That's probably a common use case too.

As a temporary stopgap you could look into using: https://github.com/angular/angular/issues/5123#issuecomment-154596927

grunt-ftp-push always pushes all files. It's not ideal in terms of bandwidth/time but it accomplishes what you described.

I haven't looked at the codebase for this project in a while but it shouldn't be too difficult to add a syncAll feature.

I'd like to try adding a file manifest at some point in the near future. It'll require a pretty major refactor of the code though.

ismay commented 9 years ago

> Oh, damn... That's probably a common use case too.

Yeah, it probably is. I'm not sure what you meant by the stopgap, though. I'm not using angular, so I don't understand how I'd apply that to my use case.

Also, I'm using metalsmith and regular node scripts, so I'd like to stay with that instead of introducing another build tool (with grunt-ftp-push). But there are probably also a lot of regular node libraries that'll do what grunt-ftp-push does. Thanks for the suggestion anyway!

And I was thinking, maybe instead of a syncAll it would be better to have a force option, which accepts an array of globs. That way you can pass it something like: force: ['**.html'], which would then overwrite all html regardless of filesize. That would be a way to fix the bug, without resorting to something drastic like pushing all files all the time.
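A minimal sketch of how such a `force` option might be matched against file paths. The `force` option itself is only the proposal from this comment, and `globToRegExp` and `isForced` are hypothetical helpers written for illustration. The matcher handles only `**`, `*` and `?`; a real implementation would more likely rely on an established glob library, whose handling of `**` may differ.

```javascript
// Convert a simple glob to a RegExp. Supports "**" (any characters, including "/"),
// "**/" (zero or more directories), "*" and "?" (anything except "/").
function globToRegExp(glob) {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*' && glob[i + 1] === '*') {
      i++; // consume the second "*"
      if (glob[i + 1] === '/') {
        source += '(?:.*/)?';
        i++; // consume the "/"
      } else {
        source += '.*';
      }
    } else if (ch === '*') {
      source += '[^/]*';
    } else if (ch === '?') {
      source += '[^/]';
    } else {
      // Escape characters that are special in regular expressions.
      source += ch.replace(/[.+^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp('^' + source + '$');
}

// True when the path matches at least one of the force globs.
function isForced(path, globs) {
  return globs.some((glob) => globToRegExp(glob).test(path));
}

const options = { force: ['**.html'] }; // hypothetical option shape
const files = ['index.html', 'blog/post.html', 'css/styles-139shw.css'];
console.log(files.filter((file) => isForced(file, options.force)));
// [ 'index.html', 'blog/post.html' ]
```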

evanplaice commented 9 years ago

Sorry, pasted the wrong link. Too much multi-tasking. The utility I meant was grunt-ftp-push: https://www.npmjs.com/package/grunt-ftp-push

I understand the desire to get away from using grunt. I'm gradually replacing grunt with npm scripts in my projects. Unfortunately, I'm not aware of a ftp-push module that runs standalone.

That's an interesting idea. I'm not sure how to implement it. Maybe first do the match based on differences, then pass the file listing through a second filter that applies force. The code really could use a refactor to make it act in a more functional manner.
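The two-pass idea could be sketched roughly as follows. This is an illustration under assumptions, not ftpsync's code: `applyForce` is a hypothetical function, `changed` stands for whatever the existing size-based comparison produces, and `isForced` is any predicate that decides which paths are always uploaded.

```javascript
// Hypothetical second pass: start from the files the first pass flagged,
// then add every local file that the force predicate selects.
function applyForce(changed, localFiles, isForced) {
  const selected = new Set(changed);
  for (const file of localFiles) {
    if (isForced(file)) selected.add(file);
  }
  return [...selected].sort();
}

const localFiles = ['about.html', 'css/site.css', 'index.html', 'js/app.js'];
const changed = ['js/app.js']; // pass 1: only this file differs in size
const uploads = applyForce(changed, localFiles, (file) => file.endsWith('.html'));
console.log(uploads); // [ 'about.html', 'index.html', 'js/app.js' ]
```

Keeping the second pass as a pure function over the first pass's output fits the more functional style mentioned above.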

ismay commented 9 years ago

> Sorry, pasted the wrong link. Too much multi-tasking. The utility I meant was grunt-ftp-push: https://www.npmjs.com/package/grunt-ftp-push

:) Yeah, that's what I suspected.

> Maybe first do the match based on differences, then pass the file listing through a second filter that applies force.

I don't know what kind of mechanic would be optimal for ftp, but that definitely sounds sensible.

evanplaice commented 9 years ago

FTP is just the last step in the processing chain. Believe it or not, that's the easy part.

The complicated part is filtering through the local vs remote file listings to generate a complete collection of files that need to be acted upon.
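As a rough illustration of that filtering step (not the project's actual source), the two listings can be reduced to a plan of actions. `planSync` is a hypothetical function, and the listings are assumed to be `Map`s from relative path to size in bytes. Whether remote-only files should really be removed is a policy choice this sketch does not settle.

```javascript
// Hypothetical planner: compare local and remote listings (path -> size)
// and collect the paths to add, update, or remove on the server.
function planSync(local, remote) {
  const plan = { add: [], update: [], remove: [] };
  for (const [path, size] of local) {
    if (!remote.has(path)) plan.add.push(path);
    else if (remote.get(path) !== size) plan.update.push(path);
  }
  for (const path of remote.keys()) {
    if (!local.has(path)) plan.remove.push(path);
  }
  return plan;
}

const local = new Map([['index.html', 120], ['js/app.js', 900], ['new.html', 40]]);
const remote = new Map([['index.html', 120], ['js/app.js', 850], ['old.html', 10]]);
const plan = planSync(local, remote);
console.log(plan);
// { add: [ 'new.html' ], update: [ 'js/app.js' ], remove: [ 'old.html' ] }
```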

If you look at the source, the processing chain is pretty straightforward:

Eschon commented 8 years ago

Are there any updates on this?

I just updated my build process to add a hash to assets and now I'm facing the same problem.

evanplaice commented 7 years ago

PR #39 has landed with the functionality to do timestamp comparisons. Give it a test drive (preferably on non-production code first) and post your results.