LibraryOfCongress / bagger-js

Upload BagIt-format deliveries to S3 entirely in the browser
https://libraryofcongress.github.io/bagger-js/
Other
32 stars 7 forks source link

Refactor worker usage to use a pool #4

Closed acdha closed 9 years ago

acdha commented 9 years ago

Previously we had one worker. In @eikeon's React branch, we have one per file. We should refactor this to have a limit so we'll spawn up to a certain number but wait for a worker to finish before starting another one.

It would probably be good to measure startup performance to see whether we should attempt to reuse workers across files. For small files the JIT startup performance might be enough to make that worth the modest additional effort.

eikeon commented 9 years ago

It's actually still just creating one per bag ( https://github.com/loc-rdc/bagger-js/blob/introduce-react/assets/jsx/contents.jsx#L45 ). It's only hashing the first file at the moment since it's only being send one message.

acdha commented 9 years ago

Ah, yes, when I was looking at the diff I thought that was in FileRow. It's good that it's already at the BagContents level since that's probably where it'll need to go to manage the pool sanely.

johnscancella commented 9 years ago

Would this also improve the UI performance? Right now I am dropping in 39 mb and 2169 files/folders and the UI becomes unresponsive (can't scroll, can't click any of the buttons, etc). After a while (I guess when the workers are done) then everything works again as expected.

acdha commented 9 years ago

@johnscancella Possibly – that's definitely why all of the real work needs to be done in Web Workers. Simply updating the DOM isn't a problem until you get hundreds of thousands of elements but we'll need to be really careful about when are read into memory. If the refactoring which @eikeon did to pass in the File directly avoids that, it might be a major win

acdha commented 9 years ago

Interesting, with ~22K files there's one unexpected hotspot: getting the file sizes!

screenshot 2015-03-30 09 39 28

We should keep an eye on that and if necessary refactor that work into the worker since that's the only spot where we must open the file handle.

There's also one regression recently – we need to set table-layout: fixed on the main table to avoid recalculating column widths each time we update the rows – but I was waiting on that until after making the hash display truncate nicely with a tool-tip so you can still see the full value for those long SHA-256 hashes.

eikeon commented 9 years ago

@johnscancella and @acdha I'm assuming these tests are agains the main branch. It'd be nice to compare them to the react branch. Getting the file size has already been refactored into the worker in the react branch.