Meteor-Community-Packages / Meteor-CollectionFS

Reactive file manager for Meteor
MIT License

[1.2.0] Update the cfs-worker #214

Open raix opened 10 years ago

raix commented 10 years ago

The file worker makes sure that we are not writing every uploaded file to every store at once.

Pull structure: https://github.com/CollectionFS/Meteor-cfs-gridfs/issues/4#issuecomment-37139912
WIP: https://github.com/CollectionFS/DEPRECATING-cfs-worker/blob/pull-arc/fileWorker.js

Tasks:

aldeed commented 10 years ago

Since we want to support both worker and no-worker setups, maybe it's better to handle the logic directly in the temp store (instead of listening to the upload event)?

We can have logic in the temp store for when the upload is done:

If worker pkg added, call FS.FileWorker.addFile(fileObj).

Else call fileObj.createReadStream().pipe(fileObj.createWriteStream(storeName)) for each defined store.
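
A minimal sketch of that branch, assuming a hypothetical `onUploadComplete` hook and `storeNames` argument (neither is a confirmed CollectionFS API):

```js
// Hedged sketch only: `onUploadComplete` and `storeNames` are illustrative
// names, not confirmed CollectionFS APIs.
FS.TempStore.onUploadComplete = function (fileObj, storeNames) {
  if (FS.FileWorker) {
    // Worker package is present: hand the file off for queued processing.
    FS.FileWorker.addFile(fileObj);
  } else {
    // No worker: stream the temp file directly into each defined store.
    storeNames.forEach(function (storeName) {
      fileObj.createReadStream().pipe(fileObj.createWriteStream(storeName));
    });
  }
};
```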

raix commented 10 years ago

I was thinking of having the collectionFS package include a small code stub to hook the worker up to the tempStore events - we could have it route directly to the SA if no worker package is found? I'll push some code in a moment - I hope it makes more sense to have the tempStore simply emit events, not caring about who is listening.
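
A sketch of that emit-only approach; the 'stored' event name is an assumption, and `EventEmitter` here is the global from raix:eventemitter (referenced later in this thread):

```js
// Hedged sketch: TempStore just announces completion and doesn't care
// who listens. The 'stored' event name is an assumption.
FS.TempStore = new EventEmitter();

// ...inside the upload handling, once the last chunk is written:
FS.TempStore.emit('stored', fileObj);

// cfs:collection (or cfs:worker, when installed) subscribes and decides
// whether to queue the file or pipe it straight to the storage adapters.
FS.TempStore.on('stored', function (fileObj) {
  /* route to FS.FileWorker if present, else pipe to each SA */
});
```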

nkraich commented 10 years ago

Apologies if I don't quite understand the bigger picture here; I'll do my best!

From the above, it looks like it might now be possible to disable workers (which seem to be the cause of an error for me on meteor.com deployments). I tried deleting the cfs-worker package from my app, but I got a bunch of "error building package" messages from the various cfs packages when I started Meteor.

I tried removing cfs-worker from the collectionFS package.js, and Meteor did start after removing it that way, although files would not upload. It doesn't give any errors, but it creates a file that links to /null.

Is it currently possible to disable the cfs-worker package? Sorry if I jumped the gun or misunderstood anything; just thought I'd give it a try.

For reference, my issue was https://github.com/CollectionFS/Meteor-CollectionFS/issues/209, the temp store "losing the file".

aldeed commented 10 years ago

No, you currently need the worker, but we plan to make it optional. @raix is rewriting the worker and related code within the next week, I think, and after that you should be able to remove that package.

nkraich commented 10 years ago

OK, thank you!

raix commented 10 years ago

I think your problem is related to missing rights on the filesystem (the OS temp folder), perhaps?

The concept of file workers is:

  1. Throttling CPU/IO
  2. Scalability - you could have some transform streams that are heavy, e.g. data analysis or something - so we want to be able to handle uploaded files even if we are running multiple Meteor instances

@aldeed I'm on this - but I don't think it'll make it into 1.0.0. Maybe we could:

  1. have cfs work without file workers by default
  2. have a minor release that allows TempStore to use any SA - e.g. GridFS, allowing scaling and systems where the OS temp folder is not accessible (see the sketch below)
  3. have a minor release that completes the fileWorker for scalability - I'd like to have some tests and benchmarks on this

It's not API-changing stuff - it just changes the way cfs works internally
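
For item 2, a hedged sketch of what that configuration might look like; `FS.TempStore.Storage` is an assumed extension point, while `FS.Store.GridFS` is the adapter from cfs:gridfs:

```js
// Hedged sketch of plan item 2: back TempStore with a storage adapter
// instead of the OS temp folder, so chunks live in Mongo and are visible
// to every instance. `FS.TempStore.Storage` is an assumed hook.
FS.TempStore.Storage = new FS.Store.GridFS('_tempstore');
```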

aldeed commented 10 years ago

Sounds good. If we at least remove the worker dependency for 1.0.0, that's fine with me. I might even be able to do that this weekend if you don't get to it.

nkraich commented 10 years ago

This would be a big help for me if you go in this direction. My assumption is that this would allow me to use CollectionFS on meteor.com again, which is a lot more important to me than scalability (right now, anyway).

Regardless, thank you very much for the explanation.

rhyslbw commented 9 years ago

I've made a start on the worker improvements. I may have missed some design considerations, but here is my proposal

Overview: A new non-standard package, cfs:job-manager, creates a JobCollection, starts the job server, and loops over each collection in FS._collections to register a listener on the relevant events, such as tempStoreTransferComplete (renamed from uploadComplete). The listener functions loop over the file collection's stores, creating a new Job for each with a type such as saveCopy. cfs:worker creates observers on the JobCollection and triggers worker queues to pull jobs from it. In cfs:collection we would now check for the presence of FS.JobManager instead of FS.Worker to determine whether a worker will intermediate the tasks.
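
For concreteness, a hedged sketch of the manager side using the vsivsi:job-collection API; the 'cfsJobs' name, the TempStore event wiring, and the job payload shape are all assumptions:

```js
// Hedged sketch of cfs:job-manager (server side). The 'cfsJobs' name,
// the TempStore event wiring, and the job payload shape are assumptions.
FS.JobManager = {
  jobCollection: new JobCollection('cfsJobs')
};
FS.JobManager.jobCollection.startJobServer();

// Register a listener per FS.Collection and enqueue one saveCopy job
// per configured store when a temp-store transfer completes.
// (A real implementation would check that fileObj belongs to fsCollection.)
_.each(FS._collections, function (fsCollection) {
  FS.TempStore.on('tempStoreTransferComplete', function (fileObj) {
    _.each(fsCollection.options.stores, function (store) {
      new Job(FS.JobManager.jobCollection, 'saveCopy', {
        fileId: fileObj._id,
        collectionName: fsCollection.name,
        storeName: store.name
      }).save();
    });
  });
});
```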

@raix @aldeed Questions:

  1. Are you still intending to allow the store operations to be performed without the worker package? (directly in the collection)
  2. Do we need to maintain the existing worker functionality of directly observing the collections to trigger actions?
  3. Is "The transformWrite/transformRead should run async" still relevant in the proposed context? Right now, in my fork, the worker pulls a job from the JobCollection and then calls saveCopy.

Issues:

  1. Multi-instance support for queueing data-removal jobs from stores: cfs:worker currently observes removed messages on the cursor to run FS.TempStore.removeFile(fsFile), which would run on every instance of the application. The only way I can think of to manage this is to put a check in front of the new Job call to see if another instance has already created the job (see the sketch after this list).
  2. I'm assuming fsCollection will be extended to fire an allStoresComplete event, since it's similar to uploadComplete (renamed to tempStoreTransferComplete). This event triggers a removeTempFile job. Need to work out the best way to track this.
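
A hedged sketch of the guard described in issue 1; the job document fields follow vsivsi:job-collection's schema, but the helper name and the query are assumptions:

```js
// Hedged sketch of the duplicate-job guard from issue 1. A JobCollection
// is a Mongo collection, so its job documents can be queried directly.
function enqueueRemoveTempFile(fileObj) {
  var pending = FS.JobManager.jobCollection.findOne({
    type: 'removeTempFile',
    'data.fileId': fileObj._id,
    status: { $in: ['waiting', 'ready', 'running'] }
  });
  if (!pending) {
    new Job(FS.JobManager.jobCollection, 'removeTempFile', {
      fileId: fileObj._id
    }).save();
  }
}
```

Note that this check-then-insert is not atomic across instances, which is presumably why raix suggests below relying on job reservation instead.
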
raix commented 9 years ago

@rhyslbw deep questions - I have to give them some thought before answering fully.

Maybe have collections try to register themselves in the manager? The manager would be wired up in the main cfs package via imply, preventing direct inter-package dependencies.

Short:

  1. We had an idea about a super-slim package combination - but the worker enables scalability and should also improve single-instance setups (those would probably be rare)

Ideally we should be able to have a pure app/DDP server and a worker per SA/copy in order to scale - e.g. if one SA wants to make a copy of a file that requires CPU-heavy operations, we should be able to isolate that worker as a separate setup/cluster/instance.

(maybe we should talk with @glasser / MDG about how this would fit in the control file - it would be cool if it allowed us to define multiple independent server parts (workers) that could be scaled separately, maybe automatically by Galaxy, without us having to think too much about it - e.g. restpoints etc. depending on load)

  2. Not sure it's needed - since the queue is pulling tasks from the db?
  3. We are using streams, so that's as async as can be; not sure if it changes anything - the SAs are added to an FS.Collection in order, so the user would expect the first SA to be completed ASAP.

The issue you mention isn't an issue: when instances find a task, they try to reserve it - if the reservation succeeds, they work on it. We'd rather have instances pulling from the queue than relying on the observer.
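
That pull-and-reserve model is what vsivsi:job-collection's processJobs provides; a hedged sketch (job type and payload follow the manager sketch above, option values are illustrative):

```js
// Hedged sketch of the pull model: processJobs atomically reserves each
// job, so multiple app instances can poll the same queue without
// duplicating work.
FS.JobManager.jobCollection.processJobs('saveCopy', {
  concurrency: 1,     // throttle CPU/IO on this instance
  pollInterval: 5000  // pull from the queue rather than observe it
}, function (job, callback) {
  var data = job.data; // { fileId, collectionName, storeName }
  // ...stream the temp file into the target store here...
  job.done();
  callback();
});
```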

@rhyslbw thank you for your massive efforts on the cfs project. Eric and I are hard at work at the moment on a large-scale Meteor application, so I'm short on time - I really want to join in a bit more. The worker is the final big piece of the project and will have a big impact, so I'm really excited about it :)

bitomule commented 9 years ago

I'm really interested in this change to support uploading from different meteor instances. Is there any way I can help with it?

I don't have enough experience with collectionFS to help with the code, but I can test changes.

BTW, great work.

rhyslbw commented 9 years ago

Thanks @raix, I appreciate your time on this, and I'm glad to hear the worker sub-project is where the effort is needed to get us closer to 1.0. I understand your time is limited right now, but it would be helpful if you could review the task list in the original post and at least ensure the items listed don't conflict with your current thoughts. We can then build on this as the master reference.

> Maybe have collections try to register themselves in the manager? The manager would be wired up in the main cfs package via imply, preventing direct inter-package dependencies.

> Not sure it's needed - since the queue is pulling tasks from the db?

OK great - so we can just add jobManager to the standard-packages imply, as it will replace the existing worker observer implementation.

> The issue you mention isn't an issue: when instances find a task, they try to reserve it - if the reservation succeeds, they work on it. We'd rather have instances pulling from the queue than relying on the observer.

Sorry, I didn't explain that too well. The manager would still have to observe the fsCollection for removed messages in order to create a job, so the problem in that case would be duplication of jobs in the queue when multiple instances all observe the remove. The check I referred to is in the manager.

rhyslbw commented 9 years ago

Thanks @bitomule. Should have a WIP fork to share shortly, but if you want to familiarise yourself with the code, take a look at the following cfs packages:

collection, worker, tempstore, file

and these: raix:eventemitter, vsivsi:job-collection

rhyslbw commented 9 years ago

@raix @aldeed if you get the chance, can you take a look at this repo for the basic setup: https://github.com/rhyslbw/Meteor-CollectionFS

The first goal is just to get the default configuration implemented so we don't have workers observing fsCollections. Right now the saveCopy jobs are working, but the two issues I raised above are blocking progress:

Issues:

  1. Multi-instance support for queueing data-removal jobs from stores: cfs:worker currently observes removed messages on the cursor to run FS.TempStore.removeFile(fsFile), which would run on every instance of the application. The only way I can think of to manage this is to put a check in front of the new Job call to see if another instance has already created the job.
  2. I'm assuming fsCollection will be extended to fire an allStoresComplete event, since it's similar to uploadComplete (renamed to tempStoreTransferComplete). This event triggers a removeTempFile job; I need advice on how to handle triggering its creation, since there is no existing event being emitted.
aldeed commented 9 years ago

@rhyslbw this all seems like a good start. I agree we need to figure out the multiple-store events issue.

In saveCopy, you can determine success vs. failure by attaching "error" and "end" listeners before piping.
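
A hedged sketch of that wiring inside the saveCopy worker; the event names follow aldeed's wording, the job methods come from vsivsi:job-collection, and the function shape is an assumption:

```js
// Hedged sketch: attach listeners before piping so the job outcome is
// known. The saveCopy signature shown here is an assumption.
function saveCopy(fileObj, storeName, job, callback) {
  var writeStream = fileObj.createWriteStream(storeName);

  writeStream.once('error', function (err) {
    job.fail(err.message); // mark the job failed so it can be retried
    callback();
  });
  writeStream.once('end', function () {
    job.done(); // copy stored successfully
    callback();
  });

  fileObj.createReadStream().pipe(writeStream);
}
```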

Similarly, you can update FS.TempStore.removeFile to return tracker.remove({_id: chunkInfo._id}); and also return true if !chunkInfo, because if the file doesn't exist I think we should just consider it a successful removal (though you could maybe do an FS.debug log along with it). That should allow you to determine success/failure for the removal job.
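
A hedged sketch of that removeFile change; the tracker lookup is an assumption based on the names in the comment:

```js
// Hedged sketch of the suggested FS.TempStore.removeFile change; the
// tracker query is an assumption based on the names aldeed uses.
FS.TempStore.removeFile = function (fileObj) {
  var chunkInfo = tracker.findOne({ fileId: fileObj._id });
  if (!chunkInfo) {
    // No chunks on record: treat as a successful removal, but log it.
    FS.debug && console.log('TempStore.removeFile: no chunks for ' + fileObj._id);
    return true;
  }
  return tracker.remove({ _id: chunkInfo._id });
};
```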

rhyslbw commented 9 years ago

PR https://github.com/CollectionFS/Meteor-CollectionFS/pull/667, although I've yet to make any changes since the feedback. Will be working on it again shortly.

rhyslbw commented 9 years ago

@aldeed Just a quick update to let you know the PR has been updated with the changes you suggested.