Meteor-Community-Packages / Meteor-CollectionFS

Reactive file manager for Meteor
MIT License
1.05k stars · 237 forks

How to store files on the server side? #29

Closed · mitar closed this 11 years ago

mitar commented 11 years ago

Is it possible to use CollectionFS on the server side? So that my code on the server downloads a file, puts it into the CollectionFS, which can then be downloaded/viewed by the client?

raix commented 11 years ago

Kind of like having a nat server doing a torrent?

I think it would make sense to have that feature - I think you could create a torrent file on the client side and then have a filehandler parsing it and fetching + creating the files.

But it would be nice to have a storeRemoteFile for this - creating the file object and having built-in functionality for it. After the file is loaded into the db it should be treated as a normal upload, with custom filehandlers running to create cached/transformed versions of the file, etc.

It's a nice idea - any thoughts on the API would be great. I might have a look at it some time this week; I'm thinking something like:

  var newFileId = ContactsFS.storeRemoteFile( url );
raix commented 11 years ago

Or... do you mean the server-side implementation of ContactsFS.storeFile? That would be implemented when doing the above - since it's almost the same. It should take a Buffer or a URL as a parameter and store that in the db.

mitar commented 11 years ago

Huh, I do not know where you got torrents from. In my case, it is so that we can work around CORS issues where clients cannot access resources from other domains directly. So the server can download them instead and then deliver them.

So in my case it is not even client who triggers the download. We prefetch things based on documents we have in our database. Then we download and store them. And when client comes, we deliver it locally.

I am not sure it is so easy to implement storeRemoteFile, because there are many ways a file can be retrieved. For example, sometimes some HTTP headers have to be set, and at other times we have to access files stored in an S3 bucket and get them locally. So I would just do storeFile(file) and leave it to the caller to get the file.

mitar commented 11 years ago

Why are you writing ContactsFS and not CollectionFS?

raix commented 11 years ago

ContactsFS is just an example, also used in the readme file. I wasn't sure if you meant a client-triggered server download (some NAT servers take a torrent file and go fetch those files). But I can see that it's the server-side version of storeFile you need; I'll try implementing it next week.

mitar commented 11 years ago

Great! Thanks. So if I understand correctly, I will be able to store the file on the server side and then clients will get it in the published files collection? I am still unsure, though: how do I then get the content of this file on the client side? I would like to have it either as a buffer or as a URL I can point an AJAX query at - preferably both.

raix commented 11 years ago

Yep, but you can write custom publish/subscribe functions.

There are two ways for the client to get the file:

 ContactsFS.retrieveBlob // loads file into blob from database

For more: https://github.com/raix/Meteor-CollectionFS#2-adding-the-controller-client-1

Or you can write a simple filehandler that just caches the file on the server, and then use the fileURL to set src or href in HTML.

// `Filesystem` is your CollectionFS instance (the collection variable)
Filesystem.fileHandlers({
  cached: function(options) {
    // NOP: return the blob unchanged so the server writes it to the filesystem
    return { blob: options.blob, fileRecord: options.fileRecord };
  }
});
{{#each Files}}
<!-- -->
  {{#each fileHandler}}
    {{#constant}}
      <img src="{{url}}" alt="{{filename}}" width="20px"/>
    {{/constant}}
  {{else}}
    No cache
  {{/each}}
<!-- -->
{{/each}}
mitar commented 11 years ago

What does "caching" mean? Where is it cached? I do not understand this, because I would like the files to be stored on the filesystem anyway.

mitar commented 11 years ago

So if I understand the schema, you are using DDP for both uploading and downloading the data? Hmm. I am a bit skeptical about this. I understand that for uploading this is nice. But for downloading it would be great if it were possible to, for example, redirect to S3 so that the client accesses the file there directly, or to deliver it locally through your HTTP server. For development DDP is probably great, but for production?

mitar commented 11 years ago

(Maybe I am just not used enough to this reactive nature of Meteor. :-) And maybe I wrongly want to do things traditionally. Please correct me.)

raix commented 11 years ago

I created the filehandlers to give exactly the option to handle the db files when they are uploaded, e.g. saving/caching to the filesystem, uploading to other servers, etc.

You can create multiple filehandlers per CollectionFS, e.g. if you want to create different image sizes, sound formats, etc.

When saving to the filesystem, you actually use HTTP to download (just like normal); the URLs are placed in the files.fileURL array of { path } objects.

Filehandlers can write to the filesystem, or just return a blob and let the system handle writing the file, etc. It's described in the readme.

When a filehandler is done, the file is updated and all clients are updated too - Meteor handles this pretty nicely.

Have a look at collectionFS.meteor.com (use Chrome) and try uploading a jpg. The 5 images are generated by 5 filehandlers, each just making a filesystem version.

raix commented 11 years ago

Hi @mitar, I've created the server-side storeBuffer and retrieveBuffer - there are examples in the readme and in the filemanager example. Also check out the new http://collectionFS.meteor.com - I've added examples of drag&drop, server-side file creation, filters and more. Let me know if they work for you.
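For orientation, a hedged sketch of what the server-side flow might look like - the `storeBuffer` signature and options below are assumptions modeled on the thread's other examples, not checked against the readme:

```javascript
// Server side: first get the file into a Buffer (here a literal stands in
// for bytes downloaded from a remote source)...
var buffer = Buffer.from('downloaded bytes', 'utf8');
console.log(buffer.length > 0); // true

// ...then store it, after which filehandlers run as for a normal upload.
// Hypothetical call, signature/options are assumptions:
// var fileId = ContactsFS.storeBuffer('fetched-file.txt', buffer, {
//   contentType: 'text/plain',
//   metadata: { source: 'prefetched by the server' }
// });
```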

mitar commented 11 years ago

Great, will check it out soon!

mitar commented 11 years ago

Hm, would it be possible to also have a storeStream? There is probably no need to keep the whole thing in memory.

raix commented 11 years ago

@mitar I'm thinking about making a storeRemote(). It would use streams for the job, and would have options to set auth, headers, etc. It would be wrapped inside fibers/futures to make it sync for Meteor. When the file is loaded it would trigger the filehandlers, if any are added, to do their thing.

I've prepared it so that storeRemote could be triggered by the client; the client could set the options (headers/auth) too - the options would only be published to the owner.

I guess a storeFile that reads from the filesystem could be nice too.

mitar commented 11 years ago

No, this is too limited. There are many, many sources of streams, not just HTTP. For example, I could have a BitTorrent client. :-)

OK, in reality, I am using the AWS SDK to retrieve an object from S3. So you see, it is not really reasonable to support all possible sources of streams. Just make it compatible with a writable stream. Something like:

stream.pipe(ContactsFS.storeStream('My stream file.txt', { 
  // Set a contentType (optional)
  contentType: 'text/plain',
  // Set a user id (optional)
  owner: 'WAaPHfyfgHGaeJ5kK',
  // Stop live update of progress (optional default to false)     
  noProgress: true,
  // Attach custom data to the file  
  metadata: { text: 'some stuff' }
}))
raix commented 11 years ago

Ahh, I get it. I'll have a look at it next week; got some deadlines this week. I'm thinking that the _id of the file might be important to some?

Maybe:

var newStreamFile = ContactsFS.storeStream('My stream file.txt', { 
  // Set a contentType (optional)
  contentType: 'text/plain',
  // Set a user id (optional)
  owner: 'WAaPHfyfgHGaeJ5kK',
  // Stop live update of progress (optional default to false)     
  noProgress: true,
  // Attach custom data to the file  
  metadata: { text: 'some stuff' }
});

// Returns { _id, stream }
console.log('Created file record file id: ' + newStreamFile._id);

// Get the data
stream.pipe(newStreamFile.stream);
mitar commented 11 years ago

Hm, in fact, the question is how well streaming works with Meteor & Fibers.

But yes, something like this could also be done. And how could I do the progress bar?

raix commented 11 years ago

Do you know the size of the stream? Being able to set the chunk size of a stream is one of the things I want to look more into - that would be nice too. But I guess leaving out noProgress: true should keep the fileRecord updated; no matter what, it updates on completion. I'm also thinking about resumability of a stream - if a stream fails it should be possible to try again without loading from the beginning - which is why I'm thinking about having storeRemote handle all this for you (retrying, wrapped in sync). But I'll have to think of a way.

Maybe more like setting the stream as a parameter, with options for id and length - if no length, then noProgress would default to true; if an id is given, then resume:

var id = ContactsFS.storeStream('My stream file.txt', stream, { 
  // Set id to resume
  id: fileId,
  // Set length
  length: filesize,
  // Allow it to run async
  callback: myCallback,  // Default would be sync in fibers
  // Set a contentType (optional)
  contentType: 'text/plain',
  // Set a user id (optional)
  owner: 'WAaPHfyfgHGaeJ5kK',
  // Stop live update of progress (optional default to false)     
  noProgress: true,
  // Attach custom data to the file  
  metadata: { text: 'some stuff' }
});
mitar commented 11 years ago

Yes, I can have the length, at least in my case: I get the length from an HTTP header and then I download.

BTW, I still don't know how I can store files only on the hard disk, without GridFS.

raix commented 11 years ago

The filehandlers would do that for you, e.g. saving to disk when run - they handle the file when it's ready and the server has time for it.

All data is handled via the database, but one thing I've been thinking about is adding an option to store the file/chunk data on the filesystem instead. I would still use the database while data comes from the client, since chunks might not arrive in order - but a default filehandler could then save to disk, empty the chunks, and set a filesystem reference in the db fileRecord.

raix commented 11 years ago

But data-integrity-wise the database is a good place for files, and speed-wise I don't think there's much difference between filesystems and databases; they are basically the same. (Would be nice with some benchmarks on it - db vs. filesystem.) So when speaking about having it only on disk, it's only about saving storage.

mitar commented 11 years ago

Hm. Why to the database? If the database is not on the local machine, I do not really want data to be sent over the wire to some database.

No, the filesystem has the nice advantage that you can use existing distributed/caching filesystems and other tools. It is also much easier to move files around, to cloud storage and so on. Furthermore, you do not have to store files twice, in the database and locally.

raix commented 11 years ago

True, that's why I made the filehandlers: to exploit the existing infrastructure with caching etc. But the db makes it all possible; when you use the filehandlers, the browser loads the file via HTTP from the filesystem/cache.

But what I'm saying is, the problem could be solved if one could do the following:

Declare a filehandler named 'master'; this would trigger the removal of the chunks from the database, and all other filehandlers would be handed the master file, not the database file.

The master could do all the normal stuff, like changing the size of an image (to limit data usage on the filesystem).

mitar commented 11 years ago

I must say that maybe I do not yet understand enough about this filehandler stuff. Could you create an example which stores data only on the filesystem?

Anyway, storing temporary files in the database is probably overkill too - sending things around over the wire... hmm.

raix commented 11 years ago

Basically, I made the filehandlers because I wanted a "cached" version of the file on my filesystem - I don't believe large data should be sent to the client over DDP, but rather HTTP (ideally FTP in this case? since it's optimized for file transport...).

When a file is uploaded in collectionFS, it's in the database -

actually in two collections, suffixed .files and .chunks:

.files holds info (the fileRecord) about the file: length, owner, etc., and which fileHandlers have completed.

.chunks holds the data sent from the client and makes resume possible - even multiple users could upload the same file at the same time from different locations (not implemented; I can't find a use case for that...). Chunks are not available on the client; they are saved/loaded via a method, resembling AJAX.

When the upload is complete and filehandlers are defined, it puts the file in a queue on the server.

When the server has time, it runs one filehandler at a time - the filehandlers applied to the collection (these are custom functions you have to define; the readme and example show how).

If one of these functions fails (returns false), it retries 3 times and then goes on to the next task or filehandler. If the server gets bored it sleeps for a period, then crawls through to see if any new filehandlers are defined, and tries to run failed filehandlers again, on the assumption that they could rely on a remote server and connection.

If a filehandler returns a buffer and fileRecord, the server handles the buffer by saving it to the filesystem and updates details in the fileRecord. (Your file is now on the filesystem, and the fileRecord's fileHandler holds info about it, e.g. url and extension per fileHandler.)

The filehandler is passed an options argument; it holds a buffer (maybe a stream in future, for memory?), a fileRecord, and a helper function destination, where you can set an optional extension and get back serverFilename and fileData.url + fileData.extension.

But have a look at the example; it shows how to get URLs into the templates. It's all reactive, which is why the db is nice - the files appear in real time as the filehandlers work through the queue (try pressing "reset filehandlers" in the example to see this in action).

Well, I kinda agree, but it's a complicated pros/cons trade-off - I weigh flexibility/security, so it makes life easier for me. Depending on the setup, some systems have dedicated fileservers - there, data goes over the wire too when the server handles files?

raix commented 11 years ago

Update... If a filehandler doesn't return a blob, it could be because:

  1. The filehandler wrote to the filesystem itself (hopefully using destination to get a safe place), so it returns only the fileData (a url and extension); the server updates the fileRecord with this.
  2. The filehandler sent the data to a remote server; it could return the url and extension of the remote location - the client wouldn't care what server the url points at.
  3. The filehandler parsed the uploaded file and maybe performed some actions based on its actual contents. It could return any object that should be saved in files.fileHandler.myfilehandler, or just null - the server would save the result and not rerun it.
mitar commented 11 years ago

Wow, thanks for this explanation!

OK, but then the data is stored twice? In the database, and processed on the disk?

And no, FTP is not really much more optimized for files than HTTP. It is an old protocol which, in its original specification, requires open ports on the client.

raix commented 11 years ago

Yep, at the moment - but I've made issue #34 for making the db temporary: delete the chunks in the db when a master file is generated.

raix commented 11 years ago

I'm closing this - look at #77. Some of the new stuff: a full HTTP REST API, an HTTP fileserver, and storage handlers for:

This way one can serve a storage handler directly at a fileserver point, and have filehandlers save into some storage handler, making it a very flexible setup.

It all results in smaller reusable packages.

mitar commented 11 years ago

Wow. Great! Will check it out.

nspangler commented 10 years ago

@raix and @mitar, along the same lines: I have a Uint8Array containing the data for a PNG image that I convert to a base64String. Is there any way to store this in CollectionFS? More specifically, could CollectionFS use a data string such as 'data:image/png;base64,' + base64String, where this string is the URL of the stored image? Hopefully that makes sense; if not I can explain a little more. Thanks for your thoughts.
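For reference, a base64 data URL like the one described can be decoded back into raw bytes with plain Node (in the browser you would build a Blob instead). This is a generic sketch, not a CollectionFS API:

```javascript
// Split a base64 data URL into its content type and decoded bytes.
function dataUrlToBuffer(dataUrl) {
  var match = /^data:([^;]+);base64,(.*)$/.exec(dataUrl);
  if (!match) throw new Error('not a base64 data URL');
  return { contentType: match[1], buffer: Buffer.from(match[2], 'base64') };
}

// Usage: round-trip some fake image bytes through a data URL.
var base64String = Buffer.from('fake png bytes').toString('base64');
var parsed = dataUrlToBuffer('data:image/png;base64,' + base64String);
console.log(parsed.contentType);       // image/png
console.log(parsed.buffer.toString()); // fake png bytes
```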

raix commented 10 years ago

@nspangler is it client side, and are you on the old cfs or the new devel-merge branch?

nspangler commented 10 years ago

@raix it is client side and the old cfs.

raix commented 10 years ago

I think it would be better to use devel-merge, or wait a week or two - I think we'll push v2 out. There are ways to convert between base64 and blob; some use the canvas object for converting. We could perhaps make v2 accept base64 data / data URLs - we should also be able to get a file as a data URL. Should probably add this as a separate issue?

aldeed commented 10 years ago

@raix, one option for adding support in devel-merge would be to update fsFile.setDataFromUrl() so that if the URL begins with "data:", it loads the data from the data string. However, it seems like it would be rare that you would have a base64 string but not also the binary/arraybuffer/blob, which you could insert directly. So I don't know if it's worth the effort.

nspangler commented 10 years ago

@raix and @aldeed, I agree with what aldeed is saying. I have decided to go a different route: I have stored a file from an Objective-C application into my CollectionFS Meteor MongoDB collection. The file is saved in both contactsFS.chunks and contactsFS.files. At first it was erroring out Meteor due to the length parameter, but I changed the length name so it would not error out. However, now the file cannot be accessed by CollectionFS, because it has never gone through a filehandler. How would I make a file inserted into the collection from an outside source go through the filehandlers? I have tried using ContactsFS.retrieveBlob() and then reinserting it back into the collection, but that was an ill-fated attempt. Right now I have the raw image sitting in my contactsFS (see the screenshot below).

raix commented 10 years ago

I'll have to look deeper into this - length should be converted into a string. Meteor's usage of underscore is causing this; see issue #594 in Meteor (the only issue marked "confirmed").

nspangler commented 10 years ago

@raix thanks. Is there a way to access contacts.chunks in Meteor? ContactsFS.find() only returns contacts.files. If I can access the .chunks, then I can use the BinData and build an image off of that.

raix commented 10 years ago

Remember to set encoding to binary

iliaznk commented 10 years ago

Hi, guys! I'm very new to Meteor and trying to use the CollectionFS package, but when I try to define a file handler like so:

Filesystem.fileHandlers({
    default1: function(options) { // Options contains blob and fileRecord — same is expected in return if should be saved on filesytem, can be modified
        console.log('I am handling default1: ' + options.fileRecord.filename);
        return { blob: options.blob, fileRecord: options.fileRecord }; // if no blob then save result in fileHandle (added createdAt)
    }
});

I'm getting an error: ReferenceError: Filesystem is not defined

Could you please help me figure out what the problem is? Thanks!

iliaznk commented 10 years ago

Oops, I got it! Stupid me :) It should be myFileSystem name instead of Filesystem.

nooitaf commented 10 years ago

@iliaznk check out this example: https://github.com/mxab/cfs-multi-filehandler-example

It should be yourCollectionName.fileHandlers({...

iliaznk commented 10 years ago

@nooitaf yes, that's what I meant, thank you! I just have one more question: how do I specify where to save the file on the server?

nooitaf commented 10 years ago

With cfs v1 you can't - they get stored in .meteor/local/cfs, or in ../cfs when deployed.

iliaznk commented 10 years ago

OK, then, is there any way to serve the saved files (images in particular) as URLs for an img src attribute?

raix commented 10 years ago

@iliaznk https://github.com/mxab/cfs-multi-filehandler-example/blob/master/app.html#L54 - I'm getting a bit rusty on the v1 API, which is why I link to the code; hope that's ok.

iliaznk commented 10 years ago

@raix thank you! But I've tried that before and it's not working; the images won't show up. Probably because Meteor does not serve any static files outside of the public dir?

raix commented 10 years ago

It should work - @nooitaf made a cfs-public-folder package that v1 depends on. Check whether the files are stored on the filesystem, as @nooitaf mentioned.

Btw. Merry Christmas y'all :)

iliaznk commented 10 years ago

@raix thank you! Merry Christmas to you too!

Something weird is going on. I've seen the files stored there before. Then I reset the app and the dir was gone. Then I uploaded a couple of files, but the dir still wasn't there; instead, a file named as the path to the dir appeared, with a lock symbol at the beginning.

Then I reset the app again and uploaded files; now I have the dir, but when I go there it's not showing anything - not like an empty dir would look, but as though it's not letting me in and hiding the contents, you know... Sorry, guys, I'm not a Linux pro yet; hopefully I made it clear enough.