NPellet / visualizer


couchDB button and upload attachments #529

Closed lpatiny closed 8 years ago

lpatiny commented 9 years ago

It is important that the test cases and even some "production" tools are self-contained. Because some of the test cases require big data sets (like the distance matrix of IR spectra), it would be nice to be able to attach files directly to a view. This would allow, for example, creating a small database of reference molecules (Cyril's project) or a demonstration of the NMR 1D auto-assignment (Andres' project).

It would be nice if, when you click on a specific view, there were a button next to "Make public" that opens a dialog allowing you to upload / rename / delete files. Should we consider folders (with consequences for couch and the URL used to retrieve the document)? It should be feasible because we used them when we were adding the visualizer directly in couchDB.

What kind of project could we do with this kind of approach? Infra-red spectra are around 100 kB in JCAMP format. We should consider having up to 1000 attachments and make sure the system works correctly at that scale; the document with its attachments would then be around 100 MB (1000 × 100 kB). We had some trouble with couchDB when we were adding the visualizer directly in the database, but we also had more documents in that case.

stropitek commented 9 years ago

I don't think you can rename attachments in couchdb. You would have to re-upload with a different name.

targos commented 9 years ago

To prevent overwriting the view.json / data.json attachments, prefix the name with "upload/".
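
A minimal sketch of that naming convention (the helper name is hypothetical):

```js
// Hypothetical helper: namespace user uploads under "upload/" so they can
// never clobber the reserved view.json / data.json attachments.
function toUploadName(filename) {
  return 'upload/' + filename;
}

toUploadName('ir-spectrum.jcamp'); // 'upload/ir-spectrum.jcamp'
```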

stropitek commented 9 years ago

Implemented in 1c2d00f2ea174b95f08b8416d7b29fd2be328c91, documented at http://www.lactame.com/visualizer/doc/couchdbAttachments.html.

Basically, the API is list, get, upload, remove. There's an in-memory cache so files are not re-downloaded. Is that what you expected @lpatiny? Should we bother to cache into indexed-db?
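
For illustration, a sketch of how the four operations could be chained; the method names follow the list above, but the real signatures are those in the linked documentation:

```js
// Sketch only: assumes a wrapper object exposing the four operations named
// above; check the linked docs for the actual interface.
async function demo(attachments, blob) {
  const list = await attachments.list();                 // metadata: name, size, digest
  await attachments.upload('upload/ir.jcamp', blob);     // add or replace
  const data = await attachments.get('upload/ir.jcamp'); // served from the in-memory cache when possible
  await attachments.remove('upload/ir.jcamp');
  return { list, data };
}
```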

lpatiny commented 9 years ago

A cache would be great, of course. Could we reuse the system that Norman wrote? We should also check for possible problems with the size it takes in memory! At least we have an MD5 for every attachment, so it is easy to find out whether it is cached or not. As for indexed-db, it could be useful but it is not a priority. But I'm pretty sure you will think differently in August when you are in Colombia ;)

stropitek commented 9 years ago

Norman's system writes both to memory and to indexed-db, and it looks easy to use. For the memory problem, do you mean many files that together exceed the allowed memory allocation, or the limit we would hit with one very big file? Do you have a test case in couchdb (pdb?)

stropitek commented 9 years ago

There's also the question of how to handle conflicts... Should we give the user a choice of what to do, or force an overwrite?

lpatiny commented 9 years ago

When you upload, you replace the attachment if it has the same name, so we don't handle conflicts.

When we want to process the data, we get the list of attachments with their MD5s, and the cache will just be keyed on this MD5.

I don't see where a conflict could arise, because we will never save locally before synchronisation, no?
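
A minimal sketch of that digest-keyed cache idea: CouchDB attachment stubs carry an MD5 digest (the `digest` field), so a plain map keyed on it tells us whether a download is needed (the `fetchBody` callback is hypothetical):

```js
// Cache keyed on the attachment digest that CouchDB reports in the
// attachment stub ("md5-..."), so an unchanged file is never fetched twice.
const cache = new Map();

async function getCached(doc, name, fetchBody) {
  // fetchBody is a hypothetical (name) => Promise<Blob> downloader
  const digest = doc._attachments[name].digest;
  if (!cache.has(digest)) {
    cache.set(digest, await fetchBody(name));
  }
  return cache.get(digest);
}
```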


lpatiny commented 9 years ago

Attachments are a nice feature, but we will still need to be careful with them... If we have 2D NMR, each file may be 10 MB, and if we have 200 of them... this is a problem for the in-memory storage of the data (2 GB in total). So I guess we should restrict ourselves to reasonable data sets, typically a set of 500 files of 50 kB each. For more complex data sets we could think about a system based on Michael's synchronisation? Or maybe give end users the possibility to create their own databases. We will see.

stropitek commented 9 years ago

Norman's library allows configuring limits on both memory and DB storage. But the limit is in number of items, not bytes, which makes it a bit difficult to use.

By conflicts I meant conflicts in couchdb: two tabs are open, two versions of the same file are uploaded, but one of the tabs does not have the latest revision. But I think we can just get the latest revision id on conflict, re-upload, and we'll be fine.
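
A sketch of that strategy with plain fetch() against the CouchDB attachment endpoint (URL and names are illustrative):

```js
// On a 409 conflict (another tab uploaded first), fetch the current
// revision and retry the PUT with the fresh rev.
async function uploadAttachment(docUrl, name, body, rev) {
  const res = await fetch(`${docUrl}/${encodeURIComponent(name)}?rev=${rev}`, {
    method: 'PUT',
    headers: { 'Content-Type': 'application/octet-stream' },
    body,
  });
  if (res.status === 409) {
    const doc = await (await fetch(docUrl)).json();
    return uploadAttachment(docUrl, name, body, doc._rev);
  }
  return res.json(); // { ok, id, rev }
}
```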

lpatiny commented 9 years ago

The problem was to find the size of an object in memory. There was no simple way to do this, I think.

This is why, for now, we should stick to "simple" cases and indeed always take the last revision.


targos commented 9 years ago

We don't need to use the MD5; the filename is unique per document.

targos commented 9 years ago

Let's not check the file size for now; it could bring edge cases where the cache is constantly rewritten because a single file takes up all the space.

lpatiny commented 9 years ago

The MD5 is to check whether the cache is up to date. It seems the simplest way.


targos commented 9 years ago

Then Norman's util is not what we need; it is a different kind of cache.

lpatiny commented 9 years ago

It is a key/value cache, no? So the key is the MD5.

targos commented 9 years ago

Good point, but there will be dead data in the cache with this approach. We need to find a way to clean it

lpatiny commented 9 years ago

Because it is a first-in, first-out approach, the unused data will disappear.

As far as I know, both the in-memory and the indexed-db caches are limited in the number of files.


targos commented 9 years ago

It's not first in, first out, but that doesn't matter. The unused data can disappear, but it depends on how we configure the cache limit and how we use it.

There is no ideal solution...

stropitek commented 9 years ago

So what we need is a cache with an expiration date. The cache would check the last-read date before each API call and clean whatever has been unused for a while.
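
A sketch of that expiry logic (the one-day maxAge is an arbitrary assumption):

```js
const MAX_AGE = 24 * 3600 * 1000; // arbitrary assumption: one day

const entries = new Map(); // key -> { value, lastRead }

// Drop everything that has not been read recently; run before each API call.
function sweep() {
  const now = Date.now();
  for (const [key, entry] of entries) {
    if (now - entry.lastRead > MAX_AGE) entries.delete(key);
  }
}

function cacheGet(key) {
  sweep();
  const entry = entries.get(key);
  if (!entry) return undefined;
  entry.lastRead = Date.now(); // touching an entry keeps it alive
  return entry.value;
}

function cacheSet(key, value) {
  sweep();
  entries.set(key, { value, lastRead: Date.now() });
}
```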

stropitek commented 9 years ago

I will use Norman's LRU for now; we can implement another cache with the same interface later.

stropitek commented 8 years ago

Not using the LRU anymore. If you want to cache the result, you can store it yourself in IDB using src/util/IDBKeyValue.js
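
For example, something along these lines; the constructor and method names are assumptions, the actual interface is defined in src/util/IDBKeyValue.js:

```js
// Hypothetical usage: store attachment bodies in IndexedDB under their MD5
// digest. The names below are assumptions; see src/util/IDBKeyValue.js for
// the real interface.
define(['src/util/IDBKeyValue'], function (IDBKeyValue) {
  const store = new IDBKeyValue('attachments');

  async function cacheAttachment(digest, blob) {
    await store.set(digest, blob);
  }

  async function loadAttachment(digest) {
    return store.get(digest); // resolves to undefined when not cached
  }

  return { cacheAttachment, loadAttachment };
});
```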