cccs-web / core

CCCS' customized django web application

deploy mayan to production and abadi #166

Closed cccs-ip closed 10 years ago

cccs-ip commented 10 years ago

We're happy with mayan and wish to start using it for document management. We want to establish a POT for data that doesn't risk getting wiped. Please help us to do this so that we can begin exploring how to link it into the rest of the site.

pwhipp commented 10 years ago

Mayan uses a simple flat file storage model layered over Django's existing storage classes.

By default, this uses a folder in the BASE_DIR called document_storage. Each file is identified by a UUID within this folder.
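As a rough illustration of that flat model (not Mayan's actual code), storing a file under a UUID name could look like the sketch below. The function names `store_document` and `document_path` are hypothetical, and the sketch assumes the original filename is recorded elsewhere, for example in the database.

```python
import shutil
import uuid
from pathlib import Path


def store_document(source: Path, storage_dir: Path) -> str:
    """Copy source into storage_dir under a fresh UUID and return that UUID."""
    storage_dir.mkdir(parents=True, exist_ok=True)
    document_id = str(uuid.uuid4())
    # The stored name is only the UUID; the original filename is not kept here.
    shutil.copyfile(source, storage_dir / document_id)
    return document_id


def document_path(document_id: str, storage_dir: Path) -> Path:
    """Return where a stored document lives (flat layout, no subfolders)."""
    return storage_dir / document_id
```

Because the folder is flat and the names are opaque, anything that needs a readable filename has to go through whatever maps UUIDs back to document records.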

We are facing two issues here: storing the documents and integrating Mayan.

Storing Documents

Use the existing folder default

The default behavior is to store the documents under the 'document_storage' folder in the BASE_DIR. This is a reasonable policy but, as it stands, it means that the documents are not backed up and that their storage is a limited resource (currently shared with our databases and sites).

The lack of backups is a concern. We could script the document_storage backup alongside our (t.b.d.) database backups.
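A minimal sketch of such a script, assuming a timestamped tarball is acceptable and that the backup folder lives outside document_storage. The function name `backup_document_storage` and the archive naming are hypothetical; scheduling and off-site copying are left out.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_document_storage(storage_dir: Path, backup_dir: Path) -> Path:
    """Write a timestamped .tar.gz of storage_dir into backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive_base = backup_dir / f"document_storage-{stamp}"
    # make_archive appends the extension and returns the final archive path.
    archive = shutil.make_archive(str(archive_base), "gztar", root_dir=str(storage_dir))
    return Path(archive)
```

Something like this could run from cron next to the database dump, so both backups share one schedule.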

Once backup policies are in place and implemented, we could probably use the document_storage folder for some time. I feel a little uneasy storing stuff here, but it is common practice, particularly for user-uploaded content.

Use Amazon S3

S3 offers inexpensive, virtually unlimited storage that Amazon replicates redundantly, which protects against hardware failure (though not against accidental deletion unless bucket versioning is enabled).

django-storages supports S3 as a storage backend, and Mayan builds on Django's storage classes, so it should be possible to use S3 by installing the necessary existing apps and configuring the site appropriately.

Depending on our security requirements, the S3 bucket can be locked down so that documents are served only through the site.

We'd need an S3 bucket and location to store our documents in: http://martinbrochhaus.com/s3.html
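For a sense of what the configuration might involve, here is a settings sketch using current django-storages setting names. It is an assumption that Mayan picks up Django's default file storage rather than a setting of its own, and the bucket name and key prefix are placeholders.

```python
# settings.py fragment: a sketch, not a tested Mayan configuration.
import os

# django-storages S3 backend. On Django 4.2+ the same backend is selected
# through the STORAGES setting instead of DEFAULT_FILE_STORAGE.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"

# Credentials come from the environment so they stay out of the repository.
AWS_ACCESS_KEY_ID = os.environ.get("AWS_ACCESS_KEY_ID", "")
AWS_SECRET_ACCESS_KEY = os.environ.get("AWS_SECRET_ACCESS_KEY", "")

# Placeholder bucket name and key prefix ("location") for our documents.
AWS_STORAGE_BUCKET_NAME = os.environ.get("AWS_STORAGE_BUCKET_NAME", "example-documents")
AWS_LOCATION = "document_storage"

# Keep objects private and hand out short-lived signed URLs only.
AWS_DEFAULT_ACL = "private"
AWS_QUERYSTRING_AUTH = True
AWS_QUERYSTRING_EXPIRE = 300  # seconds
```

With private objects and signed URLs, a document link is only useful for a few minutes after the site has decided the visitor may see it.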

Integrating Mayan into our existing sites

Mayan is a Django project, which is good, but it has been built as a standalone Django project rather than as a Django app or suite of apps. It uses its own settings, with a custom mapping to its own collection of app folders.

It has not been designed to integrate as part of another website. It has been designed to stand alone as a Document Management Site.

To quote its creator Roberto Rosario:

Mayan EDMS on the whole has grown so much as a Django project that is it not feasable to develop it as a Django app to be integrated into another project, the functionality of several app (metadata, indexing, linking, tags) would have to be crammed into a single app.

Mayan does have a rest_api which looks fairly easy to extend. Therefore our best approach will probably be to have one or more Mayan DMS sites linked through URL references and API calls to other sites rather than attempting to integrate Mayan as a functional block into another site.

cccs-ip commented 10 years ago

I am fine with using S3 and can set up an account as appropriate.

I am a bit concerned about the integration issue, though. As we discussed, I am looking for a system that gives visitors to our various websites an interface for searching and finding documents by specific identifier tags (e.g. 'social policy' or 'PCPD'). If Mayan does not support such integration, then it does not answer our need.

Also, we will need to keep the documentation for different projects completely separate from one another, at least to the extent that it is easy to print a reference index and create a ZIP bundle of a project's documents if we ever need to re-supply them to our clients. Does Mayan do that?

pwhipp commented 10 years ago

Mayan stores all of the documents for a particular site by id in a particular folder. You then create and use indices to access the documents. Creating a zip bundle from a specific index should be doable fairly easily if not already available.
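To show why the bundle should be straightforward, here is a sketch that zips the files behind an index. It assumes the index can be reduced to a mapping from readable filename to stored id; that mapping, the function name `bundle_index`, and the `INDEX.txt` listing are all hypothetical rather than anything Mayan provides.

```python
import zipfile
from pathlib import Path


def bundle_index(index: dict[str, str], storage_dir: Path, bundle_path: Path) -> Path:
    """Zip the documents listed in index (readable filename -> stored id)."""
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for name in sorted(index):
            # Store each file under its readable name, not its id.
            bundle.write(storage_dir / index[name], arcname=name)
        # Plain-text reference index of everything in the bundle.
        bundle.writestr("INDEX.txt", "\n".join(sorted(index)) + "\n")
    return bundle_path
```

Keeping one index per project would give the per-project separation asked about above, even though the files themselves share one folder.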

Searching and finding documents should be supported through the Mayan API, but I can't see a way to do this; unfortunately, the API still looks to be largely a to-do list item.

We can roll our own document management system quite quickly if your required feature set is small. I think I could build an extendable app in two days that would:

Mayan does a great deal that this simple app would not do. The app could be extended to support things like indices and within-document searching, but bear in mind that some of these extensions are significant undertakings (a generic within-document search capability could take weeks of effort, for example).

cccs-ip commented 10 years ago

Thanks, Paul. I think you nailed it with the app: light, scalable, and ready relatively quickly. At the moment, we only need the basic functions you list. In-text searching could be cool, but I am more interested for the moment in assigning category labels in a flexible manner (like attribute tags) and creating a user interface for adjusting a few other elements (like BibTeX identifiers).

In the future, I will want to link this in to some form of citation management tool, but such also may be a big project.

If we get to more advanced functions later (like in-text search), we can revisit Mayan (maybe they'll have API integration by then) or otherwise weigh the merits of additional effort.

I had sent you a bibliography file earlier for a client project. Please refer to it for an example of fields that are important for our document management purposes.

You can close this to acknowledge you've seen it. I'll open another issue to start the alternative reference system.

pwhipp commented 10 years ago

I'll aim to have the bibliography fields covered and will include a suitable export for LaTeX bibliographies.
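As a first idea of what the export might look like, here is a sketch that renders one record as a BibTeX entry. The function name `to_bibtex` and the example fields are placeholders until the real field list is taken from the bibliography file, and the sketch does no escaping of LaTeX special characters.

```python
def to_bibtex(entry_type: str, cite_key: str, fields: dict[str, str]) -> str:
    """Render one record as a BibTeX entry (values are not escaped)."""
    lines = [f"@{entry_type}{{{cite_key},"]
    for name, value in fields.items():
        # Braces around the value keep its capitalisation as entered.
        lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder record, not taken from the client bibliography.
print(to_bibtex("misc", "example2014", {"title": "An Example Report", "year": "2014"}))
```

Joining the entries for every document in a project would then give a `.bib` file that LaTeX can use directly.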