cccs-web / core

CCCS' customized django web application

options for document management #75

Open cccs-ip opened 10 years ago

cccs-ip commented 10 years ago

putting this up as one idea for a document management system:

cccs-ip commented 10 years ago

Paul, you noted:

The Mayan documentation is polished and the code base looks good. I've forked it and pushed some minor updates so we can deploy it to multiple environments etc.
Unfortunately it is missing a critical module (signaler) and I cannot see any way to obtain it. I've posted a request for help and am waiting for feedback.

Thanks for this. Let's deploy to 'staging' (when possible) to test it out.

pwhipp commented 9 years ago

mayan.crossculturalconsult.com is up and running with some basic test data.

The code quality is good and it looks full featured. It looks like the most comprehensive and powerful system available.

Over to you to decide where you want to take this now.

cccs-ip commented 9 years ago

Referencing Issue #166

We determined Mayan may not be the best fit for our current needs. Please go forward with this proposal:

We can roll our own document management system quite quickly if your required feature set is small. I think I could build an extendable app in two days that would:

  • Integrate with any django site providing a secure isolated repository of uploaded documents (S3 would still be an option)
  • Allow admin users to upload new documents and associate metadata with those documents including
    • tags
    • arbitrary pre_specified fields (title, creator, description...)
    • categories
  • Allow searching using arbitrary metadata
  • Allow browsing by category or tag
  • Generate a complete zip bundle
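
The feature list above can be sketched in plain Python. This is only an illustration of the data involved, not the app itself: `Document`, `search` and `zip_bundle` are hypothetical names, and a real Django app would use models and a storage backend instead of in-memory objects.

```python
import io
import zipfile
from dataclasses import dataclass, field


@dataclass
class Document:
    """One uploaded document plus its metadata (all names are illustrative)."""
    filename: str
    content: bytes
    fields: dict = field(default_factory=dict)      # pre-specified fields: title, creator, ...
    tags: set = field(default_factory=set)
    categories: set = field(default_factory=set)


def search(documents, tag=None, category=None, **field_values):
    """Return the documents that match every criterion that was supplied."""
    results = []
    for doc in documents:
        if tag is not None and tag not in doc.tags:
            continue
        if category is not None and category not in doc.categories:
            continue
        # Field lookups are exact matches in this sketch.
        if any(doc.fields.get(key) != value for key, value in field_values.items()):
            continue
        results.append(doc)
    return results


def zip_bundle(documents):
    """Pack the given documents into an in-memory zip archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for doc in documents:
            archive.writestr(doc.filename, doc.content)
    return buffer.getvalue()
```

Browsing by category or tag is then `search(docs, category="reports")` or `search(docs, tag="x")`, and `zip_bundle(search(docs))` bundles everything.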

pwhipp commented 9 years ago

This will also include a bibliography tool.

pwhipp commented 9 years ago

There is a basic document manager in place on staging. I've uploaded one test document to it.

Currently it has no security - I can add a 'registered users only' to download as an option and/or make things more sophisticated by getting the uploader/admin to select a user group such that only users in the selected group will be able to download the document.
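
To make the two options concrete, here is a rough sketch of the download check. `User`, `Document` and `can_download` are hypothetical stand-ins rather than the staging code; in Django the equivalent information would come from `request.user` and its groups.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    is_authenticated: bool = False
    groups: set = field(default_factory=set)


@dataclass
class Document:
    name: str
    registered_only: bool = False           # the 'registered users only' option
    allowed_group: Optional[str] = None     # group selected by the uploader/admin, if any


def can_download(user, document):
    """Apply the stricter group rule first, then the registered-only rule."""
    if document.allowed_group is not None:
        return user.is_authenticated and document.allowed_group in user.groups
    if document.registered_only:
        return user.is_authenticated
    return True  # no restriction set: anyone may download
```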

I'll proceed to add the categories and bib stuff and the simple user registration security but you might want to have a play and make sure it is headed in the right direction.

pwhipp commented 9 years ago

The abadi application needs secure access to s3://abadi-docs to read and write documents and add/remove subfolders. The abadi application may support many users and may further restrict these users in terms of their access to s3://abadi-docs through the application. Ditto for cccs.

To test this I'm using cccs-docs and the staging document manager.

AWS is an evolving system of coupled services. The first point of access is the AWS account. This is managed by a single user and is the root or superuser. In the past this was all AWS had. IAM extends this by providing access control for a collection of entities associated with the AWS account.

Within IAM you have Users, Groups and Roles. A role is a collection of permissions (its permission policy) and some rules about who or what can assume the role (its trust policy). A user is an entity with some specified way to interact with the AWS services (it has an access key id, a secret access key, its permissions and some metadata). A group is a collection of users. Users can be in more than one group.

Roles and users are alternative things - AWS is confused here. An EC2 instance can only have one role so if we have multiple applications on a single EC2 instance with differing access requirements, we cannot use roles.

Therefore we have to have a user and we need to store that user's 'access key' and 'secret access key' in our secrets.py configurations.

You've already created the abadi-docs and cccs-docs users so we use those. First we need to associate a policy with each user that grants the necessary access to the s3 buckets. This is done through the IAM service. To do it, I created a full s3 access policy and then set the resource appropriately by editing it.
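
The exact policy is not reproduced in this thread, so the following is an assumption about its shape: a full S3 access statement whose `Resource` has been narrowed to one bucket and its objects. `bucket_policy` is a hypothetical helper that just builds the JSON document to paste into IAM.

```python
import json


def bucket_policy(bucket):
    """Build an IAM policy document granting full S3 access to one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}",     # the bucket itself (e.g. listing)
                    f"arn:aws:s3:::{bucket}/*",   # every object in the bucket
                ],
            }
        ],
    }


print(json.dumps(bucket_policy("cccs-docs"), indent=2))
```

Both ARNs are needed: bucket-level actions such as listing apply to the bucket ARN, while reads and writes apply to the object ARNs.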

For the document manager, the storage is set to S3 and the necessary access key and credentials specified in the settings (in secrets.py so not available on the repository for obvious reasons). This allows the documents to be stored to and retrieved from S3.

Please have a play with this and let me know if I'm on the right track. Categories and security to come when you assign it back to me.

pwhipp commented 9 years ago

For Categories we want to be able to

A document may appear in more than one category

What is the difference between tags and categories? They are both taxonomies. Categories are hierarchical whereas tags are not. The difference really comes down to use: Tags are arbitrary 'relevance' labels. When a document is tagged, the user is saying that this document is relevant to 'X'. Tags are 'fast and light' and large numbers may be associated with a large document. Categories are classifications that are intended to apply to the whole document. This document is about X where X is a category.

cccs-ip commented 9 years ago

Thanks, Paul. Most of our contributions would be tags. For categories, perhaps the only absolute is the document 'type'? A given file will only be one type of document. Otherwise it can be 'about' multiple categories.

Are multiple categories possible? If so, then I guess my question is what the benefit is of assigning categories rather than just tags?

pwhipp commented 9 years ago

Categories are hierarchical whereas tags are flat.

The categorization system can have multiple root categories and items can be placed at any level, so it can represent a system of folders and files, for example, where the categories are the folder names. This was how I set up the default categories. Unlike folders and files, a document does not 'live' under a particular category in any real sense - it can be referenced under as many categories as you like.
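
As an illustration only (the `Category` class and the category names are made up for this sketch, not taken from docmeta), a tree with several roots where one document is referenced from two places might look like this:

```python
class Category:
    """A node in a category tree; a category with no parent is a root."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.documents = set()   # references to documents, not ownership
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the names from the root down to this category, joined by '/'."""
        names = []
        node = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return "/".join(reversed(names))

    def all_documents(self):
        """Documents referenced here or under any descendant category."""
        found = set(self.documents)
        for child in self.children:
            found |= child.all_documents()
        return found


reports = Category("Reports")                 # one root
regions = Category("Regions")                 # a second, independent root
annual = Category("Annual", parent=reports)
asia = Category("Asia", parent=regions)

# The same document is referenced under two unrelated categories.
annual.documents.add("survey.pdf")
asia.documents.add("survey.pdf")
```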

Tags, on the other hand, are literally just strings associated with documents (or other types). They do not have a namespace so there is only one global list of tags.

In this sense, a categorization system is richer than a tagging system but it is also more complex to work with and set up.

Both make sense for docmeta. If I had to do without one, I'd do without tags because I can emulate tags simply by having a root category called 'tag' and having all the tags as subcategories of tag.
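
A minimal sketch of that emulation, assuming categories are stored as paths (tuples of names from the root down). The names `categories`, `add_tag`, `tags_for` and `documents_tagged` are hypothetical and only serve the example.

```python
from collections import defaultdict

# Map each category path (names from the root down) to the documents it references.
categories = defaultdict(set)

TAG_ROOT = "tag"


def add_tag(document, tag):
    """Emulate tagging: reference the document under the subcategory ('tag', <tag>)."""
    categories[(TAG_ROOT, tag)].add(document)


def tags_for(document):
    """Recover the flat tag list by reading the direct children of the 'tag' root."""
    return sorted(
        path[1]
        for path, docs in categories.items()
        if len(path) == 2 and path[0] == TAG_ROOT and document in docs
    )


def documents_tagged(tag):
    """Return the documents carrying the given emulated tag."""
    return set(categories.get((TAG_ROOT, tag), set()))


# Ordinary categories and emulated tags live side by side in the same structure.
categories[("Reports", "Annual")].add("survey.pdf")
add_tag("survey.pdf", "fieldwork")
add_tag("survey.pdf", "asia")
print(tags_for("survey.pdf"))   # ['asia', 'fieldwork']
```

The cost of the emulation is visible here: every tag lookup has to filter on the 'tag' root, which a dedicated flat tag list would not need.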