cccs-web / core

CCCS' customized Django web application

capture folder names as attribute tag #177

Open cccs-ip opened 9 years ago

cccs-ip commented 9 years ago

Folder names would be useful as attribute tags.

pwhipp commented 9 years ago

If you want to use the 'categories' (the old folder names) as tags, there are a couple of problems:

A category name is unique only within its ancestry. There may be multiple subcategories with the same name as long as they have different parents.

A category is very similar to a tag, except that categories form an intentional hierarchy whereas tags are flat.

That said, I can tag all documents with a tag based upon category name(s), creating the new tag as necessary.
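As a rough sketch of what category-based tagging could look like, the snippet below derives flat tag names from a category's ancestry. The `Category` class, `ancestry`, and `tag_names_for` are hypothetical stand-ins, not the application's actual models; the `qualified` option is one possible way to keep same-named subcategories apart.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    """Hypothetical stand-in for the app's category model."""
    name: str
    parent: Optional["Category"] = None


def ancestry(category):
    """Return the category names from the root down to `category`."""
    names = []
    node = category
    while node is not None:
        names.append(node.name)
        node = node.parent
    return list(reversed(names))


def tag_names_for(category, qualified=False):
    """Return the flat tag names to apply for a category.

    qualified=False: one tag per name in the ancestry, so same-named
    subcategories under different parents end up sharing a tag.
    qualified=True: a single tag built from the full path, which stays
    unique because the ancestry is unique.
    """
    path = ancestry(category)
    if qualified:
        return ["/".join(path)]
    return path


# Two subcategories with the same name but different parents.
reports_2014 = Category("2014", parent=Category("reports"))
minutes_2014 = Category("2014", parent=Category("minutes"))

print(tag_names_for(reports_2014))                  # ['reports', '2014']
print(tag_names_for(minutes_2014))                  # ['minutes', '2014']
print(tag_names_for(reports_2014, qualified=True))  # ['reports/2014']
```

With unqualified names, both documents would carry the shared tag `2014`, which may or may not be what is wanted; the qualified form trades readability for uniqueness.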

cccs-ip commented 9 years ago

I think it's fine to leave the categories as is. If all the original file names of documents with an identical shasum can be merged and stored together, then I can start the process of eliminating records. What might be helpful / interesting would be a function that checks whether a specific file object appears anywhere outside of /alt-import/ and, if it does, deletes all occurrences within /alt-import/. If a file object appears multiple times within /alt-import/ but not in the other top-level directories, then we would just leave the duplicates alone.
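The rule described above can be sketched in plain Python. This is only an illustration: it assumes each record can be reduced to a `(path, sha)` pair, and the function name and sample paths are made up for the example.

```python
from collections import defaultdict

ALT_IMPORT_PREFIX = "/alt-import/"


def alt_import_paths_to_delete(records, prefix=ALT_IMPORT_PREFIX):
    """Return the paths under `prefix` that are safe to delete.

    `records` is an iterable of (path, sha) pairs. A path under `prefix`
    is selected only if the same sha also appears outside `prefix`.
    Duplicates that exist solely under `prefix` are left alone.
    """
    paths_by_sha = defaultdict(list)
    for path, sha in records:
        paths_by_sha[sha].append(path)

    to_delete = []
    for paths in paths_by_sha.values():
        inside = [p for p in paths if p.startswith(prefix)]
        outside = [p for p in paths if not p.startswith(prefix)]
        if inside and outside:
            to_delete.extend(inside)
    return sorted(to_delete)


sample = [
    ("/alt-import/a.pdf", "sha-1"),      # also exists in /docs/: delete
    ("/docs/a.pdf", "sha-1"),
    ("/alt-import/b.pdf", "sha-2"),      # only in /alt-import/: keep both
    ("/alt-import/old/b.pdf", "sha-2"),
    ("/docs/c.pdf", "sha-3"),            # never in /alt-import/: untouched
]
print(alt_import_paths_to_delete(sample))  # ['/alt-import/a.pdf']
```

Returning the list rather than deleting directly leaves room for a dry run before any records are removed.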

pwhipp commented 9 years ago

I can script that quite easily. The time-consuming part will be running the script. I've noticed that the SHA generation has slowed the server because the process is using 75% of the available memory. I've left it running as is but have created what I hope is a less memory-hungry version for future use.
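One common way to keep hashing memory-friendly is to read each file in fixed-size chunks instead of loading it whole. The sketch below shows that idea with the standard `hashlib` module; it is not the script referred to above, and the function name, the SHA-1 default, and the 1 MiB chunk size are assumptions.

```python
import hashlib


def file_sha(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in chunks so memory use stays near `chunk_size`.

    The algorithm default is an assumption; pass e.g. "sha256" if the
    stored shasums use a different digest.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Because only one chunk is held at a time, the peak memory used for hashing does not grow with file size.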

cccs-ip commented 9 years ago

Thanks, Paul. Fortunately we're not (yet) a high-traffic site.

pwhipp commented 9 years ago

The default user memory limit is set more for a desktop (4 GB, I think; I need to check the ulimit defaults), which means that any user on the server can use all the memory. This is probably fine for now, but it does mean that we have to be a little careful when running long processes like the one I set up. On a large (high-traffic) server, a lower limit would have kept things under better control.
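For a long-running Python process, the limits can also be inspected or lowered from inside the process using the standard `resource` module (Unix only). This is a sketch under assumptions: the comment above does not say which limit applies, so `RLIMIT_AS` (address space, the counterpart of `ulimit -v`) is a guess, and the helper names are hypothetical.

```python
import resource


def describe_limit(value):
    """Format a limit in bytes as text, handling the 'no limit' marker."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value / 2**30:.1f} GiB"


def address_space_limits():
    """Return the (soft, hard) address-space limits as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    return describe_limit(soft), describe_limit(hard)


def cap_address_space(max_bytes):
    """Set the soft address-space limit for this process only.

    The hard limit is left untouched, and the requested value is clamped
    so that it never exceeds the hard limit.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))


print(address_space_limits())
```

Calling something like `cap_address_space(1 * 2**30)` at the start of a batch script would make allocations beyond the cap fail with `MemoryError` instead of starving the rest of the server.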