Open cccs-ip opened 10 years ago
If you want to use the 'categories' (the old folder names) as tags there are problems:
The categories are uniquely named on the basis of their ancestry. There may be multiple subcategories with the same name, as long as they have different parents.
The category is very similar to a tag, save that it is an intentional hierarchy whereas tags are flat.
That said, I can tag all documents with a tag based upon category name(s), creating the new tag as necessary.
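To make the name clash concrete, here is a minimal sketch of deriving flat tags from category paths. The slash-separated path format and the names `category_tags` and `tag_documents` are assumptions for illustration, not the schema actually in use.

```python
from collections import defaultdict


def category_tags(category_path):
    """Return the flat tag names implied by a hierarchical category path."""
    return [part for part in category_path.strip("/").split("/") if part]


def tag_documents(doc_categories):
    """Map each tag name to the set of document ids that would carry it.

    doc_categories is a hypothetical {doc_id: category_path} mapping.
    """
    tagged = defaultdict(set)
    for doc_id, path in doc_categories.items():
        for tag in category_tags(path):
            tagged[tag].add(doc_id)  # the tag is created on first use
    return dict(tagged)
```

With this approach two documents filed under `finance/minutes` and `board/minutes` would both end up under a single `minutes` tag, which is the loss of ancestry described above.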
I think it's fine to leave the categories as is. If we can merge all the original file names so that they are stored across documents with an identical shasum, then I can start the process of eliminating records. What might be helpful / interesting would be a function that asks whether a specific file object appears anywhere outside of /alt-import/ and, if it does, deletes all occurrences within /alt-import/. If a file object appears multiple times within /alt-import/ but not in the other top-level directories, then we would just leave the duplicates alone.
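A rough sketch of that rule, assuming the records can be pulled out as `(path, shasum)` pairs. The record shape and the function names are hypothetical; only the `/alt-import/` prefix comes from the discussion above.

```python
import posixpath
from collections import defaultdict

ALT_PREFIX = "/alt-import/"  # top-level directory named in the thread


def merge_names_by_sha(records):
    """Collect every original file name seen for each shasum."""
    names = defaultdict(set)
    for path, sha in records:
        names[sha].add(posixpath.basename(path))
    return {sha: sorted(found) for sha, found in names.items()}


def alt_import_deletions(records, alt_prefix=ALT_PREFIX):
    """List the alt-import paths whose content also exists elsewhere.

    Duplicates that live only inside alt-import are left alone.
    """
    seen_outside = {sha for path, sha in records
                    if not path.startswith(alt_prefix)}
    return [path for path, sha in records
            if path.startswith(alt_prefix) and sha in seen_outside]
```

Running `merge_names_by_sha` before deleting anything would keep the original file names even after the alt-import copies are gone.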
I can script that quite easily. The time-consuming part will be running the script. I've noticed that the sha generation has slowed the server because the process is using 75% of the available memory. I've left it running as is, but have created what I hope is a less memory-hungry version for future use.
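The script itself is not shown here, so the cause of the memory use is not known. One common way to keep hashing cheap on memory is to read each file in fixed-size chunks rather than all at once. The sketch below assumes SHA-1 (the thread only says "sha") and a 1 MiB chunk size; `file_digest` is an illustrative name.

```python
import hashlib


def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so memory use stays near chunk_size."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Peak memory then depends on `chunk_size` rather than on the size of the largest file.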
Thanks, Paul. Fortunately we're not (yet) a high-traffic site.
The default user memory limit is set more for a desktop (4 GB, I think; I need to check the ulimit defaults), which means that any user on the server can use all the memory. This is probably fine for now, but it does mean that we have to be a little careful when running long processes like the one I set up. On a large (high-traffic) server the limit would have kept things under better control.
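Until the system-wide defaults are checked, a long-running script could cap itself. This is a sketch using Python's standard `resource` module (Unix only); the helper names and the choice of `RLIMIT_AS` (address space) as the limit to lower are assumptions, not what the server is configured with.

```python
import resource


def capped_limits(requested_bytes, current):
    """Return the (soft, hard) pair to apply, never exceeding the hard limit."""
    _soft, hard = current
    if hard != resource.RLIM_INFINITY:
        requested_bytes = min(requested_bytes, hard)
    return (requested_bytes, hard)


def cap_address_space(requested_bytes):
    """Lower this process's own soft address-space limit."""
    current = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS,
                       capped_limits(requested_bytes, current))
```

Calling something like `cap_address_space(1024 ** 3)` at the top of the script would make a runaway allocation fail with `MemoryError` inside that process instead of slowing the whole server.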
folder names useful as attribute tags