bgewehr commented 9 years ago

Imagine a central Alfresco instance at a headquarters location holding millions of files in thousands of sites, 50 TB in total. Imagine 20 branch offices connected by high-speed WAN (up to 1 Gbit/s). Imagine one Linux server at each branch doing auth and other core services. Imagine CmisSync on each of those servers, syncing only the content that is needed IN THAT BRANCH. Imagine 30 people per branch, so all user accounts of that branch would need to be accessed over CMIS, and every file that any one of those 30 has access to would need to be synced (without duplicating files, of course).

All this is for collaborative editing and creating a large number of files, which is the normal workflow in a civil engineering company.

Imagine someone editing one of those files. Will it be locked on the other sync instances to prevent conflicts? Will it be updated as a new version to Alfresco in short time?

Can such a scenario work? Do you have any experience with how CmisSync performs at large scale?

Thank you for sharing your expertise and experience!

nicolas-raoul commented 9 years ago

Hello, Thanks for the interesting question!

Why only one CmisSync per branch? How will the 30 people of the branch collaborate with each other if only one computer has the content?

bgewehr commented 9 years ago

My idea is to use that Linux computer as the branch file server to avoid too many copies of the files.

nicolas-raoul commented 9 years ago

I see. What protocol and software will the 30 people use to access the branch server? NFS and NFS mounts? Will the 30 people perform check out/check in operations? If yes how? Or will they avoid conflicts another way?

bgewehr commented 9 years ago

Like a transparent file proxy cache service, mounted as a local network shared drive to each user.

nicolas-raoul commented 9 years ago

"local network shared drive" -> So you mean the CIFS protocol I guess?

bgewehr commented 9 years ago

Frontend: a LAN file service for SMB/CIFS network shared drives (Samba). Middleware: a multi-user CMIS sync. Backend: Alfresco or whatever. SocialDrive!

bgewehr commented 9 years ago

The best locking would be "on demand", as is usual on LAN shared drives: someone opens a file and it is locked for everyone else immediately.

nicolas-raoul commented 9 years ago

SMB/CIFS can be quite unreliable, in particular when it comes to locking or check in/check out. For instance, several people can easily overwrite each other's changes when editing an SMB/CIFS-shared text file, because SMB/CIFS does not lock it in any way. Do you consider this a problem or not?

bgewehr commented 9 years ago

Most applications take care of locking by creating temporary files such as ~$myfile.docx or similar. That would be enough for us, since we have been working on a shared drive for a long time and users are used to dealing with that issue.
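To illustrate how a sync tool could respect that convention, here is a minimal sketch that treats a document as "probably open" when a sibling file with the `~$` prefix matches the end of its name. The function name `is_probably_open` and the suffix-matching rule are assumptions made for illustration; this is not something CmisSync is known to do, and the rule is deliberately conservative, so it can report false positives.

```python
from pathlib import Path

# Assumption: lock/owner files start with "~$", as in the ~$myfile.docx example.
LOCK_PREFIX = "~$"


def is_probably_open(document: Path) -> bool:
    """Return True if a sibling lock file appears to belong to this document.

    A lock file matches when its name, minus the "~$" prefix, is a suffix of
    the document name. This also covers applications that shorten the name.
    """
    for sibling in document.parent.iterdir():
        name = sibling.name
        if not name.startswith(LOCK_PREFIX):
            continue
        remainder = name[len(LOCK_PREFIX):]
        if remainder and document.name.endswith(remainder):
            return True
    return False
```

A sync loop could skip uploading any file for which this returns True and retry on the next pass.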

nicolas-raoul commented 9 years ago

I see! Indeed some applications like Word provide this kind of locking, so you can use it if you keep in mind that it might not work with other applications.

That was the main concern for me. The rest of your architecture sounds OK. 50 TB may take some time to synchronize, but it will work.

Obviously, all 30 people at the branch will share the same Alfresco account, so you will not be able to set fine-grained permissions (example: the team leader can see the "Human resources - resumes" folder but others cannot). Transfers to Alfresco will be sort of queued: for instance, if someone modifies a big file and then I modify a small file, my small file will have to wait until the big file is transferred.
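To put a rough number on the queueing effect, here is a small sketch that assumes transfers are strictly sequential and that the full 1 Gbit/s link is available. Both are simplifying assumptions, and the file names and sizes are made up.

```python
def completion_times(uploads, bandwidth_bytes_per_s):
    """Return {name: seconds until uploaded} for a strictly FIFO upload queue.

    uploads: list of (name, size_in_bytes), in the order the files were modified.
    """
    clock = 0.0
    finished = {}
    for name, size in uploads:
        clock += size / bandwidth_bytes_per_s  # each upload waits for all earlier ones
        finished[name] = clock
    return finished


# 1 Gbit/s is 125,000,000 bytes/s. A 10 GB file is queued ahead of a 1 MB file.
queue = [("big-model.dwg", 10_000_000_000), ("small-note.txt", 1_000_000)]
print(completion_times(queue, 125_000_000))
```

Under these assumptions the small file is only uploaded after about 80 seconds, although on its own it would take a few milliseconds.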

The alternative solution would be to use one CmisSync per person (that would allow much finer permissions, and also allow reliable check in/check out). The drawback of using one CmisSync per person is that you need a much bigger server, and more space is taken on everyone's hard drives, of course.

lelmarir commented 9 years ago

Hi, Here are my two cents.

IMO CmisSync is not suited for this scenario, as you want to always work "online", doing proper file locking in real time. CmisSync is an offline synchronizer, and should be used only when the user can't reach the central repository during regular work.

An Alfresco cluster (although I have no experience in this) might be a better solution: http://docs.alfresco.com/4.1/concepts/ha-intro.htm

bgewehr commented 9 years ago

Alfresco as a cluster really can do that job, but the cost is not feasible if you need 20 cluster members to deliver that solution. Way too expensive.

bgewehr commented 9 years ago

@Nicolas-raoul: that really is the main problem. We would need a solution in which the CMIS sync takes those 30 Alfresco users and produces only one copy of each file in the CIFS share, accessible to every user who has the permission in Alfresco too, but not to the others. The requirement is: sync each file once, and pull permissions from the repository into the Samba file permissions.

nicolas-raoul commented 9 years ago

By the way, do your Alfresco and CIFS servers use a shared LDAP (or ActiveDirectory) for authentication?

bgewehr commented 9 years ago

Yes, we use a Samba 4 AD for SSO! Works great!

nicolas-raoul commented 9 years ago

In that case, with some programming effort CmisSync could indeed attempt to map the UNIX file owners and the CMIS document owners. It would probably work.
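As a rough idea of what that mapping could look like on the UNIX side, here is a sketch that assumes the CMIS owner name (for example the value of `cmis:createdBy`) is also a valid account name on the branch server, which a shared directory would make plausible. The function `apply_owner` and its `lookup` parameter are hypothetical names, not part of CmisSync.

```python
import os
import pwd


def apply_owner(path, cmis_owner_name, lookup=pwd.getpwnam):
    """Set the UNIX owner of path to the account named like the CMIS owner.

    Returns False (and leaves the file untouched) when the name is unknown
    locally. Changing the owner to another user normally requires root.
    """
    try:
        entry = lookup(cmis_owner_name)
    except KeyError:
        return False
    os.chown(path, entry.pw_uid, -1)  # -1 keeps the current group
    return True
```

The lookup is injectable so that a directory-backed resolver could replace the local passwd database.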

For permissions, the problem is a bit trickier. You will log CmisSync into a particular Alfresco user, and you will only see the documents that are available to that Alfresco user. If you log CmisSync into the Alfresco admin user, you will see all documents, and to tell whether a document should be visible or not to a particular other Alfresco user, CmisSync will need to check the ACLs:

[Image: acl]

CmisSync would need to access ActiveDirectory to tell what groups a particular user belongs to. CmisSync would also need to be told what something like {http://www.alfresco.org/model/site/1.0}site.SiteContributor means in terms of UNIX rights... this information is not retrievable using only the CMIS protocol, right?
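To make the role question concrete, here is a sketch of the kind of table CmisSync would have to be given. The role names are the standard Alfresco site roles, but the mapping to UNIX group bits is purely an assumption for illustration: neither CMIS nor Alfresco defines it, and directories would additionally need the execute bit.

```python
# Hypothetical mapping from Alfresco site roles to UNIX group permission bits.
ROLE_TO_GROUP_BITS = {
    "SiteConsumer": 0o040,      # read only
    "SiteContributor": 0o060,   # read + write
    "SiteCollaborator": 0o060,
    "SiteManager": 0o060,
}


def short_role(qualified: str) -> str:
    """'{http://www.alfresco.org/model/site/1.0}site.SiteContributor' -> 'SiteContributor'."""
    local = qualified.rsplit("}", 1)[-1]  # drop the '{namespace}' part
    return local.rsplit(".", 1)[-1]       # drop the 'site.' prefix


def group_bits(roles) -> int:
    """Combine the bits of all roles; unknown roles grant nothing."""
    bits = 0
    for role in roles:
        bits |= ROLE_TO_GROUP_BITS.get(short_role(role), 0)
    return bits
```

For example, `group_bits(["{http://www.alfresco.org/model/site/1.0}site.SiteContributor"])` yields `0o060` under this table.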

bgewehr commented 9 years ago

One more thought: keep in mind that one branch only needs the files that are accessible for these people in that branch so there must be a certain ldap group filter for getting the locally needed files only.
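A minimal sketch of such a filter, assuming the sync tool already has each document's ACL principals (users and groups) and the set of principals that exist in the branch, for example from an LDAP query. All paths, user names, and group names below are made up.

```python
def needed_in_branch(doc_principals, branch_principals) -> bool:
    """A document is needed locally if any branch user or group appears in its ACL."""
    return not set(doc_principals).isdisjoint(branch_principals)


def select_for_branch(documents, branch_principals):
    """documents: {path: iterable of ACL principals}. Returns the paths to sync."""
    return [
        path
        for path, principals in documents.items()
        if needed_in_branch(principals, branch_principals)
    ]


# Hypothetical data: one branch with two users and one project group.
branch = {"alice", "bob", "GROUP_site_bridge42_SiteContributor"}
docs = {
    "/sites/bridge42/plan.dwg": ["GROUP_site_bridge42_SiteContributor"],
    "/sites/tunnel7/survey.pdf": ["GROUP_site_tunnel7_SiteConsumer"],
    "/sites/hq/memo.docx": ["carol", "alice"],
}
print(select_for_branch(docs, branch))
```

With this data only the bridge42 plan and the memo are selected, so the tunnel7 survey would never be copied to this branch.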

bgewehr commented 9 years ago

If CmisSync were enhanced in that way, you could call it a transparent branch file cache, something that does not yet exist! Local network performance combined with global collaboration in self-organizing project groups!

aegif / CmisSync

Synchronize UNIX owners+permissions in SSO environment #636

socialdrive!