cbfs question - Githubissues

tleyden commented 9 years ago

Hey I'm trying to run cbfs and write a blog post so I can remember this later.. sorry if this is the wrong place to post, but I wanted to keep it out of email and didn't think there was a google group / mailing list for cbfs.

I'm looking at:

https://github.com/couchbaselabs/cbfs#how-do-i-run-the-stuff

and had a few questions:

-nodeID=$mynodeid

What is an example of a good node id? Any limitations on what I can use?

-bucket=cbfs

So does that assume I've already created a bucket on couchbase server, or will it create it on its own? Anything I should know about creating that bucket? (limitations, suggested settings)

-viewProxy

Just curious .. what does this do?

tleyden commented 9 years ago

Looks like it expects the bucket to be there:

2014/11/14 16:46:11 Connecting to couchbase bucket cbfs at http://10.187.52.156:8091/
2014/11/14 16:46:11 Can't connect to couchbase: No bucket named cbfs

mschoch commented 9 years ago

Sorry for slow reply, using phone at airport.

Bucket must be there, go-couchbase has no bucket creation capability.

Node ID must be unique, usually meaningful like hostname.

View proxy option let's you access Couchbase views under a URL namespace. This can be used to by html apps deployed into cbfs to access some things they otherwise couldn't. I don't remember if the admin app relies on this or if we built dedicated handlers for the things we needed.

tleyden commented 9 years ago

Thanks, that helps. Not sure if I fully grok the view proxy option, but I'll keep it in the back of my mind.

andrewwebber commented 9 years ago

I am passionate about CBFS and have been using it extensively. It is such a shame that few get it or have extended to become what it could be. (A competitor to glusterfs and ceph).

I wanted to share some of my experiences with you just in case your having problems getting started.

First we use an extended dustin/couchbase docker container that boots up couchbase and creates an additional cbfs bucket on initial install.

I currently deploy CBFS nodes on coreos. Here is a link to a working cloud-config file

Essentially this systemd unit that connects to a known couchbase server on VM boot. I have not refactored this to use etcd for service discovery to find any couchbase server instead of a fixed IP that could go down.

If i need more storage i just boot new VMs using iPXE and the above cloud config

        [Unit]
        After=setup-network-environment.service
        ConditionFileIsExecutable=/opt/bin/cbfs.linux
        Description=Couchbase File System Server
        Documentation=http://andrew.webber@brainloop.com

        [Service]
        EnvironmentFile=/etc/network-environment
        ExecStart=/opt/bin/cbfs.linux \
        --couchbase=http://data1.qmirm.test2:8091 \
        --bucket=cbfs \
        --nodeID=${DEFAULT_IPV4} \
        --verbose
        Restart=on-failure
        RestartSec=1

I use the IP address of the VM as the nodeID

I also having the following optional systemd file to upload the cbfs monitor for the first time

        [Unit]
        After=cbfs.service
        Description=Couchbase Storage Monitor
        Documentation=http://gitlab.qmirm.test2

        [Service]
        ExecStart= /usr/bin/docker run -i -t --rm --entrypoint /cbfs/cbfsclient.linux andrewwebber/cbfs http://172.20.11.49:8484 upload /cbfs/monitor monitor
        RemainAfterExit=yes
        Type=oneshot

        [Install]
        WantedBy=multi-user.target

This docker container containers the upload client as well as the monitor and uploads it an inital storage node

The only reason within internal discussions I have lost making the case for CBFS are the following:

Couchbase does not directly support this product, simlar to the fact that it does not support go-couchbase
CBFS is not data center aware, one would simply need to write an agent onto of CBFS that looks at the statistics of data across the nodes and requires nodes in another data center to 'fetch' data not minimally replicated across nodes in the second data center
People cant believe that the deployment is just downloading a go compiled program and running it. No additional complex install with Red Hat consulting.

kind regards,

Andrew

mschoch commented 9 years ago

So in cbfs you can attach data to your files. (Think couchapp in reverse, instead of attach file to documents, we can attach a document to the file). Let's say we put EXIF data in json attached to file. Now we build couchase view that uses this json. Now we upload an HTML/js app that invokes this view (using view proxy) and renders view output in some meaningful way.

tleyden commented 9 years ago

@mschoch "So in cbfs you can attach data to your files". Ah, that makes sense. Thanks!

tleyden commented 9 years ago

simlar to the fact that it does not support go-couchbase

I believe that is changing. There are efforts underway to make go-couchbase an official SDK.

Couchbase does not directly support this product,

Not yet. With Couchbase Mobile, we need a better attachment storage story than we currently have, which has an awkward limit of 20MB. Having Sync Gateway be able to store data into cbfs seems like the logical choice.

tleyden commented 9 years ago

@andrewwebber thanks a lot for posting your scripts. Btw I noticed you weren't using docker volumes. When I researched this recently, the docker folks told me that for data intensive things, docker volumes should be used rather than the container filesystem.

In the setup I used in my blog post, here is where I created the volume:

https://github.com/tleyden/couchbase-server-coreos/blob/master/2.2/fleet/coreos-stable-pv.template#L170-L179

and here is where I told the docker container to mount it:

https://github.com/tleyden/couchbase-server-coreos/blob/master/2.2/fleet/couchbase_node.service.template#L9

HTH

andrewwebber commented 9 years ago

@tleyden with respect to docker volumes our current strategy is to have the coreos host mount a NFS and we use the volume flag to have the couchbase container write to the NFS via the host.

If i was working at couchbase i would definatly make noise about investing in cbfs. After talking to some couchbase sales and tech guys they indicated that little is know about who is using cbfs and therefore there are no majoy commitments. This then made it difficult for me to make the case for cbfs in my department and now we are investing in glusterfs (Red hat storage).

of course cbfs does not offer a mount client, which isnt required of course if your architecture is happy with a simple amazon S3 (cbfs) like rest api. However some people feel more comfortable having a mount like experience.

if cbfs was officially support (the code base isn't that complicated) and has enhancements like XDCR then it could be a great competitor to for businesses like mine that cant use public clouds due to the patriot act. We have to choose a private cloud option and at the moment mostly of these are open stack vendors.

cbfs provides and elegant alternative.

tleyden commented 9 years ago

with respect to docker volumes our current strategy is to have the coreos host mount a NFS and we use the volume flag to have the couchbase container write to the NFS via the host.

@andrewwebber ah gotcha, so you are way ahead of me! OK cool, was just giving you a heads up just in case.

Thanks for the heads up regarding your use case. I can't make any promises other than making noise about it within the org.

andrewwebber commented 9 years ago

@tleyden totally cool, im still learning with docker, volumes and couchbase. Of course im not even sure if running couchbase within a docker container is supported but on our project we are moving in this direction anyway because of CoreOS (Project Atomic - if management get there way)

tleyden commented 9 years ago

I use the IP address of the VM as the nodeID

I just spent quite a bit of time on a bug where I accidentally passed in a blank NodeID on the command line to start cbfs, which caused it to generate it's own nodeid:

Dec 23 01:19:58 core-02 docker[8451]: 2014/12/23 01:19:58 NodeID was not given, generating one
Dec 23 01:19:58 core-02 docker[8451]: 2014/12/23 01:19:58 serverID: e18bd673

but the problem is that it did not seem to have valid ip addresses for nodes, and so cbfs nodes didn't seem to be able to see each other.

The symptoms were:

Files were not getting replicated between the nodes
I was seeing messages on the cbfs console like:

Asking {StorageNode a82fd58c/127.0.0.1} to acquire cfded77803b08ca235458d95bf2e56fe43029046

Meaning that it seemed to know another node called a82fd58c existed, but it was trying to reach it via 127.0.0.1 rather than an IP address.

I'm mentioning this because I thought maybe the documentation should be updated with this gotcha, or a more prominent warning should be printed when it ends up using 127.0.0.1 for node addresses.

dustin commented 9 years ago

You don't have to specify a node ID -- as you see, it'll just make one up if you don't provide one. This is probably what you want if you have a largeish number of roughly identical, disposable machines. If you care about them individually, you can go ahead and name them. In either case, the metadata will help you understand how to resolve them, except...

If you don't specify a bind address and the couchbase server is on the same box as the cbfs node, then it won't know how to advertise itself other than 127.1 (which is the local address of the connection it makes to couchbase).

This is the result of a question that lots of people ask and sounds really easy: "What is my IP address?" Every host connected to the internet has at least two. The majority these days have a lot more than that. A service running on a docker instance typically has the addresses it can listen on which are separate from the addresses on which you can contact the service (which is the same for any NATted service). It's not an easy thing to deal with.

The further complication is that 127.1, while totally a valid address for the cbfs instance from the cbfs instance, it's also an address on every individual node.

So it's a little complicated. I think what you'd really want for docker type installs is a separate address you advertise that is different from both the one you bind to and the one you connect to couchbase with.

tleyden commented 9 years ago

A service running on a docker instance typically has the addresses it can listen on which are separate from the addresses on which you can contact the service (which is the same for any NATted service). It's not an easy thing to deal with.

I'm using --net=host, and so the docker instance basically melds into the host OS in terms of networking, which avoids all the NAT related headaches. So far I haven't run into any issues with --net=host, because I'm only running a single docker instance for cbfs on each host OS.

If you don't specify a bind address and the couchbase server is on the same box as the cbfs node, then it won't know how to advertise itself other than 127.1 (which is the local address of the connection it makes to couchbase).

Yes, that's what I was running into. I'll plan to switch over to letting it generate a nodeID, and specifying the BindAddress as {private ip}:8484

it won't know how to advertise itself other than 127.1 (which is the local address of the connection it makes to couchbase).

Ah, so if I had passed in an address like http://{private ip}:8091 to the -couchbase argument, it would have advertised itself as {private ip}? Due to the same bug in my bash script, I was passing in http://:8484 (no ip whatsoever) in the -couchbase argument.

couchbaselabs / cbfs

cbfs question #132