JackSlateur / bareos-libcloud

Backup object storage via Bareos (Amazon's S3, Ceph's RGW, Google's GCS, etc.)
GNU Affero General Public License v3.0

Plugin exits without error on exception #2

Closed ghost closed 5 years ago

ghost commented 5 years ago

Hi,

we tested this (great!) plugin on Azure and made a mistake in the Blob storage configuration. But we also saw that the Bareos job ended without showing any error to us. In syslog on the backup client we got this exception:

Jan 25 13:56:49 backup-host BareosFdPluginLibcloud: [32288] Traceback (most recent call last):
Jan 25 13:56:49 backup-host BareosFdPluginLibcloud: [32288]   File "/usr/lib/bareos/plugins/BareosFdPluginLibcloud.py", line 156, in __call__
Jan 25 13:56:49 backup-host BareosFdPluginLibcloud: [32288]     self.__map()
Jan 25 13:56:49 backup-host BareosFdPluginLibcloud: [32288]   File "/usr/lib/bareos/plugins/BareosFdPluginLibcloud.py", line 162, in __map
Jan 25 13:56:49 backup-host BareosFdPluginLibcloud: [32288]     for bucket in self.driver.iterate_containers():
Jan 25 13:56:49 backup-host BareosFdPluginLibcloud: [32288]   File "/usr/lib/python2.7/dist-packages/libcloud/storage/drivers/azure_blobs.py", line 379, in iterate_containers
Jan 25 13:56:49 backup-host BareosFdPluginLibcloud: [32288]     (response.status), driver=self)
Jan 25 13:56:49 backup-host BareosFdPluginLibcloud: [32288] LibcloudError: <LibcloudError in <libcloud.storage.drivers.azure_blobs.AzureBlobsStorageDriver object at 0x7f87ed2da790> 'Unexpected status code: 400'>
Jan 25 13:56:49 backup-host BareosFdPluginLibcloud: [32288]

Probably a missing try: construct in the __map() function?
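For illustration, a rough sketch of what such a guard around the container iteration could look like (this is not the plugin's actual code; only driver.iterate_containers() and LibcloudError come from the traceback above, the helper name and its error callback are made up):

from libcloud.common.types import LibcloudError

def map_buckets(driver, report_error):
    # Collect the containers to back up, but surface backend failures
    # instead of letting the worker die silently.
    buckets = []
    try:
        for bucket in driver.iterate_containers():
            buckets.append(bucket)
    except LibcloudError as err:
        # Hand the message back to the caller (e.g. for a Bareos job message)
        # and re-raise so the job fails instead of ending with 0 files.
        report_error('Cannot list containers: %s' % err)
        raise
    return buckets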

Dimitrij

JackSlateur commented 5 years ago

Did you finally get the plugin working on Azure?

There is indeed a bug: the plugin does not crash if the backend cannot be reached (due to wrong tokens, settings or a network issue). I do not know how to fix it. Well, I do have a dummy patch, but the performance impact is unknown to me.

If you have more than a simple testbed, would you test that patch? https://raw.githubusercontent.com/JackSlateur/bareos-libcloud/test_connect/BareosFdPluginLibcloud.py
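For reference, the fail-fast idea could look roughly like this; a sketch only, not the actual patch, and the connect_and_verify helper plus the option keys below are illustrative:

from libcloud.storage.providers import get_driver
from libcloud.storage.types import Provider

def connect_and_verify(options):
    # Illustrative option keys ('provider', 'key', 'secret', 'host'); the
    # plugin's real configuration names may differ.
    cls = get_driver(getattr(Provider, options['provider']))
    driver = cls(options['key'], options['secret'], host=options.get('host'))
    # Force one real request so wrong credentials or an unreachable endpoint
    # raise at job start instead of being swallowed later.
    next(iter(driver.iterate_containers()), None)
    return driver

Pulling only the first container keeps the probe cheap; listing everything up front would scale with the number of buckets, which is presumably where the unknown performance impact comes from.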

Thanks

ghost commented 5 years ago

We tested it on Azure, and with the right definition for the storage provider in Azure it works like a charm. Thanks for the great work! Sure, I'll test your patch and provide feedback.

Dimitrij

ghost commented 5 years ago

Results from the (unsuccessful) test. Good news: the plugin crashes:

25-Jan 15:18 backup-host JobId 172771: Fatal error: python-fd: Traceback (most recent call last):
  File "/usr/lib/bareos/plugins/BareosFdWrapper.py", line 34, in parse_plugin_definition
    return bareos_fd_plugin_object.parse_plugin_definition(context, plugindef)
  File "/usr/lib/bareos/plugins/BareosFdPluginLibcloud.py", line 317, in parse_plugin_definition
    driver = connect(self.options)
  File "/usr/lib/bareos/plugins/BareosFdPluginLibcloud.py", line 84, in connect
    for opt in ('buckets_exclude', 'accurate', 'nb_prefetcher', 'prefetch_size', 'queue_size', 'provider', 'buckets_include', 'debug'):
TypeError: an integer is required

Bad news: it crashes on a configuration that worked before.

Dimitrij

JackSlateur commented 5 years ago

Would you mind re-testing with the updated file?

Thanks

ghost commented 5 years ago

I tested with the file from the 'master' branch: here we are able to create a backup, but with a wrong blob storage definition we got no errors, just a backup with 0 files. I tested with the file from the 'test_connect' branch and got the error I wrote before from the Bareos director.

If you have any new updates, I could test it also, sure.

JackSlateur commented 5 years ago

Hi,

I believe this issue is fixed in master. Can you check out the branch and confirm?

Best regards,

ghost commented 5 years ago

I tried with the new master and created a configuration with an error. The plugin still exits without any error. We enabled debugging; see the attached file, written by the file daemon to syslog: issue-2.log

JackSlateur commented 5 years ago

Hmm.

It does work for me:

30-janv. 18:02 bareos-dir JobId 197: No prior Full backup Job record found.
30-janv. 18:02 bareos-dir JobId 197: No prior or suitable Full backup found in catalog. Doing FULL backup.
30-janv. 18:02 bareos-dir JobId 197: Start Backup JobId 197, Job=backup-bareos-fd.2019-01-30_18.02.41_17
30-janv. 18:02 bareos-dir JobId 197: Using Device "FileStorage" to write.
30-janv. 18:02 cephoo1-fd JobId 197: Fatal error: python-fd: Traceback (most recent call last):
  File "/usr/lib/bareos/plugins/bareos-fd-libcloud.py", line 44, in load_bareos_plugin
    context, plugindef)
  File "/usr/lib/bareos/plugins/BareosFdPluginLibcloud.py", line 247, in __init__
    driver.iterate_containers()
  File "/usr/lib/python2.7/dist-packages/libcloud/storage/drivers/s3.py", line 248, in iterate_containers
    response = self.connection.request('/')
  File "/usr/lib/python2.7/dist-packages/libcloud/common/base.py", line 637, in request
    response = responseCls(**kwargs)
  File "/usr/lib/python2.7/dist-packages/libcloud/common/base.py", line 152, in __init__
    message=self.parse_error(),
  File "/usr/lib/python2.7/dist-packages/libcloud/storage/drivers/s3.py", line 96, in parse_error
    raise InvalidCredsError(self.body)
InvalidCredsError: u'<?xml version="1.0" encoding="UTF-8"?><Error><Code>SignatureDoesNotMatch</Code><RequestId>tx00000000000000000044f-005c51d8b4-7c29948-default</RequestId><HostId>7c29948-default-default</HostId></Error>'

30-janv. 18:02 cephoo1-fd JobId 197: Fatal error: Failed to authenticate Storage daemon.
30-janv. 18:02 bareos-dir JobId 197: Fatal error: Bad response to Storage command: wanted 2000 OK storage
, got 2902 Bad storage

30-janv. 18:02 bareos-dir JobId 197: Error: Bareos bareos-dir 16.2.4 (01Jul16):
  Build OS:               x86_64-pc-linux-gnu debian Debian GNU/Linux 9.3 (stretch)
  JobId:                  197
  Job:                    backup-bareos-fd.2019-01-30_18.02.41_17
  Backup Level:           Full (upgraded from Incremental)
  Client:                 "bareos-fd" 16.2.4 (01Jul16) x86_64-pc-linux-gnu,debian,Debian GNU/Linux 9.3 (stretch)
  FileSet:                "SelfTest" 2019-01-30 17:58:34
  Pool:                   "Full" (From Job FullPool override)
  Catalog:                "MyCatalog" (From Client resource)
  Storage:                "File" (From Job resource)
  Scheduled time:         30-janv.-2019 18:02:41
  Start time:             30-janv.-2019 18:02:43
  End time:               30-janv.-2019 18:02:44
  Elapsed time:           1 sec
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         
  Volume Session Id:      187
  Volume Session Time:    1548592707
  Last Volume Bytes:      0 (0 B)
  Non-fatal FD errors:    2
  SD Errors:              0
  FD termination status:  Fatal Error
  SD termination status:  Waiting on FD
  Termination:            *** Backup Error ***

The debug log shows only this:

Jan 30 18:07:33 cephoo1 BareosFdPluginLibcloud: [74376] BareosFdPluginLibcloud called with plugindef: provider=S3_RGW:<rest of the config here>

Maybe libcloud is behaving differently against Azure. That seems wild; I will nevertheless get an account to test that.

ghost commented 5 years ago

It works if we use a wrong secret. If we use a wrong key, it does not work: even though we can see the exception in syslog (see issue-2.log), the FD goes back to the director with no error.

JackSlateur commented 5 years ago

Ok, so this is driver-specific (I am testing against S3_RGW, and whichever part of the config is wrong does the job; extra config too).

Thanks for the tests you are providing; I will get an account and check that in depth.

ghost commented 5 years ago

Probably we need one more check, because Azure works with a hostname per storage account, so you'll receive an 'unknown host' exception if the account name is wrong:

Jan 30 17:29:42 backup-host BareosFdPluginLibcloud: [3918] ConnectionError: HTTPSConnectionPool(host='ccieu1uauat1.blob.core.windows.net', port=443): Max retries exceeded with url: /?comp=list&include=metadata&maxresults=100 (Caused by NewConnectionError('<requests.packages.urllib3.connection.VerifiedHTTPSConnection object at 0x7f933492e4d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

And this exception is coming from the __map() function, so probably another catch in __map()?
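To make the suggestion concrete, a hedged sketch of a broader catch (the helper name is made up; the exception classes are chosen because the syslog line above shows a requests ConnectionError rather than an HTTP status error):

import socket
from libcloud.common.types import LibcloudError

def list_containers_safely(driver):
    try:
        return list(driver.iterate_containers())
    except (LibcloudError, socket.error, IOError) as err:
        # A wrong Azure account name fails at DNS resolution with a requests
        # ConnectionError (an IOError subclass), not with an HTTP status code,
        # so catching LibcloudError alone does not cover that case.
        raise RuntimeError('Backend unreachable or misconfigured: %s' % err)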

JackSlateur commented 5 years ago

So there is an implementation difference between S3 and Azure. I have pushed a commit to master, where we fetch information from the backend storage into a variable (even if it is not used).

Considering your last comment, I just tested with the Azure driver and a dummy key/secret.
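As I understand the commit described above, the idea is roughly this (a sketch with illustrative names, not the actual change):

def probe_backend(driver):
    # Pull one item from the backend into a variable; the value itself is
    # unused, but evaluating the generator forces a real request, so bad
    # credentials or an unknown host raise during plugin initialisation.
    probe = next(iter(driver.iterate_containers()), None)
    return probe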

ghost commented 5 years ago

Works for me, thanks!