basho / riak

Riak is a decentralized datastore from Basho Technologies.
http://docs.basho.com
Apache License 2.0

2.2.5 - [enoent] - riak_repl couldn't create log dir "data/riak_repl/logs" #940

Closed: bryanhuntesl closed this issue 3 years ago

bryanhuntesl commented 6 years ago

From a bug reported on Slack by @dams while testing riak-2.2.5.rc2.

The repl data directory is specified in the advanced config file, and there seems to be no sane default. It's not a big deal, but you'd expect Riak to start fine even without an advanced config file.

On a fresh install of Riak 2.2.5, Riak throws the following error:

"riak_repl couldn't create log dir \"data/riak_repl/logs\": enoent\n"

This path needs to be set in the advanced.config file.

Two possible solutions :

bryanhuntesl commented 6 years ago

I don’t think it should be a blocker for the 2.2.5 release. Just add it to the release notes as a known ‘feature’ and promise to improve the user experience in a later release.

dams commented 6 years ago

Maybe it ought to be sorted by packaging, but riak didn't want to start until I had these bits in advanced.config:

 {riak_repl,
  [
   {data_root, "/var/lib/riak/riak_repl/"}
  ]
 }

For easier user experience, I think riak_repl data_root should default to riak base_dir / riak_repl or something similar.

bryanhuntesl commented 6 years ago

That's in the riak_repl cuttlefish schema :

%% @doc Path (relative or absolute) to the working directory for the
%% replication process
{mapping, "mdc.data_root", "riak_repl.data_root", [
    {default, "{{repl_data_root}}"}
]}.

It relies upon the variable repl_data_root being set.

I wonder if that variable is not being set. Perhaps it got lost in the move from the riak_ee repository?
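For illustration only, a default that does not depend on the repl_data_root template variable might look like the sketch below. This is not the fix that was adopted. It assumes platform_data_dir is defined in the node's schema (as it is for other Riak data directories) and relies on cuttlefish's $(...) substitution; the riak_repl subdirectory name is just the one suggested earlier in this thread.

```erlang
%% Sketch only, not the shipped schema.
%% Assumption: platform_data_dir is defined elsewhere in the schema.
%% @doc Path (relative or absolute) to the working directory for the
%% replication process
{mapping, "mdc.data_root", "riak_repl.data_root", [
    {default, "$(platform_data_dir)/riak_repl"}
]}.
```

With a default like this, a node with no riak_repl section in advanced.config would resolve data_root under the platform data directory rather than relative to the directory Riak was started from.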

bryanhuntesl commented 6 years ago

So I wonder if it is not being set by default, or that it is being set to an empty value.

bryanhuntesl commented 6 years ago

@dams - does this issue occur when running riak 2.2.5 RC2 from packages, or via make rel or make devrel?

dams commented 6 years ago

It was when upgrading via packages (rpm in this case). The effect was that an old advanced.config file was kept, in which there was no riak_repl section at all.

To reproduce the bug, make sure you have an advanced.config file that is not empty, but that doesn't contain a riak_repl section. At startup, the riak_repl data_root directory will be data/riak_repl/, which is not what you'd expect. It's not a sane default because it's relative to where riak is started, not under riak_root.
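As a rough sketch of such a file (the riak_core entry here is only a placeholder to make the file non-empty; any section other than riak_repl should do, and this exact file has not been tested):

```erlang
%% Hypothetical /etc/riak/advanced.config for reproducing the problem:
%% non-empty, but with no riak_repl section, so data_root is never set here.
[
 {riak_core,
  [
   {cluster_mgr, {"0.0.0.0", 9080}}
  ]}
].
```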

bryanhuntesl commented 6 years ago

Thanks @dams. So to summarize: if a user is upgrading to Riak 2.2.5 and enabling riak_repl for the first time, they will need to create the file /etc/riak/advanced.config. The recommended contents of that file are:

[
 {riak_core,
  [
   {cluster_mgr, {"0.0.0.0", 9080 } }
  ]},
 {riak_repl,
  [
   {data_root, "/var/lib/riak/riak_repl/"},
   {max_fssource_cluster, 5},
   {max_fssource_node, 1},
   {max_fssink_node, 1},
   {fullsync_on_connect, true},
   {fullsync_interval, 30},
   {rtq_max_bytes, 104857600},
   {proxy_get, disabled},
   {rt_heartbeat_interval, 15},
   {rt_heartbeat_timeout, 15},
   {fullsync_use_background_manager, true}
  ]},

 {lager,
  [
   {extra_sinks,
    [
     {object_lager_event,
      [
       {handlers,
        [
         {lager_file_backend,
          [
           {file, "/var/log/riak/object.log"},
           {level, info},
           {formatter_config, [date, " ", time, " [", severity, "] ", message, "\n"]}
          ]}
        ]},
       {async_threshold, 500},
       {async_threshold_window, 50}
      ]}
    ]}
  ]}
].

dams commented 6 years ago

Basically, yes. A few things:

hope that helps

bryanhuntesl commented 6 years ago

More details: on a fresh install, if you subsequently delete the file /etc/riak/advanced.config, or if you upgrade and fail to provide an appropriate /etc/riak/advanced.config file, Riak crashes at startup with:

2018-04-24 14:19:27 =CRASH REPORT====
  crasher:
    initial call: application_master:init/4
    pid: <0.633.0>
    registered_name: []
    exception exit: {{bad_return,{{riak_repl_app,start,[normal,[]]},{'EXIT',{{badmatch,{error,enoent}},[{riak_repl_app,start,2,[{file,"src/riak_repl_app.erl"},{line,37}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,269}]}]}}}},[{application_master,init,4,[{file,"application_master.erl"},{line,133}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,239}]}]}
    ancestors: [<0.632.0>]
    messages: [{'EXIT',<0.634.0>,normal}]
    links: [<0.632.0>,<0.7.0>]
    dictionary: []
    trap_exit: true
    status: running
    heap_size: 987
    stack_size: 27
    reductions: 181
  neighbours:

bryanhuntesl commented 6 years ago

In conversation with the team, we've decided this is not going to be a blocker for the 2.2.5 release; we will include it as a 'feature' in the release notes.

It only affects users who upgrade from a version which lacks riak_repl.

Also, from user feedback, it only occurs when upgrading using RPM packages:

because the deb package installs the new config, and gets rid of the old one. RPM does it the other way round

The solution is to integrate and test the existing riak_repl cuttlefish configuration.