Derecho-Project / derecho

The main code repository for the Derecho project.
BSD 3-Clause "New" or "Revised" License
182 stars 46 forks source link

Removing a copy on appending log to Persistent<T>. #276

Closed songweijia closed 2 months ago

songweijia commented 2 months ago

We can avoid the copy when we appending to the persistent log. The idea is to use a 'data generator' lambda to do the work so that the object can be serialized directly to the persistent data region. This pull request involves changed IDeltaSupport API:

/** Removed in this pull request.
void IDeltaSupport::finalizeCurrentDelta(const DeltaFinalizer&);
**/

/** Added in this pull request. */
size_t IDeltaSupport::currentDeltaToBytes(uint8_t* const);
size_t IDeltaSupport::currentDeltaSize();
songweijia commented 2 months ago

A new configuration header file "derecho/config.h" is used to control the features during compilation. Also, change 'include' directives to <> for all headers not from a relative path.

etremel commented 2 months ago

I saw your e-mail and looked at the new commit that adds the "config.h" header and the "ENABLE_HMEM" option. This looks like a good idea and is an important fix to allow Derecho to be tested with the Libfabrics TCP provider. However, your commit also re-introduces a bug I fixed a while ago: include paths to Derecho headers within Derecho should use quotes, not <> brackets. As I documented in commit 90b587d16e96c9e86db9221b63022aa0cc99f999, using <> brackets for internal headers causes the compiler to search the system includes before the local source directory it is compiling, which makes it hard to rebuild Derecho after making changes (because the compiler will prefer the old "installed" headers instead of the changed ones in the source tree).

Also, since the Derecho 2.4.0 release is kind of broken without the ENABLE_HMEM option, but the other changes in this pull request change the Persistent<T> IDeltaSupport API, I think we should separate the config changes into their own pull request and merge those first. That way we can create a release that is "old IDeltaSupport API but with TCP fixed", and another one that is "new IDeltaSupport API". I created a branch with just the config changes, called fix_tcp_provider, by copying all of your changes from the recent commit except the ones that changed quotes to <>. If you agree with this approach, we can create another pull request from that branch.

songweijia commented 2 months ago

Thank you for the input.

First of all, I think that merging the config.h along with the DeltaSupport is not an issue: there only four places used that API, three in Derecho test, the other is Cascade. I've tested the persistent test and Cascade implementation. You have tested it in your signature feature. I believe it's better to include them instead of delaying it.

About using <> or "" in the include directive: according to the GCC document here, using "" will check the 'current directory', which is the directory of the current input file. I also did experiment to confirm this. Just cite the text below.

include "file"

This variant is used for header files of your own program. It searches for a file named file first in the current directory, then in the same directories used for system header files. The current directory is the directory of the current input file. It is tried first because it is presumed to be the location of the files that the current input file refers to. (If the `-I-' option is used, the special treatment of the current directory is inhibited.)

If you check my fix, you will notice that I didn't change all ""s to <>s. I only changed those that did not make sense. For example in rpc_manager.hpp, I kept #include "../view.hpp" as is but changed #include "derecho/utils/logger.hpp" to #include <derecho/utils/logger.hpp>. Why? because we can find ../view.hpp from the current directory containing rpc_manager.hpp, but we cannot find derecho/utils/logger.hpp from the current directory!!! The behaviour will be falling back to "<>".

diff --git a/include/derecho/core/detail/rpc_manager.hpp b/include/derecho/core/detail/rpc_manager.hpp
index 2f338bbe..7f6cb8d6 100644
--- a/include/derecho/core/detail/rpc_manager.hpp
+++ b/include/derecho/core/detail/rpc_manager.hpp
@@ -6,11 +6,12 @@

 #pragma once

+#include <derecho/config.h>
 #include "../derecho_type_definitions.hpp"
 #include "../view.hpp"
-#include "derecho/mutils-serialization/SerializationSupport.hpp"
-#include "derecho/persistent/Persistent.hpp"
-#include "derecho/utils/logger.hpp"
+#include <derecho/mutils-serialization/SerializationSupport.hpp>
+#include <derecho/persistent/Persistent.hpp>
+#include <derecho/utils/logger.hpp>
 #include "derecho_internal.hpp"
 #include "p2p_connection_manager.hpp"
 #include "remote_invocable.hpp"

Do you remember our discussion on why the fix in commit 90b587d does not work? Currently we just remove the installed headers before building. This is same reason why I decide to drop this pull request in cascade.

Actually, I found that the misusage is in the source files too. I will add that to this pull request.

Sorry for the long response.

songweijia commented 2 months ago

By the way, how about removing the current 2.4.0 and re-release it with the included header style and config.h.in fixed?

songweijia commented 2 months ago

I also changed #include "" to #include <> in the source code.

etremel commented 2 months ago

OK, your explanation of the include directives makes sense. In fact, I don't remember a discussion about this, and I just did a git blame on the include directives to remind myself why I had changed them (leading me to commit 90b587d16e96c9e86db9221b63022aa0cc99f999). I vaguely remember doing a before-and-after test at the time I made the change, and found that the #include "derecho/core/x.hpp"-style headers made the compiler more likely to find the local-source-tree version of the headers instead of the "installed" version, but that must have been an undocumented quirk of g++ because the documentation clearly states that it shouldn't make a difference - if the included file is not in the same directory as the current file, the quote-include and bracket-include directive will search the same paths in the same order.

Regarding the releases, yes, I agree that we can just delete and re-release 2.4.0 with the config.h fix, since nobody has yet downloaded or depended on the current 2.4.0 release. That will still require a separate branch/pull request with your most recent 2 commits, though, so that we can make 2.4.1 the release in which the Persistent<T> Delta API is changed.

songweijia commented 2 months ago

Great! Actually I experimented again with the '<>' style of include. And I didn't see that problem anymore... In my experiment, gcc/g++ does behave exactly as the document said. Btw, it is important to point out that gcc/g++ preprocessor search for include directories in the following order: 1) current directory (for "" style of include) 2) path passed to -I options 3) path in CPLUS_INCLUDE_PATH/C_INCLUDE_PATH 4) The default system include paths like '/usr/include', or compiler headers

We can use gcc's -v option to check that.

I don't have a good explanation for the previous issue yet. But let's keep an eye on it and keep improving the regression test.

songweijia commented 2 months ago

Close this pull request based on our decision: split it to two pull requests: #277 and an upcoming one on the new v2.4.0