Closed: bash99 closed this issue 7 years ago
@bash99
Thanks for reporting. I am not sure what is happening here, but I guess they backported the renaming that we dealt with in the for-4.6 release.
https://github.com/akiradeveloper/dm-writeboost/commit/edbbe8837c18fbfd4445bc7a91eed56749368a59
What do you think?
Doing something like this might solve the issue, but first I want to make clear what was backported.
http://dpdk.org/ml/archives/dev/2016-September/046910.html
#if (( LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0) ) \
|| ( RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,2) ))
#define HAVE_NDO_DFLT_BRIDGE_ADD_MASK
-#if (!( RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,2) ))
+#if ( RHEL_RELEASE_CODE != RHEL_RELEASE_VERSION(7,2) )
#define HAVE_NDO_FDB_ADD_VID
#endif /* !RHEL 7.2 */
#endif /* >= 3.19.0 */
The kernel is basically 3.10 but backports some fragments from newer kernels, which is a crazy idea.
@akiradeveloper Yes, it's 4.6 related. It compiles with a quick and dirty patch; the text follows.
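To make the quoted DPDK change easier to read, here is a minimal sketch that mirrors only the inner condition before and after the patch as plain C functions. The function names are hypothetical and exist only for illustration; a release code of 0 stands for a non-RHEL kernel, assuming a fallback definition like the one used later in this thread.

```c
#include <assert.h>  /* static_assert (C11) */

#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))

/* 7.3 must encode to a larger value than 7.2 for ">=" comparisons to work. */
static_assert(RHEL_RELEASE_VERSION(7, 3) > RHEL_RELEASE_VERSION(7, 2),
              "release codes must be ordered");

/* Hypothetical mirror of the removed line:
 * #if (!( RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,2) )) */
int old_inner_guard(int rhel_code)
{
    return !(rhel_code >= RHEL_RELEASE_VERSION(7, 2));
}

/* Hypothetical mirror of the added line:
 * #if ( RHEL_RELEASE_CODE != RHEL_RELEASE_VERSION(7,2) ) */
int new_inner_guard(int rhel_code)
{
    return rhel_code != RHEL_RELEASE_VERSION(7, 2);
}
```

The two versions agree for non-RHEL kernels and for RHEL 7.2 itself; they differ for releases after 7.2, where only the patched condition is true.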
diff --git a/src/dm-writeboost-target.c b/src/dm-writeboost-target.c
index 319a29f..7c008de 100644
--- a/src/dm-writeboost-target.c
+++ b/src/dm-writeboost-target.c
@@ -674,7 +674,7 @@ enum PBD_FLAG {
PBD_READ_SEG = 2,
};
-#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,6,0)
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,6,0) || RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,3))
#define PER_BIO_DATA_SIZE per_io_data_size
#else
#define PER_BIO_DATA_SIZE per_bio_data_size
diff --git a/src/dm-writeboost.h b/src/dm-writeboost.h
index d204406..5808cca 100644
--- a/src/dm-writeboost.h
+++ b/src/dm-writeboost.h
@@ -37,6 +37,13 @@
#include <linux/dm-io.h>
#include <linux/dm-kcopyd.h>
+/* we use RHEL_RELEASE_VERSION to compile with RHEL/CentOS 7.3's kernel */
+#ifndef RHEL_RELEASE_CODE
+#define RHEL_RELEASE_CODE 0
+#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))
+#endif
+
+
/*----------------------------------------------------------------------------*/
#define SUB_ID(x, y) ((x) > (y) ? (x) - (y) : 0)
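As a rough sketch of what the patched guard selects, the condition can be mirrored as an ordinary C function and evaluated for a few kernels. The helper name is hypothetical and not part of dm-writeboost. The fallback definitions matter because, in an `#if`, an undefined identifier is replaced by 0, so an undefined `RHEL_RELEASE_VERSION(7,3)` would turn into `0(7,3)` and fail to preprocess on non-RHEL kernels.

```c
#include <assert.h>  /* static_assert (C11) */

#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))

/* The fallback code 0 must compare below the release being tested for. */
static_assert(0 < RHEL_RELEASE_VERSION(7, 3),
              "fallback RHEL_RELEASE_CODE must not match RHEL 7.3");

/*
 * Hypothetical helper: returns 1 when the patched #if would select
 * per_io_data_size, 0 when it would keep per_bio_data_size.
 * rhel_code is 0 on non-RHEL kernels, matching the fallback
 * "#define RHEL_RELEASE_CODE 0" in the patch.
 */
int selects_per_io_data_size(int linux_code, int rhel_code)
{
    return linux_code >= KERNEL_VERSION(4, 6, 0) ||
           rhel_code >= RHEL_RELEASE_VERSION(7, 3);
}
```

Under these assumptions, a mainline 3.10 kernel and RHEL 7.2 keep the old field name, while RHEL 7.3 and mainline 4.6 or later pick the new one.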
I'm running some tests right now.
BTW, I think there are still some bugs in "Read caching" or somewhere else. We recently put some physical servers into production (26c12t CPU + 64G memory + 600G 6 RAID10 HDD, CentOS 7.2). The stress test was sysbench with 64 tables of 10M rows each (150G of data; the SSDs are two 200G 3710s), running OLTP for 24 hours. In long runs (12 hours or more) we got an XFS corruption and, after reformatting and recreating the cache, a mysql-server lockup. Without dwb, or with enhanceio, the test passed. Time was limited, so we don't use dwb in production, although dwb performed better in our previous FIO tests. I cannot reproduce this behavior on a test box (i5-4460 + 1 SSD + 1 HDD), so I'm trying to reproduce it on an AWS box, and that is where I got this compile error.
@bash99
BTW, I think there still are some bugs in "Read caching" or other place. We got a xfs corruption and another a mysql-server lockup (after reformat and recreate cache) in a long time run (12 hours or more)
I'm not sure precisely what was done. Could you please describe it in more detail? (You can open another issue.) If you have the dmesg output from that time, it would help.
@bash99 I appreciate your patch; it looks good. I'd like you to make a PR, but I have one question:
+#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))
Where did you get this macro from? IOW, how can I be sure it is correct?
@akiradeveloper I got it from the kernel-headers rpm of CentOS 7.3:
[root@xxx ~]# rpm -qf /usr/include/linux/version.h
kernel-headers-3.10.0-514.2.2.el7.x86_64
[root@xxx ~]# cat /usr/include/linux/version.h
#define LINUX_VERSION_CODE 199168
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#define RHEL_MAJOR 7
#define RHEL_MINOR 3
#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))
#define RHEL_RELEASE_CODE 1795
#define RHEL_RELEASE "514.2.2"
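The values in this header can be checked by hand: 3.10.0 encodes to (3 << 16) + (10 << 8) + 0 = 199168, and 7.3 encodes to (7 << 8) + 3 = 1795. A small sketch that lets the compiler confirm the same arithmetic is below; the two macros are copied from the listing above, while the helper function names are hypothetical.

```c
#include <assert.h>  /* static_assert (C11) */

/* Copied from the version.h listing above. */
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))

/* 3.10.0 -> 196608 + 2560 + 0 = 199168, the LINUX_VERSION_CODE in the header. */
static_assert(KERNEL_VERSION(3, 10, 0) == 199168,
              "LINUX_VERSION_CODE corresponds to 3.10.0");

/* 7.3 -> 1792 + 3 = 1795, the RHEL_RELEASE_CODE in the header. */
static_assert(RHEL_RELEASE_VERSION(7, 3) == 1795,
              "RHEL_RELEASE_CODE corresponds to 7.3");

/* Hypothetical helpers exposing the same encodings at run time. */
int kernel_version_code(int a, int b, int c)
{
    return KERNEL_VERSION(a, b, c);
}

int rhel_release_code(int major, int minor)
{
    return RHEL_RELEASE_VERSION(major, minor);
}
```

Because the minor number occupies the low 8 bits, a later release always encodes to a larger value, which is what makes `>=` comparisons in `#if` guards meaningful.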
@akiradeveloper About that bug: dmesg showed nothing when mysql locked up, so I'm not sure it's a dwb issue, and I'm trying to reproduce it on AWS.
@bash99 You should have observed the dmesg when "We got a xfs corruption" happened
It seems DWB with the temporary patch runs fine on AWS EC2. I ran sysbench for 72 hours and nothing happened.
It also works on a two-socket bare-metal server (12c24t, 64G).
@akiradeveloper So the previous (XFS-related) issues may be a hardware-related problem. I'll test more on the production server.
@bash99 Thank you for your efforts. If it were a concurrency issue, I personally think your system should have exposed it, because 24 threads is plenty and 72 hours is long enough.
I will wait for your report from the run on the production server.
@kazuhisya Could you please test-compile with the fix-165 branch? I couldn't apply your patch in dm-writeboost-rpm successfully, so I ported it by hand.
Hi, @akiradeveloper Thanks for the notification. I tried the fix-165 branch, and it seems to be good.
dkms: RPM / SRPM kmod: RPM / SRPM
When this change is merged into your master, I will also update the rpm. Thanks a lot!
@kazuhisya Thanks
The build fails with the new kernel version, 3.10.0-514.2.2.el7; the old kernel 3.10.0-327.3 is OK.
some output from make.log
As far as I know, RHEL 7.3 backports some features from newer kernels and changes some APIs to their kernel 4.4 versions.