akiradeveloper / dm-writeboost

Log-structured Caching for Linux
GNU General Public License v2.0

Compile error on CentOS(RHEL) 7.3's kernel #165

Closed bash99 closed 7 years ago

bash99 commented 7 years ago

The build fails on the new kernel version, 3.10.0-514.2.2.el7; the old kernel, 3.10.0-327.3, is OK.

Some output from make.log:

DKMS make.log for dm-writeboost-2.2.6 for kernel 3.10.0-514.2.2.el7.x86_64 (x86_64)
2016-12-31 13:53:38 CST
make[1]: Entering directory `/var/lib/dkms/dm-writeboost/2.2.6/build'
make -C /lib/modules/3.10.0-514.2.2.el7.x86_64/build M=/var/lib/dkms/dm-writeboost/2.2.6/build modules
make[2]: Entering directory `/usr/src/kernels/3.10.0-514.2.2.el7.x86_64'
  CC [M]  /var/lib/dkms/dm-writeboost/2.2.6/build/dm-writeboost-target.o
  CC [M]  /var/lib/dkms/dm-writeboost/2.2.6/build/dm-writeboost-metadata.o
  CC [M]  /var/lib/dkms/dm-writeboost/2.2.6/build/dm-writeboost-daemon.o
/var/lib/dkms/dm-writeboost/2.2.6/build/dm-writeboost-target.c: In function 'reserve_read_cache_cell':
/var/lib/dkms/dm-writeboost/2.2.6/build/dm-writeboost-target.c:689:86: error: 'struct dm_target' has no member named 'per_bio_data_size'
 #define per_bio_data(wb, bio) ((struct per_bio_data *)dm_per_bio_data((bio), (wb)->ti->PER_BIO_DATA_SIZE))
                                                                                      ^
/var/lib/dkms/dm-writeboost/2.2.6/build/dm-writeboost-target.c:805:8: note: in expansion of macro 'per_bio_data'
  pbd = per_bio_data(wb, bio);
        ^
/var/lib/dkms/dm-writeboost/2.2.6/build/dm-writeboost-target.c: In function 'read_cache_cell_copy_data':
/var/lib/dkms/dm-writeboost/2.2.6/build/dm-writeboost-target.c:689:86: error: 'struct dm_target' has no member named 'per_bio_data_size'
 #define per_bio_data(wb, bio) ((struct per_bio_data *)dm_per_bio_data((bio), (wb)->ti->PER_BIO_DATA_SIZE))
                                                                                      ^
/var/lib/dkms/dm-writeboost/2.2.6/build/dm-writeboost-target.c:825:29: note: in expansion of macro 'per_bio_data'
  struct per_bio_data *pbd = per_bio_data(wb, bio);

As far as I know, RHEL 7.3 backports some features from newer kernels and changes some APIs to match the kernel 4.4 versions.

akiradeveloper commented 7 years ago

@bash99

Thanks for reporting. I am not sure what is happening here, but I guess they backported the renaming that we dealt with in the for-4.6 release.

https://github.com/akiradeveloper/dm-writeboost/commit/edbbe8837c18fbfd4445bc7a91eed56749368a59

What do you think?

akiradeveloper commented 7 years ago

Maybe doing something like this would solve the issue, but first I want to make clear what was backported.

http://dpdk.org/ml/archives/dev/2016-September/046910.html

 #if (( LINUX_VERSION_CODE >= KERNEL_VERSION(3,19,0) ) \
     || ( RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,2) ))
 #define HAVE_NDO_DFLT_BRIDGE_ADD_MASK
-#if (!( RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,2) ))
+#if ( RHEL_RELEASE_CODE != RHEL_RELEASE_VERSION(7,2) )
 #define HAVE_NDO_FDB_ADD_VID
 #endif /* !RHEL 7.2 */
 #endif /* >= 3.19.0 */

The kernel is basically 3.10, but backporting fragments from newer kernels into it is a crazy idea.

bash99 commented 7 years ago

@akiradeveloper Yes, it's 4.6 related. I just compiled it with a quick and dirty patch; the text follows.

diff --git a/src/dm-writeboost-target.c b/src/dm-writeboost-target.c
index 319a29f..7c008de 100644
--- a/src/dm-writeboost-target.c
+++ b/src/dm-writeboost-target.c
@@ -674,7 +674,7 @@ enum PBD_FLAG {
        PBD_READ_SEG = 2,
 };

-#if LINUX_VERSION_CODE >= KERNEL_VERSION(4,6,0)
+#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,6,0) || RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,3))
 #define PER_BIO_DATA_SIZE per_io_data_size
 #else
 #define PER_BIO_DATA_SIZE per_bio_data_size
diff --git a/src/dm-writeboost.h b/src/dm-writeboost.h
index d204406..5808cca 100644
--- a/src/dm-writeboost.h
+++ b/src/dm-writeboost.h
@@ -37,6 +37,13 @@
 #include <linux/dm-io.h>
 #include <linux/dm-kcopyd.h>

+/* we use RHEL_RELEASE_VERSION to compile with RHEL/CentOS 7.3's kernel  */
+#ifndef RHEL_RELEASE_CODE
+#define RHEL_RELEASE_CODE 0
+#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))
+#endif
+
+
 /*----------------------------------------------------------------------------*/

 #define SUB_ID(x, y) ((x) > (y) ? (x) - (y) : 0)
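
The guard in the patch above can be tried outside the kernel tree. The following is a minimal userspace sketch, not the project's code: the `LINUX_VERSION_CODE` stand-in (3.10.0) and the helpers `selected_field` and `uses_new_name` are assumptions made for illustration. It shows why the fallback definitions matter: an undefined plain identifier in `#if` evaluates to 0, but an undefined function-like macro such as `RHEL_RELEASE_VERSION(7,3)` is a preprocessor syntax error, so both macros need fallbacks on non-RHEL kernels.

```c
#include <assert.h>
#include <string.h>

/* Fallbacks as in the patch: non-RHEL kernels do not define these. */
#ifndef RHEL_RELEASE_CODE
#define RHEL_RELEASE_CODE 0
#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))
#endif

/* Stand-ins for <linux/version.h>; assumed values for a userspace demo. */
#ifndef LINUX_VERSION_CODE
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#define LINUX_VERSION_CODE KERNEL_VERSION(3,10,0)
#endif

/* Same condition as the patch: new name on >= 4.6 or on RHEL >= 7.3. */
#if (LINUX_VERSION_CODE >= KERNEL_VERSION(4,6,0) || RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,3))
#define PER_BIO_DATA_SIZE per_io_data_size
#else
#define PER_BIO_DATA_SIZE per_bio_data_size
#endif

/* Two-step stringification so the macro is expanded before quoting. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Hypothetical helper: name of the struct member the guard selected. */
const char *selected_field(void) { return STR(PER_BIO_DATA_SIZE); }

/* Hypothetical helper: 1 if the post-rename member name is in use. */
int uses_new_name(void) { return strcmp(selected_field(), "per_io_data_size") == 0; }
```

With the stand-in values (a 3.10.0 kernel and no RHEL macros), the guard falls through to the old member name; building with `-DRHEL_RELEASE_CODE=1795 "-DRHEL_RELEASE_VERSION(a,b)=(((a)<<8)+(b))"` would select the new one.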

I'm running some tests right now.

BTW, I think there are still some bugs in "Read caching" or elsewhere. Recently we put some physical servers into production. The stress test was sysbench with 64 tables, 10M rows per table (150G of data; the SSDs are two 200G 3710s), running OLTP for 24 hours on a 26c12t CPU + 64G memory + 600G 6 RAID10 HDD box with CentOS 7.2. We got an XFS corruption and, in another run, a mysql-server lockup (after reformatting and recreating the cache) during long runs (12 hours or more). Without dwb, or with enhanceio, the test passed. Time is limited, so we don't use dwb in production, although dwb performed better in our previous FIO tests. But I could not reproduce this behavior on a test box (i5-4460 + 1 SSD + 1 HDD), so I was trying to reproduce it on an AWS box when I got this compile error.

akiradeveloper commented 7 years ago

@bash99

> BTW, I think there still are some bugs in "Read caching" or other place. We got a xfs corruption and another a mysql-server lockup (after reformat and recreate cache) in a long time run (12 hours or more)

I'm not sure precisely what was done. Could you please describe it in more detail? (You can open another issue.) If you have the dmesg output from that time, it would help.

akiradeveloper commented 7 years ago

@bash99 I appreciate your patch; it looks good. I'd like you to make a PR, but I have one question:

+#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))

Where did you get this macro from? IOW, how can I be sure it is correct?

bash99 commented 7 years ago

@akiradeveloper I got it from the kernel-headers RPM on CentOS 7.3:

[root@xxx ~]# rpm -qf /usr/include/linux/version.h 
kernel-headers-3.10.0-514.2.2.el7.x86_64
[root@xxx ~]# cat /usr/include/linux/version.h
#define LINUX_VERSION_CODE 199168
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#define RHEL_MAJOR 7
#define RHEL_MINOR 3
#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))
#define RHEL_RELEASE_CODE 1795
#define RHEL_RELEASE "514.2.2"
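
As a sanity check on the header output above, the two numeric codes can be reproduced from the macros by hand. This is a small sketch under the assumption that `LINUX_VERSION_CODE 199168` encodes 3.10.0; the helper names `kernel_code` and `rhel_code` are made up for illustration.

```c
#include <assert.h>  /* static_assert (C11) */

/* Copied from the version.h output above. */
#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))
#define RHEL_RELEASE_VERSION(a,b) (((a) << 8) + (b))

/* Hypothetical helpers wrapping the macros for run-time use. */
int kernel_code(int a, int b, int c) { return KERNEL_VERSION(a, b, c); }
int rhel_code(int major, int minor) { return RHEL_RELEASE_VERSION(major, minor); }

/* 3.10.0 -> (3 << 16) + (10 << 8) + 0 = 196608 + 2560 = 199168 */
static_assert(KERNEL_VERSION(3, 10, 0) == 199168, "matches LINUX_VERSION_CODE");

/* 7.3 -> (7 << 8) + 3 = 1792 + 3 = 1795 */
static_assert(RHEL_RELEASE_VERSION(7, 3) == 1795, "matches RHEL_RELEASE_CODE");
```

Both compile-time checks pass, so `RHEL_RELEASE_CODE >= RHEL_RELEASE_VERSION(7,3)` is true on this kernel even though `LINUX_VERSION_CODE` is still below `KERNEL_VERSION(4,6,0)`.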

bash99 commented 7 years ago

@akiradeveloper About that bug: dmesg showed nothing when MySQL locked up, so I'm not sure it's a dwb issue, and I'm trying to reproduce it on AWS.

akiradeveloper commented 7 years ago

@bash99 You should have checked dmesg at the time when "We got a xfs corruption" happened.

bash99 commented 7 years ago

It seems DWB with the temporary patch runs fine on AWS EC2. I ran sysbench for 72 hours and nothing happened.

It also works on a two-socket bare-metal server (12c24t, 64G).

@akiradeveloper So the previous issues (XFS related) may be a hardware-related problem. I'll test more on the production server.

akiradeveloper commented 7 years ago

@bash99 Thank you for your efforts. If it were a concurrency issue, I personally think your system would have exposed it, because 24 threads are plenty and 72 hours is long enough.

I will wait for your report from the production server.

akiradeveloper commented 7 years ago

@kazuhisya Could you please test-compile the fix-165 branch? I couldn't apply your patch from dm-writeboost-rpm cleanly, so I ported it by hand.

kazuhisya commented 7 years ago

Hi, @akiradeveloper. Thanks for the notification. I tried the fix-165 branch, and it seems to be good.

dkms: RPM / SRPM kmod: RPM / SRPM

When this change is merged into your master, I will also update the RPM. Thanks a lot!

akiradeveloper commented 7 years ago

@kazuhisya Thanks