Performance tweaking wrt parallel runs

matthdsm commented 2 years ago

Hi,

I'm trying to run multiple instances of SNAP (in Docker) through a workflow manager. However, I'm noticing that only one of the instances is using the full CPU capacity at a time.

Are there any settings to tweak so we can fully use the CPUs of the entire node?

htop screengrab

Currently, we're using default settings with sorting enabled on 18 threads per instance. I tried using -map- but that didn't help.

Thanks for the info,
Matthias

matthdsm commented 2 years ago

I also noticed when monitoring the processes that each thread uses on average 30% with peaks to 50% of the allocated core capacity.

slw287r commented 2 years ago

Multiple snap-aligner instances will use up to the number of threads specified by -t. Try -t 72 for each instance to see the difference.

matthdsm commented 2 years ago

So to be clear: when I run 3 instances, each with 18 cores allocated (-t 18), all three instances will be limited to the same 18 cores? I had assumed each would take up the full CPU of its own 18 allocated cores, so 18 cores per instance, thus filling the entire node.

bolosky commented 2 years ago

It's because SNAP is binding the threads to the low numbered cores, so all of the instances are running on cores 0-17.

Try -b-, which will turn this off and let the OS schedule threads as it sees fit.

-map- is not a good idea for this. In the default mode there will be one copy of the index in memory, with -map- there will be one copy per instance of SNAP. It's also slower to load.
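A minimal sketch of how a workflow step might assemble the command line for one instance with binding turned off. The `-t` and `-b-` flags come from this thread; the `single <index-dir> <input> -o <output>` layout is assumed from SNAP's usual usage, and `build_snap_command`, the paths, and the thread count are hypothetical placeholders.

```python
import shlex


def build_snap_command(index_dir, reads, output, threads):
    """Build the argv for one SNAP instance with thread binding disabled.

    Assumption: the positional layout (mode, index directory, input file,
    then options) is assumed from SNAP's usual usage; check it against the
    snap-aligner help output before relying on it.
    """
    return [
        "snap-aligner", "single", index_dir, reads,
        "-o", output,
        "-t", str(threads),  # threads for this instance
        "-b-",               # do not bind threads to the low-numbered cores
    ]


# Hypothetical paths, for illustration only.
cmd = build_snap_command("/data/index", "/data/sample1.fq", "/data/sample1.bam", 18)
print(shlex.join(cmd))
```

Each instance started by the workflow manager would get its own command of this shape; with `-b-` present, the OS is free to place the threads of different instances on different cores.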

From: Matthias De Smet. Sent: Wednesday, August 24, 2022 5:18 AM. To: amplab/snap. Subject: [amplab/snap] Performance tweaking wrt parallel runs (Issue #157)


matthdsm commented 2 years ago

Oh wow, adding -b- has a tremendous effect on efficiency! I can now see all cores fully used. Is there a reason this isn't the default behaviour?

bolosky commented 2 years ago

The intent is to run one instance of snap at a time. Typically it can saturate all of the cores. When that happens, moving threads from core to core causes the cache state to have to move which can be expensive. Binding the threads to cores prevents this from happening.
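One way to check whether the threads of a running instance are pinned to particular cores is to read each thread's affinity mask. The following is a Linux-only sketch, not part of SNAP: `thread_affinities` is a hypothetical helper that lists the threads under `/proc/<pid>/task` and queries each with `os.sched_getaffinity`.

```python
import os


def thread_affinities(pid):
    """Map each thread ID of `pid` to the sorted list of cores it may run on.

    Linux-only sketch: returns None where os.sched_getaffinity or /proc
    is unavailable.
    """
    task_dir = f"/proc/{pid}/task"
    if not hasattr(os, "sched_getaffinity") or not os.path.isdir(task_dir):
        return None
    result = {}
    for tid in os.listdir(task_dir):
        try:
            result[int(tid)] = sorted(os.sched_getaffinity(int(tid)))
        except OSError:
            # The thread exited (or is not accessible) between listing and query.
            continue
    return result


# Inspect the current process; pass the PID of a running SNAP instance instead.
print(thread_affinities(os.getpid()))
```

If several instances report the same small set of cores for their threads, they are competing for those cores rather than spreading across the node.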

I'm not sure why you're not seeing 100% use when you run just one instance at a time if your IO system is fast enough to keep up, which it must be if you can get 100% with multiple instances.


matthdsm commented 2 years ago

I'm working on a heterogeneous cluster through a workflow manager, so the number of threads is set before scheduling. This means I can't use the full node, since I have no idea where the job will be scheduled. I think we only saw about 30% usage per thread because there were 3 instances running on the same 18 cores, which would mean all 3 instances were competing for the same CPUs.

Background info (for this node):

Ubuntu 18.04.5 LTS
Intel(R) Xeon(R) CPU E5-2698
CephFS backend
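The roughly 30% per-thread figure is consistent with simple arithmetic: 3 instances of 18 threads each, all confined to the same 18 cores, leave each thread about a third of a core. A rough sketch of that estimate follows; `expected_cpu_share` is a hypothetical helper, and the even-split model is a simplifying assumption.

```python
def expected_cpu_share(instances, threads_per_instance, shared_cores):
    """Rough per-thread CPU share when all threads are confined to `shared_cores`.

    Simplifying assumption: every thread is always runnable and the scheduler
    divides the shared cores evenly; I/O waits are ignored.
    """
    total_threads = instances * threads_per_instance
    return min(1.0, shared_cores / total_threads)


share = expected_cpu_share(instances=3, threads_per_instance=18, shared_cores=18)
print(f"{share:.0%}")  # prints 33%
```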

bolosky commented 2 years ago

If you just don't specify -t it defaults to the number of cores on the machine (unlike seemingly every other piece of software out there for whatever reason). So if you arrange to just have one instance of snap per machine at a time then it should do the right thing.

But what you're doing seems to work fine, too.


matthdsm commented 2 years ago

All right, thanks for all the help!

amplab / snap

Performance tweaking wrt parallel runs #157