conda-forge / rdkit-feedstock

A conda-smithy repository for rdkit.
BSD 3-Clause "New" or "Revised" License
8 stars 22 forks source link

Possible problem with Atom processors and conda-forge builds #67

Open greglandrum opened 3 years ago

greglandrum commented 3 years ago

This issue filed in the RDKit repo seems to be connected with the conda-forge builds on Atom processors: https://github.com/rdkit/rdkit/issues/3814

I don't have access to an Atom processor to test on and I'm wondering if any of you has seen something like this before.

jaimergp commented 3 years ago

Hm, this opened a little can of worms, I am afraid.

The gist is that we shouldn't be optimizing with AVX unless RDKit gracefully falls back to a different implementation at runtime. If this cannot happen, we need to rebuild the packages without AVX, which might incur in a performance penalty for those users that could benefit from them.

What do you think @greglandrum? Is there an easy way RDKit could handle that or do we need to rebuild?

In the future, archspec-enabled packages will help handle different builds with and without AVX, but this is not ready yet.

chrisburr commented 3 years ago

The issue isn't with AVX, it's POPCNT (which often comes with SSE4.x). This package is explicitly enabling popcnt[1] which isn't supported by the CPU shown in the issue. The conda-forge recipe should be changed to use -DRDK_OPTIMIZE_POPCNT=OFF by default (it can be renabled once the archspec stuff is usable).

[1] https://github.com/rdkit/rdkit/blob/bc4d5478478542045a0f89e28a5da8b6d1f0e5d0/CMakeLists.txt#L68-L76 [2] https://ark.intel.com/content/www/us/en/ark/products/58917/intel-atom-processor-n2800-1m-cache-1-86-ghz.html#AdvancedTechnologies

greglandrum commented 3 years ago

Assuming that the problem @gokhantahil is actually due to the use of an AVX instruction (which we don't know for sure): I don't think there's anything we can do on the RDKit side. The example which is causing the problem is in the SMILES parser, and neither that nor anything it calls intentionally uses architecture-specific code. It could be something in boost::dynamic_bitset, but that's just a guess.

As for what should be done on the conda-forge side: I'm really not sure. I guess this isn't a problem which comes up a lot (this is the first report I can remember of this type of problem), but at the same time I think conda-forge is intended to be a generally useful distribution... so maybe it doesn't make sense to be aggressive with the compile flags and to disable RDK_OPTIMIZE_POPCNT for all platforms?

mcs07 commented 3 years ago

It seems a shame to cause a performance hit to the vast majority of users, given how widespread popcnt support is? I fall on the side of the conda-forge package catering to the most widespread use case, rather than the lowest common denominator. Especially seeing as we've lasted this long without issues and it seems archspec is coming soon to solve the problem?

jaimergp commented 3 years ago

What about providing a popcnt-less version for now that is deprioritized with track_features? The default would still be the one with popcnt, but affected users could do conda install rdkit=*=*nopopcnt?

jaimergp commented 3 years ago

We need the variant with and without popcnt anyway. When archspec comes the variant selection would be automatic, but that's it.

greglandrum commented 3 years ago

@jaimergp : do you have any sense as to whether or not this is likely to be resolvable?

jaimergp commented 3 years ago

The infrastructure for archspec variants is still being worked on. We could fix our issue with popcnt but I am unsure if that's worth it given the reduced impact. How is the PyPI release being compiled? Does it include pocpnt?

Basically, this requires a couple hours of work and I am currently super busy, but if this is one of those issues that are important but not urgent, I will tackle it eventually!

greglandrum commented 3 years ago

I don't think it's particularly urgent. I asked because it came up again this week in https://github.com/rdkit/rdkit/issues/4219 with someone using a really old Intel processor. I'm not sure supporting 14 year old CPUs is that critical, but on the other side, it might be interesting to see if enabling modern CPU features could improve performance.

jaimergp commented 3 years ago

In that case I'd suggest we wait for an official archspec based solution instead of handcrafting something here on the conda-forge channel. The rdkit channel could benefit from more esoteric variants if needed in the meantime; I don't have bandwidth for that but I can provide pointers on how to start!