cms-sw / cmssw

CMS Offline Software
http://cms-sw.github.io/
Apache License 2.0
1.09k stars 4.33k forks source link

Reduce dependency on eigen #31735

Open Dr15Jones opened 4 years ago

Dr15Jones commented 4 years ago

FWLite only depends on eigen because of the tenous use of that package by DataFormats/HGCalReco and DataFormats/Math.

DataFormats/HGCalReco/interface/Trackster.h uses eigen only in a member function that takes eigen classes and converts them into std::array's. This dependency could be removed either by converting that function to be templated (since it is all inlined anyway) or by using a stand alone function to do the conversion. The function in question, Trackster::fillPCAVariables is only ever called from RecoHGCal/TICL/plugins/TrackstersPCA.cc so the conversion could be done there.

DataFormats/Math/interface/choleskyInversion.h is the only use of eigen in DataFormats/Math and the header is not used anywhere in CMSSW except in the unit test for the function. This file could either be removed entirely from CMSSW or moved to a non DataFormats package.

cmsbuild commented 4 years ago

A new Issue was created by @Dr15Jones Chris Jones.

@Dr15Jones, @dpiparo, @silviodonato, @smuzaffar, @makortel, @qliphy can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

Dr15Jones commented 4 years ago

assign reconstruction, upgrade

cmsbuild commented 4 years ago

New categories assigned: upgrade,reconstruction

@slava77,@perrotta,@jpata,@kpedro88 you have been requested to review this Pull request/Issue and eventually sign? Thanks

kpedro88 commented 4 years ago

@rovere can you see if it's feasible to relocate the eigen dependence for Trackster?

@VinInn @fwyzard can you look at choleskyInversion?

makortel commented 4 years ago

https://github.com/cms-sw/cmssw/pull/31722 adds a dependence DataFormats/Math/interface/choleskyInversion.h, but the use in there does not preclude moving it to a non-DataFormat package.

fwyzard commented 4 years ago

@VinInn https://github.com/VinInn @fwyzard https://github.com/fwyzard can you look at choleskyInversion?

No, but feel free to make a PR, and somebody will review it.

VinInn commented 4 years ago

Historically Math is in DataFormat. Was not my decision as was not my decision to swallow SMatrix in Root. Most probably we will migrate to eigen more and more, so just accept the fact that eigen is first class in cms

Dr15Jones commented 4 years ago

@davidlange6 FYI

rovere commented 4 years ago

Could someone explain why we need to reduce the dependency from Eigen?

slava77 commented 4 years ago

Could someone explain why we need to reduce the dependency from Eigen?

IIUC, it's for DataFormats. To invert your question, what are the reasons a subsystem dealing with persisted data should have dependency on Eigen?

rovere commented 4 years ago

No data-type eigen-specific is used to persist anything, IIRC. Eigen is introduced via member functions, as @Dr15Jones already mentioned in his first post. So, again, can someone explain why the dependency on Eigen is something we should reduce?

VinInn commented 4 years ago

Same applies to Math Geometry etc. Or even more in general to anything that can be casted to an array of float or double or to std::byte ... I think we will move to eigen as matrix class.

Dr15Jones commented 4 years ago

So, again, can someone explain why the dependency on Eigen is something we should reduce?

The request is not specific to Eigen but is true in general (you can see other recent pull requests I've done to reduce DataFormats dependencies on externals). Beyond the good practice in large scale software design to minimize dependencies in order to improve build times and eas maintenance, classes in DataFormats have some additional reasons to minimize external dependencies

  1. We are trying to make use of ROOT's new C++ modules mechanism for ROOT dictionaries. This is supposed to give us faster and less memory intensive dictionary usage. However, each external seen in a .h file requires the specification of its own module declaration which is quite involved and has to be constantly maintained as we update versions of the external.
  2. We are working with HPC specialists on improving IO. To do that, they need to be able to build our code and do not want our entire software stack. To that end, we've create a way to use CMake to make only those CMSSW packages needed to use the ROOT dictionaries for data products stored in our production files. Every additional external required for those packages makes it just that much harder to setup that build.
  3. We have an FWLite distribution which can be used via the package manager 'conda'. Each additional external dependency we have for FWLite makes it that much more difficult to maintain.
slava77 commented 4 years ago

@Dr15Jones do I understand your comment https://github.com/cms-sw/cmssw/issues/31735#issuecomment-712144543 correctly that once a dependency on X is added in one place of DataFormats, it roughly does not matter how many more places will depend on this X in DataFormats? Or is there a package by package burden of maintenance?

Dr15Jones commented 4 years ago

do I understand your comment #31735 (comment) correctly that once a dependency on X is added in one place of DataFormats, it roughly does not matter how many more places will depend on this X in DataFormats?

I think to first order that is correct. On a package basis, I think it just increases the time and memory needed to compile the code.