MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org

Tests are generating too many warnings #2980

Open · jbarnoud opened 4 years ago

jbarnoud commented 4 years ago

Expected behavior

Running the tests issues no or only a few warnings that can be displayed and, ideally, addressed.

Actual behavior

Running the tests issues more than 30,000 warnings, which is intractable to look at and means we can very easily miss important warnings.

Code to reproduce the behavior

Just run the tests

Current version of MDAnalysis

orbeckst commented 4 years ago

@tylerjereddy has been stalwartly working on reducing the warnings.

Do we have statistics on which warnings are the most frequent so that we can make a targeted attempt to address the worst offenders first, e.g., anything from the PDBParser regarding atom types not found? We can then either address the root problems or include warnings filters to mute any warnings during a test that we know are harmless.
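[Editor's note: one way to gather such statistics, as a sketch rather than anything currently in the MDAnalysis test suite, is a conftest.py hook that tallies warnings as pytest records them; the pytest_warning_recorded hook requires pytest >= 6.0.]

```python
# conftest.py sketch: tally warnings as pytest records them.
import collections

WARNING_COUNTS = collections.Counter()

def pytest_warning_recorded(warning_message, when, nodeid, location):
    # Key on the warning class and the start of the message so that
    # near-identical messages fall into one bucket.
    key = (warning_message.category.__name__,
           str(warning_message.message)[:60])
    WARNING_COUNTS[key] += 1

def pytest_terminal_summary(terminalreporter):
    terminalreporter.section("most frequent warnings")
    for (category, message), count in WARNING_COUNTS.most_common(20):
        terminalreporter.write_line(f"{count:>7}  {category}: {message}")
```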

richardjgowers commented 4 years ago

Would be nice if warnings meant something in tests. One idea is to change the common warnings we emit to a dedicated class (e.g. MDAnalysis.GuessingWarning), then have a pytest fixture applied to all tests that silences these warnings. I think this might address ~80% of them at least.
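[Editor's note: a minimal sketch of that idea, assuming a hypothetical dedicated GuessingWarning class; neither the class nor the fixture exists in MDAnalysis at this point in the thread. The autouse fixture would sit in the test suite's conftest.py.]

```python
import warnings

import pytest

class GuessingWarning(UserWarning):
    """Hypothetical dedicated class for 'guessed attribute' warnings."""

@pytest.fixture(autouse=True)
def silence_guessing_warnings():
    # autouse applies this to every test; only the one class is muted,
    # so unexpected warning types still surface in the summary.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", GuessingWarning)
        yield
```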

jbarnoud commented 4 years ago

I just had a very quick look and it seems that there are limitations to what types of warning you can ignore globally. From what I understand, there is a fixed list of warning types that can be ignored. For the other types, you would have to capture them on a per-test basis, which would probably be the cleanest but would not be tractable in some cases.

Another option is to ignore individual warnings by string. This would go in a config file, so it is doable.

Finally, it seems that more than half of the warnings are not about guessing, but about invalid values in numpy. By ignoring the RuntimeWarnings I drop from ~30,000 to ~13,000 warnings.
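[Editor's note: for illustration, the per-test capture mentioned above could look like the sketch below; the test name and the triggering expression are made up, and pytest.warns both asserts the warning and keeps it out of the run-wide summary.]

```python
import numpy as np
import pytest

def test_invalid_value_warns():
    # A made-up trigger: 0.0 / 0.0 on numpy scalars emits an
    # "invalid value encountered ..." RuntimeWarning, which
    # pytest.warns asserts and swallows.
    with pytest.warns(RuntimeWarning, match="invalid value encountered"):
        np.float64(0.0) / np.float64(0.0)
```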

orbeckst commented 3 years ago

Is there anything here that we could let prospective GSoC students do?

Ignoring numpy RuntimeWarnings would be a start.

IAlibay commented 2 years ago

As may have been obvious from all the recent issues, I'm currently looking at the deprecation warnings so we don't lag too far behind for the next release.

I've found that I can reduce the number of emitted warnings to a more manageable 382 with the setup.cfg contents shown below.

The first three RuntimeWarnings (double_scalars, true_divide, and sqrt) account for ~22k of the warnings being emitted. Another ~20k warnings are related to things like no dt information and Guessed all Masses...

So the question here is, what do we mind being globally filtered?

Personally I think the UserWarnings and DeprecationWarnings I have down below are probably fine to filter; it's all stuff that's either intentionally triggered or without any negative side effect.

The numpy RuntimeWarnings, however, seem like they could mask some bad behaviour if we aren't careful with them. My personal view would be to have these turned off via decorators at the individual test level - that way we can be sure we're only affecting the tests that we know should throw those warnings.

setup.cfg contents:

```
[tool:pytest]
filterwarnings=
    always
    error:`np.*` is a deprecated alias for the builtin:DeprecationWarning
    # don't enforce for third party packages though:
    ignore:`np.*` is a deprecated alias for the builtin:DeprecationWarning:networkx.*:
    error:Creating an ndarray from ragged nested sequences
    error:invalid escape sequence \\:DeprecationWarning
    # Ignore frequent errors popping up in our tests
    # Elements
    ignore:Element information:UserWarning
    ignore:ATOMIC_NUMBER record not found, elements attribute will not be populated:UserWarning
    ignore:Unknown element  found for some atoms:UserWarning
    ignore:Unknown elements found for some atoms:UserWarning
    # Mass
    ignore:Failed to guess the mass:UserWarning
    ignore:Guessed all Masses to 1.0:UserWarning
    # Coordinates
    ignore:Reader has no dt information, set to 1.0 ps:UserWarning
    ignore:No coordinate reader found:UserWarning
    ignore:No dimensions set for current frame, zeroed unitcell will be written:UserWarning
    ignore:Could not find netCDF4 module:UserWarning
    ignore:No coordinates found in the RDKit molecule:UserWarning
    ignore:Empty box:UserWarning
    ignore:missing dimension - setting unit cell to zeroed box:UserWarning
    ignore:Not all velocities were present.  Unset velocities set to zero.:UserWarning
    ## PDB things
    ignore:Unit cell dimensions not found. CRYST1 record:UserWarning
    ignore:1 A\^3 CRYST1 record:UserWarning
    ignore:The bfactor topology attribute is only provided as an alias:DeprecationWarning
    # Topology
    ignore:Residues specified but no atom_resindex given:UserWarning
    ignore:Segments specified but no residue_segindex given:UserWarning
    ignore:Found no information for attr:UserWarning
    ignore:Supplied AtomGroup was missing the following attributes:UserWarning
    ignore:Found missing ChainIDs:UserWarning
    ignore:No hydrogen atom found in the topology:UserWarning
    ignore:Unknown element X found for some atoms:UserWarning
    ignore:Unknown ATOMIC_NUMBER value found for some atoms:UserWarning
    # Converters
    ignore:NaN detected in coordinates, the output molecule will not have 3D coordinates assigned:UserWarning
    ignore:Both _GasteigerCharge and _TriposPartialCharge properties are present:UserWarning
    ignore:No `bonds` attribute in this AtomGroup. Guessing bonds based on atoms coordinates:UserWarning
    # NamedStream warnings
    ignore:Constructed NamedStream:RuntimeWarning
    # Selections
    ignore:Using float equality to select atoms is not recommended
    # Aux
    ignore:AuxStep does not support time selection:UserWarning
    ignore:AuxStep does not support data selection:UserWarning
    # Analysis
    ignore:The `hbonds` attribute was deprecated in MDAnalysis:DeprecationWarning
    ignore:`HydrogenBondAutoCorrel` is deprecated:DeprecationWarning
    ignore:The `p_components` attribute was deprecated in MDAnalysis 2.0.0:DeprecationWarning
    ignore:The `rmsf` attribute was deprecated in MDAnalysis 2.0.0:DeprecationWarning
    ignore:This module was moved to MDAnalysis:DeprecationWarning
    ignore:Box padding \(currently set at 2.0\):UserWarning
    ignore:Atom selection does not fit grid:UserWarning
    ignore:Desired number of sample points too high, using 6:RuntimeWarning
    ignore:The `rmsd` attribute was deprecated in MDAnalysis 2.0.0:DeprecationWarning
    ignore:Empty string to select atoms, empty group returned:UserWarning
    # RuntimeWarnings
    ignore:invalid value encountered in double_scalars:RuntimeWarning
    ignore:invalid value encountered in true_divide:RuntimeWarning
    ignore:invalid value encountered in sqrt:RuntimeWarning
    ignore:invalid value encountered in float_scalars:RuntimeWarning
```

orbeckst commented 2 years ago

That's an excellent step forward in making the test output saner!

Filtering pretty much everything that we generate is good. Presumably, our tests can still catch them if we want to test for a specific warning?

Pinpointing the numpy ones to where we generate them and understanding why they are generated is important to make sure that we are actually producing correct results. I like your idea of quieting them on a per-test basis together with a comment explaining why. This will produce a permanent record of our understanding.
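[Editor's note: on the question above, yes: pytest.warns enters its own catch_warnings context with an "always" filter, so a globally ignored warning can still be asserted inside a specific test. A sketch of what such a per-test record could look like follows; the test name and the reason in the comment are hypothetical, but @pytest.mark.filterwarnings is standard pytest.]

```python
import pytest

# Hypothetical example: this test is known to hit sqrt of a negative
# intermediate for a degenerate input (the "comment explaining why"),
# so the RuntimeWarning is muted for this one test only.
@pytest.mark.filterwarnings(
    "ignore:invalid value encountered in sqrt:RuntimeWarning"
)
def test_degenerate_input():
    ...
```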

orbeckst commented 2 years ago

(And yes, you convinced me to change my opinion https://github.com/MDAnalysis/mdanalysis/issues/2980#issuecomment-798989249 to "ignore NumPy runtime warnings" ;-) ).

jbarnoud commented 2 years ago

I like silencing the warnings per test.

IAlibay commented 2 years ago

> I like silencing the warnings per test.

Both the expected MDA ones and the external ones?

jbarnoud commented 2 years ago

Ideally yes. That would be the safest. It would also take quite some time, even though I expect a few tests and fixtures to generate most of the warnings.

IAlibay commented 2 years ago

> Ideally yes. That would be the safest. It would also take quite some time, even though I expect a few tests and fixtures to generate most of the warnings.

re: "expect few tests and fixtures" -> after multiple hours spent here, that's not really the case imho. You probably are in the order of max hundreds per tests when you're looking at tens of thousands of warnings overall. I've already spent a few hours on this, it's not a short term issue.

I will say two things here -

1) We really need to come to an agreement here. I started doing per-test fixtures in PRs and was told that this wasn't worth it; I'd prefer not to have to go back and forth on this. @MDAnalysis/coredevs can we have everyone pitch in towards a consensus?

2) This is extremely time consuming - we're talking about thousands of tests, each of which will require an in-depth look. There are MDA UserWarnings we throw out there that really don't matter (say element stuff, or dt being set to 1 ps). Given how many other issues we have to deal with - what are the gains, and are they sufficient to be worth our time?

(Also, probably a third point here - we need to be clear on why we're doing this at the test level. If it's to check the validity of each test, then we probably shouldn't make this an issue we throw at prospective GSoC contributors.)

orbeckst commented 2 years ago

I was only thinking of the numpy runtime warnings

```
    ignore:invalid value encountered in double_scalars:RuntimeWarning
    ignore:invalid value encountered in true_divide:RuntimeWarning
    ignore:invalid value encountered in sqrt:RuntimeWarning
    ignore:invalid value encountered in float_scalars:RuntimeWarning
```

and see if they come from a small number of our tests because they might hint at issues in algorithms. If feasible, mute these few (maybe not few???) on a per-test basis.

How many of our tests generate these warnings above?

I would be ok with a broad-brush silencing for most everything else.
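[Editor's note: one way to answer the "how many of our tests" question, sketched with the same pytest warning hooks as above rather than existing MDAnalysis tooling, is to record the test id of every matching warning.]

```python
# conftest.py sketch: map numpy "invalid value" RuntimeWarnings to the
# ids of the tests that emit them (pytest >= 6.0).
import collections

HITS = collections.defaultdict(set)

def pytest_warning_recorded(warning_message, when, nodeid, location):
    if (issubclass(warning_message.category, RuntimeWarning)
            and "invalid value encountered" in str(warning_message.message)):
        HITS[str(warning_message.message)].add(nodeid)

def pytest_terminal_summary(terminalreporter):
    terminalreporter.section("tests emitting numpy RuntimeWarnings")
    for message, nodeids in sorted(HITS.items()):
        terminalreporter.write_line(f"{len(nodeids):>5} tests  {message}")
```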

IAlibay commented 2 years ago

Those account for ~20k warnings, maybe split across a hundred or so tests?

It's manageable and I'd advocate doing the non-MDA tests manually. It's going to be time consuming either way tbh.