LSSTDESC / SRV-planning

Repository to plan and coordinate some of the Science Release and Validation Working Group tasks

Build a core set of tests to be run recursively #5

Open nsevilla opened 2 years ago

nsevilla commented 2 years ago

We have compiled a master test list which tries to gather all past resources that we can call upon to incorporate into our test platform.

What subset of these tests should we select to be run recursively on upcoming data sets? What is the very basic information we need to gather from these characterization tests?

nsevilla commented 2 years ago

I think a core inspection test should be similar to Stéphane’s tests (https://lsst.lal.in2p3.fr/lalwiki/LSS/DpddObjectCatalogRun2):

fjaviersanchez commented 2 years ago

A couple of tests that we ran pretty routinely and that don't need truth information could be added, although I suspect that most of these will be produced by faro anyway:

nsevilla commented 2 years ago

The presence of out-of-bounds values and NaNs can be checked as well.
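A minimal sketch of such a check, assuming the column is available as a NumPy array. The function name `check_column`, the sample magnitudes, and the bounds are all hypothetical and only illustrate the idea:

```python
import numpy as np

def check_column(values, lower, upper):
    """Count NaNs, infinities and out-of-bounds entries in one column."""
    values = np.asarray(values, dtype=float)
    n_nan = int(np.isnan(values).sum())
    n_inf = int(np.isinf(values).sum())
    # Only finite entries are compared against the bounds,
    # so NaNs and infinities are not counted twice.
    finite = values[np.isfinite(values)]
    n_out = int(((finite < lower) | (finite > upper)).sum())
    return {
        "n_total": values.size,
        "n_nan": n_nan,
        "n_inf": n_inf,
        "n_out_of_bounds": n_out,
    }

# Hypothetical magnitudes: one NaN, one infinity, one implausible value (99.0).
mag = np.array([21.3, 24.8, np.nan, 99.0, np.inf, 18.2])
report = check_column(mag, lower=10.0, upper=40.0)
print(report)
```

Reporting the three counts separately keeps sentinel values (such as 99.0) distinguishable from genuinely missing data.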

nsevilla commented 2 years ago

We considered running the above tests within the existing DESCQA readiness test. However, @yymao points out that those tests are written to be generic and avoid memory issues by processing one column at a time. A dedicated test that loads a few columns simultaneously (checking memory constraints as needed) is a more reasonable option, as is running several readiness tests in parallel. A third option is a new test that reads the data once and produces the histograms in parallel.
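The "read once, histogram several columns" idea could look roughly like the sketch below. It assumes the data arrive as an iterable of chunks, each a dict mapping column name to array. The names `accumulate_histograms` and `fake_chunks`, the column names, and the bin ranges are hypothetical, and this is not the DESCQA implementation:

```python
import numpy as np

def accumulate_histograms(chunks, bin_edges):
    """Build one histogram per column in a single pass over the data.

    chunks:    iterable of dicts mapping column name -> 1-D array
    bin_edges: dict mapping column name -> array of bin edges
    """
    counts = {
        name: np.zeros(len(edges) - 1, dtype=np.int64)
        for name, edges in bin_edges.items()
    }
    for chunk in chunks:
        # Each chunk is visited once; every requested column is binned
        # before the next chunk is loaded.
        for name, edges in bin_edges.items():
            values = np.asarray(chunk[name], dtype=float)
            values = values[np.isfinite(values)]  # drop NaN/inf before binning
            hist, _ = np.histogram(values, bins=edges)
            counts[name] += hist
    return counts

def fake_chunks(n_chunks, n_rows, seed=0):
    """Hypothetical data source yielding random sky positions chunk by chunk."""
    rng = np.random.default_rng(seed)
    for _ in range(n_chunks):
        yield {
            "ra": rng.uniform(50.0, 75.0, n_rows),
            "dec": rng.uniform(-45.0, -25.0, n_rows),
        }

bin_edges = {
    "ra": np.linspace(50.0, 75.0, 11),
    "dec": np.linspace(-45.0, -25.0, 11),
}
counts = accumulate_histograms(fake_chunks(n_chunks=3, n_rows=1000), bin_edges)
print(counts["ra"].sum(), counts["dec"].sum())
```

Because only the running counts and the current chunk are held at any time, peak memory scales with the chunk size and the number of columns rather than with the catalog size.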

nsevilla commented 2 years ago

@evevkovacs also suggests using iterator objects in get_quantity to speed up loops. See this example.
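As a rough illustration of the iterator pattern: in GCR-based readers the iterator form is, as far as I know, requested by passing `return_iterator=True` to `get_quantities`. The generator `iter_quantities` below is a hypothetical stand-in that only mimics that shape, so the sketch runs without a catalog, and the column name `mag_r` is made up:

```python
import numpy as np

def iter_quantities(columns, chunk_size):
    """Hypothetical stand-in for a reader in iterator mode.

    Yields dicts of column chunks instead of returning whole columns.
    """
    n_rows = len(next(iter(columns.values())))
    for start in range(0, n_rows, chunk_size):
        yield {name: col[start:start + chunk_size] for name, col in columns.items()}

def running_summary(chunk_iter, name):
    """Accumulate count, mean, min and max of one column chunk by chunk."""
    n, total, lo, hi = 0, 0.0, np.inf, -np.inf
    for chunk in chunk_iter:
        values = np.asarray(chunk[name], dtype=float)
        values = values[np.isfinite(values)]
        if values.size == 0:
            continue  # nothing usable in this chunk
        n += values.size
        total += values.sum()
        lo = min(lo, values.min())
        hi = max(hi, values.max())
    mean = total / n if n else float("nan")
    return {"n": n, "mean": mean, "min": lo, "max": hi}

columns = {"mag_r": np.arange(10, dtype=float)}
summary = running_summary(iter_quantities(columns, chunk_size=4), "mag_r")
print(summary)
```

The loop body never sees more than one chunk, so the same code works whether the source is an in-memory array or a reader streaming from disk.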

nsevilla commented 2 years ago

Running several instances of the existing readiness_test in DESCQA is inefficient (simple histograms over the whole DC2 take 10+ minutes on NERSC interactive nodes) and rigid about what you can plot. So I decided to move on and follow Yao's suggestion of building a new, more flexible and possibly more time-efficient DESCQA test. Memory limits could be an issue, TBC.

yymao commented 2 years ago

Just want to note that "more flexible" and "more time efficient" are usually difficult to achieve at the same time. The inefficiencies in the current test mostly come from trying to be agnostic (i.e., flexible) about the data format/structure that the test will run on. We can easily write a much more efficient test with specific assumptions about the input data format/structure.