dirac-institute / sorcha

An open-source community LSST Solar System Simulator

Failure in the linking filter with new chunking in PR 998 #1009

Open Gerenjie opened 1 month ago

Gerenjie commented 1 month ago

While testing the new chunking code on the slice_inputs branch, a single ~100-object file succeeded, but all ten ~10,000-object files failed.

Chunks 1-6 failed on line 217 of PPMiniDifi.py:

    k = i + np.argmin(mjd[i:j])
    ValueError: attempt to get argmin of an empty sequence

Chunks 7-10 failed on line 120 of simulation_driver.py:

    for i, row in orbits_df.iterrows():
    AttributeError: 'NoneType' object has no attribute 'iterrows'
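The second traceback means orbits_df was None by the time the loop ran; the issue does not say why. Below is a minimal sketch of a guard that would surface that condition with a clearer message, assuming pandas is available. The helper name iter_orbit_rows, its error text, and the sample columns are hypothetical and are not taken from sorcha.

```python
import pandas as pd


def iter_orbit_rows(orbits_df):
    """Hypothetical wrapper: fail loudly if no DataFrame was produced."""
    if orbits_df is None:
        # Without this check the loop fails with
        # AttributeError: 'NoneType' object has no attribute 'iterrows'
        raise ValueError("no orbits DataFrame was returned for this chunk")
    yield from orbits_df.iterrows()


# Made-up sample data, for illustration only.
orbits = pd.DataFrame({"ObjID": ["a", "b"], "a_au": [2.5, 3.1]})
ids = [row["ObjID"] for _, row in iter_orbit_rows(orbits)]

try:
    list(iter_orbit_rows(None))
except ValueError as err:
    guard_message = str(err)

print(ids, guard_message)
```

This only changes how the problem is reported; it does not explain why the chunk produced no DataFrame.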

Happy to provide the input files, but the fact that 10 out of 10 failed suggests a common-mode failure rather than something specific to one input.

Versions of relevant packages and how they were installed (pip/conda/mamba):

Linux-4.18.0-513.18.1.el8_9.x86_64-x86_64-with-glibc2.28
Python 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0]
sorcha 0.1.dev1726+g954e785
pandas 2.2.2
assist 1.1.9
rebound 4.4.1
sbpy 0.3.0
Gerenjie commented 1 month ago

uniform_colors.csv uniform_orbits.csv Rubin_full_footprint.ini.txt

Added .txt to the end of the .ini filename because GitHub wouldn't let me upload a .ini file for some reason.

Gerenjie commented 1 month ago

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 1 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 1/10

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 2 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 2/10

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 3 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 3/10

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 4 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 4/10

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 5 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 5/10

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 6 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 6/10

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 7 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 7/10

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 8 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 8/10

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 9 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 9/10

sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ -t 10 -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset 10/10
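The ten commands above differ only in the value passed to -t and in the numerator of --process-subset. A small sketch that regenerates them as strings, assuming nothing beyond the flags already shown; it only builds and prints the command lines and does not run sorcha.

```python
# Template copied from the commands above; {k} is the chunk number, {n} the chunk count.
TEMPLATE = (
    "sorcha run -c Rubin_full_footprint.ini -pd baseline_v3.4_10yrs.db -o ./ "
    "-t {k} -ob ./uniform_orbits.csv -p uniform_colors.csv --process-subset {k}/{n}"
)

n_chunks = 10
commands = [TEMPLATE.format(k=k, n=n_chunks) for k in range(1, n_chunks + 1)]

for command in commands:
    print(command)
```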

Gerenjie commented 1 month ago

minidifi_failure.csv

One example of a minidifi failure. I can't recover the input orbit, color, or seed information without a bunch more work, but I don't think it's necessary for diagnosing the minidifi failure. If an example with the input orbit, color, and seed is required, I can do some more work to generate a single (orbit, color, seed) combination that fails.

astjoephysics commented 1 month ago

This is an edge case in linkObject within PPMiniDifi. The algorithm tries to find the first observation within the 15-day discovery window, but here the observation indexes i and j are equal (e.g. index 10 -> index 10). That makes mjd[i:j] an empty array, so taking its argmin fails: there are no values between index 10 and index 10.
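The error can be reproduced outside sorcha with NumPy alone. A minimal sketch, assuming a made-up array of nightly MJD values; only the expression i + np.argmin(mjd[i:j]) comes from the traceback above.

```python
import numpy as np

# Made-up observation times: 20 consecutive nightly MJDs.
mjd = np.arange(60000.0, 60020.0)

# The edge case from the issue: start and end index are the same.
i, j = 10, 10
window = mjd[i:j]  # empty slice, since there is nothing between index 10 and 10

try:
    k = i + np.argmin(window)
except ValueError as err:
    error_message = str(err)

print(window.size, error_message)
```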

This has actually already been addressed and "fixed" in #972, but the old code that caused the issue was never deleted, so the code still breaks whenever a case like the one above occurs. In all other cases it didn't matter, because the working code in the next block overwrites all the variables used in discoverability anyway. The fix is to remove the old block of code (lines 215-218 in PPMiniDifi).

mschwamb commented 4 weeks ago

@astjoephysics can you add this case to the unit tests? There's a merged PR (#1010) that removes the offending code that should have been deleted.
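A regression test for this case might look roughly like the sketch below, written as plain pytest-style functions. The function earliest_in_window is a hypothetical stand-in for the index lookup, since the real linkObject signature is not shown in this thread; an actual test would call the sorcha code instead.

```python
import numpy as np


def earliest_in_window(mjd, i, j):
    """Hypothetical stand-in: index of the earliest MJD in mjd[i:j], or None if empty."""
    if i >= j:
        return None  # empty window, nothing to take the argmin of
    return i + int(np.argmin(mjd[i:j]))


def test_empty_window_does_not_raise():
    # Same shape of input as the reported failure: index 10 -> index 10.
    mjd = np.arange(60000.0, 60020.0)
    assert earliest_in_window(mjd, 10, 10) is None


def test_nonempty_window_returns_earliest():
    mjd = np.array([60002.0, 60000.0, 60001.0])
    assert earliest_in_window(mjd, 0, 3) == 1
```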