dmitryduev / DeepStreaks

Identifying Near-Earth Asteroids (NEAs) in the Zwicky Transient Facility (ZTF) data with deep learning
MIT License

Thoughts on DeepStreaks v2 #14

Open dmitryduev opened 3 years ago

dmitryduev commented 3 years ago

DeepStreaks has been a very successful system and was a big leap forward back in 2019. However, I now consider it largely obsolete and inadequate in terms of code/infrastructure quality, the DL models themselves, and the overall setup.

Below are my thoughts on a possible DeepStreaks v2.

I see two alternatives:

  1. A Tails-like system.
    • It would utilize a similar architecture and operate on (tessellated) full-frame image triplets (SCI, REF, DIFF).
    • It would detect real streaks and output PSF-fit-like parameters directly.
    • This would require some major effort and would be quite computationally expensive (as is Tails). Data would need to be collected and labeled similar to the way it was done for Tails.
  2. If the current streak detection algorithm is kept intact (or replaced with a superior one, but kept external), then I suggest treating the detections as a "streak alert stream", similar to how transients are handled.
    • Each "packet" would contain the fitted streaked PSF params, metadata, and the cutouts (SCI, REF, and DIFF, not just the DIFF as is currently the case).
    • Similar to how the drb scoring is implemented for transients, it would make sense to run DeepStreaks on the IPAC side and add the classifications to the packets. The stream would then be consumed, filtered using the classifications (i.e., ditching >99.5% of the stream, although we could also plug the stream into Kowalski and save it all, as with the regular alert stream), and human-vetted on Fritz, with the potential to leverage the unparalleled capabilities it has to offer (e.g., for follow-up). For a reference end-to-end implementation, see this PR on Kowalski.
    • The models themselves will most likely be much simpler, given that far more information will be available to the classifiers (see e.g. https://github.com/ZwickyTransientFacility/scope/pull/6), so the overall system will be significantly less computationally expensive than the existing solution.
    • Overall, it seems like less effort will be required for this option with the caveat that the system performance will be limited by the streak detection algorithm performance.
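
To make option 2 a bit more concrete, here is a rough sketch of what a streak alert "packet" and the classification-based filtering step could look like. It is not an actual schema: the class name `StreakAlert`, every field name, the `streak_score` key, and the 0.5 threshold are placeholders I made up for illustration, and a real packet would most likely be serialized the same way as the regular alert stream rather than held in a Python dataclass.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator


@dataclass
class StreakAlert:
    """Hypothetical streak alert packet; all field names are illustrative."""

    candidate_id: str
    psf_params: Dict[str, float]   # fitted streaked-PSF parameters
    metadata: Dict[str, object]    # observation-level metadata
    cutout_sci: bytes              # science cutout
    cutout_ref: bytes              # reference cutout
    cutout_diff: bytes             # difference cutout
    # Scores added upstream by the classifiers, keyed by classifier name.
    classifications: Dict[str, float] = field(default_factory=dict)


def filter_stream(
    alerts: Iterable[StreakAlert],
    score_key: str = "streak_score",  # assumed classifier name
    threshold: float = 0.5,           # assumed cut, not a tuned value
) -> Iterator[StreakAlert]:
    """Yield only the alerts whose score passes the threshold.

    Alerts that carry no score under `score_key` are treated as 0.0
    and therefore dropped.
    """
    for alert in alerts:
        if alert.classifications.get(score_key, 0.0) >= threshold:
            yield alert
```

The filter is deliberately a plain generator so that it could sit between whatever consumes the stream and whatever hands candidates over for human vetting. Saving the unfiltered stream would simply mean archiving the alerts before this step.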

In both cases, adequate infrastructure will be desirable: MLOps, CI/CD with GH Actions, code review, and all things DevOps.

AshishMahabal commented 3 years ago

These thoughts are fine. The biggest question though is of time/effort. How inefficient is the current system? The suggested system will need effort on multiple fronts. How about discussing the possible timeline at the next ML meeting?