CDCgov / MicrobeTrace

The Visualization Multitool for Molecular Epidemiology and Bioinformatics
https://microbetrace.cdc.gov/
Apache License 2.0

PROD: Nearest neighbor pruning links incorrectly #826

Open ikb6 opened 2 months ago

ikb6 commented 2 months ago

While working with a dataset, I noticed that PROD pruned out fewer links. On comparing Angular and PROD, I realized that PROD is not pruning links correctly: it retains links that are not necessarily the shortest distance, i.e., it keeps neighbors that are not really nearest neighbors. With another dataset, the number of pruned links differs slightly between PROD and Angular, but the starting number of links (153) is too high for me to track which ones PROD missed.
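For reference, the behavior I would expect from nearest-neighbor pruning can be sketched roughly as below. This is only an illustration, not MicrobeTrace's actual code: the `Link` shape, the `nearestNeighborLinks` function, and the `epsilon` tolerance parameter are all assumptions made for the example. A link is kept only if its distance is within `epsilon` of the minimum distance seen for at least one of its two endpoints.

```typescript
// Hypothetical link shape; field names are assumptions for this sketch.
interface Link {
  source: string;
  target: string;
  distance: number;
}

// Keep a link only if it is (within epsilon of) the shortest link
// for at least one of its endpoints.
function nearestNeighborLinks(links: Link[], epsilon = 0): Link[] {
  // First pass: record the minimum link distance for every node.
  const minDist = new Map<string, number>();
  for (const { source, target, distance } of links) {
    for (const node of [source, target]) {
      const best = minDist.get(node);
      if (best === undefined || distance < best) {
        minDist.set(node, distance);
      }
    }
  }
  // Second pass: drop links that are not a nearest-neighbor link
  // for either endpoint.
  return links.filter(
    ({ source, target, distance }) =>
      distance - (minDist.get(source) as number) <= epsilon ||
      distance - (minDist.get(target) as number) <= epsilon
  );
}

// Made-up sample data, not taken from the attached files.
const sample: Link[] = [
  { source: "A", target: "B", distance: 0.010 },
  { source: "A", target: "C", distance: 0.014 },
  { source: "B", target: "C", distance: 0.012 },
];
const kept = nearestNeighborLinks(sample);
```

In this made-up sample, A-C should be pruned because both A and C have a shorter link to B, while A-B and B-C are retained.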

Files used are available at the location below: https://cdc.sharepoint.com/:f:/r/teams/nchhstp-dhap-lb-microbetrace/Shared%20Documents/General/Files%20for%20bug%20reports/Nearest%20neighbor%20bug?csf=1&web=1&e=HmMuUk

  1. ANGULAR - without pruning: Angular_NoPruning.JPG
  2. NN ANGULAR: NN_Angular.JPG
  3. NN PROD: NN_MTPROD.JPG

dacowan404 commented 1 month ago

Updated the Angular version to run the Nearest Neighbor algorithm the same way as the PROD version. Added a field for users to select the epsilon value. Fixed the Nearest Neighbor column in the table view. To get the table view link count to match the number of links in the 2D Network view's statistics table, filter for rows where Nearest Neighbor is true and distance <= 0.015 (or whatever the threshold is set to). This update is currently deployed on the Mossy site.
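As a rough illustration of that filter (not the actual table view code), the count could be reproduced like this. The `LinkRow` shape, its field names, and `countDisplayedLinks` are hypothetical and only stand in for whatever the exported table contains.

```typescript
// Hypothetical row shape for the link table; field names are assumptions.
interface LinkRow {
  source: string;
  target: string;
  distance: number;
  nearestNeighbor: boolean; // value of the Nearest Neighbor column
}

// Count rows that are flagged as nearest neighbors AND fall within
// the distance threshold, mirroring the filter described above.
function countDisplayedLinks(rows: LinkRow[], threshold: number): number {
  return rows.filter((r) => r.nearestNeighbor && r.distance <= threshold).length;
}

// Made-up rows for illustration only.
const rows: LinkRow[] = [
  { source: "A", target: "B", distance: 0.010, nearestNeighbor: true },
  { source: "A", target: "C", distance: 0.014, nearestNeighbor: false },
  { source: "B", target: "D", distance: 0.020, nearestNeighbor: true },
];
const displayed = countDisplayedLinks(rows, 0.015);
```

Both conditions are needed: A-C is under the threshold but is not a nearest neighbor, and B-D is a nearest neighbor but exceeds the threshold, so only A-B is counted.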