CCMS-UCSD / GNPS_Workflows

Public Workflows at GNPS
https://gnps.ucsd.edu/

Filter top K edges also based on the count of shared peaks (and not only cosine) #849

Open Adafede opened 1 year ago

Adafede commented 1 year ago

Is your feature request related to a problem? Please describe.
My feature request is not related to any problem; it is just a suggestion.

Describe the solution you'd like
I would like to be able to:

  1. Sort the edges in descending order of matched peaks, just as is done for cosine scores
  2. Optionally, filter the top K based on this order, or on a mix of this order and the order given by the cosine scores

Given this original table:

source  target  cosine  shared peaks
1       2       0.90    6
1       3       0.89    97
1       4       0.88    8
1       5       0.87    10
1       6       0.86    9
1       7       0.85    7
1       8       0.84    110
1       9       0.83    100

The filtered table currently produced is (for top K = 5, min cosine = 0.6, min shared peaks = 0.6):

source  target  cosine  shared peaks
1       2       0.90    6
1       3       0.89    97
1       4       0.88    8
1       5       0.87    10
1       6       0.86    9
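
For reference, the current behaviour described above can be approximated by the short sketch below. It is a simplification and not the actual GNPS code: `top_k_by_cosine` is a hypothetical helper, edges are plain tuples, and the minimum shared peaks threshold is left out.

```python
# Edges of source node 1 as (source, target, cosine, shared_peaks).
edges = [
    (1, 2, 0.90, 6), (1, 3, 0.89, 97), (1, 4, 0.88, 8), (1, 5, 0.87, 10),
    (1, 6, 0.86, 9), (1, 7, 0.85, 7), (1, 8, 0.84, 110), (1, 9, 0.83, 100),
]


def top_k_by_cosine(edges, k=5, min_cosine=0.6):
    """Hypothetical helper: keep the k highest-cosine edges above min_cosine."""
    kept = [e for e in edges if e[2] >= min_cosine]
    # Sort by cosine only, highest first; shared peaks play no role here.
    return sorted(kept, key=lambda e: e[2], reverse=True)[:k]


current = top_k_by_cosine(edges)
print([e[1] for e in current])  # targets kept: [2, 3, 4, 5, 6]
```

Edges to targets 8 and 9 are dropped even though they share 110 and 100 peaks, which is what motivates the request.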

What I would like to have is the following (the weights can be discussed):

source  target  cosine  shared peaks  rank cosine  rank shared peaks  final rank (here 50:50)
1       2       0.90    6             1            8                  3 (sum = 9)
1       3       0.89    97            2            3                  1 (sum = 5)
1       4       0.88    8             3            6                  3 (sum = 9)
1       5       0.87    10            4            4                  2 (sum = 8)
1       6       0.86    9             5            5                  4 (sum = 10)
1       7       0.85    7             6            7                  5 (sum = 13)
1       8       0.84    110           7            1                  2 (sum = 8)
1       9       0.83    100           8            2                  4 (sum = 10)

resulting finally in the following filtered table:

source  target  cosine  shared peaks  rank cosine  rank shared peaks  final rank (here 50:50)
1       3       0.89    97            2            3                  1 (sum = 5)
1       5       0.87    10            4            4                  2 (sum = 8)
1       8       0.84    110           7            1                  2 (sum = 8)
1       2       0.90    6             1            8                  3 (sum = 9)
1       4       0.88    8             3            6                  3 (sum = 9)
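
A minimal sketch of this ranking follows. It is not the existing filtering code. The names `dense_ranks` and `top_k_combined` and the weights `w_cosine` and `w_peaks` are hypothetical. Breaking ties in the combined score by the higher cosine is my own assumption, chosen because it reproduces the row order of the table above.

```python
# Edges of source node 1 as (source, target, cosine, shared_peaks).
edges = [
    (1, 2, 0.90, 6), (1, 3, 0.89, 97), (1, 4, 0.88, 8), (1, 5, 0.87, 10),
    (1, 6, 0.86, 9), (1, 7, 0.85, 7), (1, 8, 0.84, 110), (1, 9, 0.83, 100),
]


def dense_ranks(values):
    """Rank values in descending order (1 = highest); equal values share a rank."""
    order = sorted(set(values), reverse=True)
    lookup = {v: i + 1 for i, v in enumerate(order)}
    return [lookup[v] for v in values]


def top_k_combined(edges, k=5, w_cosine=0.5, w_peaks=0.5):
    """Keep the k edges with the lowest weighted sum of the two ranks."""
    cosine_rank = dense_ranks([e[2] for e in edges])
    peaks_rank = dense_ranks([e[3] for e in edges])
    scored = [
        (w_cosine * c + w_peaks * p, e)
        for c, p, e in zip(cosine_rank, peaks_rank, edges)
    ]
    # Lower combined score is better; ties are broken by higher cosine (assumption).
    scored.sort(key=lambda item: (item[0], -item[1][2]))
    return [e for _, e in scored[:k]]


selected = top_k_combined(edges)
print([e[1] for e in selected])  # targets kept: [3, 5, 8, 2, 4]
```

Setting `w_cosine=1.0, w_peaks=0.0` falls back to a cosine-only ordering, so the weighting could be exposed as a workflow parameter without changing the default behaviour.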

Hope this makes sense, happy to elaborate if needed! 😊

Additional context
The code that would need to be modified is at https://github.com/mwang87/GNPS_sharedcode/blob/8283c5ce154eda266b4e5fce8747845fa0314d08/molecular_network_filtering_library.py#L383

mwang87 commented 1 year ago

I think the goal here is to create some normalized score that has some idea of equal reliability. This has a lot of parallels to this:

https://pubs.acs.org/doi/10.1021/pr400230p

We did some work a few years ago to find equivalences in the small molecule space. Maybe this is a good collaboration that we can publish on together.