hechth commented 1 year ago

Currently, when no MS2 data is present, the version that reads data from a CSV copies the MS1 data into the MS2 slot, while the version that reads from xcms leaves that slot empty. Should this be made the general case?

cbroeckl commented 1 year ago

If I recall correctly, the clustering algorithm is written to expect data in the MS2 slot as well. Frankly, this is sloppy coding: it was just a shortcut to avoid having to change the similarity scoring. If there is only MS1 data, there is in theory no reason to calculate MS2 similarity or MS1 vs. MS2 correlational similarity.

To move away from this, we would need to make the `calculate.similarity` function behave differently when no MS2 data is available. Currently there is no condition to handle this situation:

```r
max_value <- pmax(
  # MS1 vs. MS1
  cor(data1[, start_row:stop_row], data1[, start_col:stop_col], method = cor.method, use = "everything"),
  # MS1 vs. MS2
  cor(data1[, start_row:stop_row], data2[, start_col:stop_col], method = cor.method, use = "everything"),
  # MS2 vs. MS2
  cor(data2[, start_row:stop_row], data2[, start_col:stop_col], method = cor.method, use = "everything"),
  na.rm = TRUE
)
# correlational similarity
corr_sim <- round(exp(-((1 - max_value) ^ 2) / (2 * (sr ^ 2))), digits = 20)
```

I think it is better to remedy this situation than to leave it as written; there would be fewer calculations to do.
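A minimal sketch of how the MS1-only case could be handled, assuming missing MS2 data is represented as `NULL` (or a zero-column matrix) in `data2`. The helper `block_max_correlation` is hypothetical and not part of RAMClustR; it only reproduces the `pmax` step quoted above, skipping the two MS2 terms when there is nothing to correlate against.

```r
# Hypothetical helper (not part of RAMClustR): maximum correlation for one
# block of features, skipping the MS2 terms when no MS2 data is available.
block_max_correlation <- function(data1, data2, rows, cols, cor.method = "pearson") {
  # MS1 vs. MS1 is always needed
  ms1_ms1 <- cor(data1[, rows], data1[, cols], method = cor.method, use = "everything")
  # Assumption: missing MS2 data is NULL or a matrix with zero columns
  if (is.null(data2) || ncol(data2) == 0) {
    return(ms1_ms1)
  }
  pmax(
    ms1_ms1,
    cor(data1[, rows], data2[, cols], method = cor.method, use = "everything"),  # MS1 vs. MS2
    cor(data2[, rows], data2[, cols], method = cor.method, use = "everything"),  # MS2 vs. MS2
    na.rm = TRUE
  )
}

# Usage: 10 samples x 5 features of simulated MS1 data
set.seed(1)
ms1 <- matrix(rnorm(50), nrow = 10)
ms1_only  <- block_max_correlation(ms1, NULL, 1:5, 1:5)  # one cor() call
with_copy <- block_max_correlation(ms1, ms1, 1:5, 1:5)   # MS1 copied into MS2 slot: three cor() calls
all.equal(ms1_only, with_copy)
```

When MS1 data is copied into the MS2 slot, all three correlation matrices are identical, so the result matches the MS1-only branch while costing three times the correlation work.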

hechth commented 1 year ago

@cbroeckl I agree, so let's keep an eye on this. Let's make a list of places in the code where this behaviour will need to be adapted and resolve them step by step.

cbroeckl / RAMClustR

How to handle missing `MSMS` data #38
