RECETOX / galaxytools

Set of Galaxy tool wrappers developed at RECETOX
MIT License

Implement tool to merge metadata Excel sheet into MSP by joining on a specified column/metadata key #422

Closed hechth closed 10 months ago

hechth commented 11 months ago

The tool should take an MSP file and a tabular file, and join the metadata in the tabular file with the metadata in the MSP file on a user-specified column.

The output should be the MSP file with the attached metadata.
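To make the intended join semantics concrete, here is a minimal sketch that uses only the standard library and plain dictionaries in place of real spectra. The function name `join_metadata`, the toy records, and the choice that table values override existing metadata are all assumptions for illustration, not part of the proposed tool.

```python
import csv
import io


def join_metadata(records, table_rows, key):
    """Inner-join table_rows onto records using the shared `key` field."""
    # Index the table by key; the first row wins if a key is repeated.
    lookup = {}
    for row in table_rows:
        lookup.setdefault(row[key], row)

    merged = []
    for record in records:
        extra = lookup.get(record.get(key))
        if extra is None:
            continue  # inner join: drop records without a matching row
        # Table values are added to (and override) the record's metadata.
        merged.append({**record, **extra})
    return merged


# Toy stand-ins for MSP metadata and a tabular file (made-up values).
records = [
    {"compound_name": "caffeine", "ionmode": "positive"},
    {"compound_name": "unknown_1", "ionmode": "positive"},
]
table_text = "compound_name\tformula\ncaffeine\tC8H10N4O2\n"
table_rows = list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))

merged = join_metadata(records, table_rows, key="compound_name")
print(merged)
```

Whether unmatched spectra should be dropped (inner join) or kept unchanged (left join) is a design choice the tool would probably need to expose to the user.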

example code

import pandas
import matchms
from matchms.exporting import save_as_msp
from matchms.importing import load_from_msp

matchms.set_matchms_logger_level('ERROR')

msp_file = "example.msp"
metadata_table_file = "metadata.csv"
join_key = 'compound_name'

spectra = list(load_from_msp(msp_file))

# Keep the first row per key so that each lookup below returns a single row.
metadata_table = pandas.read_csv(metadata_table_file)
metadata_table = metadata_table.drop_duplicates(subset=join_key).set_index(join_key)

def update_metadata(spectrum: matchms.Spectrum, row: pandas.Series) -> matchms.Spectrum:
    # Spectrum.metadata returns a copy in recent matchms versions,
    # so write each non-empty value back with Spectrum.set().
    for key, value in row.dropna().items():
        spectrum.set(key, value)
    return spectrum

# Inner join: keep only spectra whose key has a row in the metadata table.
merged_spectra = [
    update_metadata(spectrum, metadata_table.loc[spectrum.get(join_key)])
    for spectrum in spectra
    if spectrum.get(join_key) in metadata_table.index
]

save_as_msp(merged_spectra, "merged.msp")
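Because an inner join silently drops entries without a partner, it may help to report which keys fail to match before writing the output. The helper below is a hypothetical sketch using plain sets; `unmatched_keys` and the example names are assumptions, not part of matchms or the proposed tool.

```python
def unmatched_keys(spectrum_keys, table_keys):
    """Return the join keys that appear on only one side."""
    spectrum_keys, table_keys = set(spectrum_keys), set(table_keys)
    return {
        "only_in_msp": sorted(spectrum_keys - table_keys),
        "only_in_table": sorted(table_keys - spectrum_keys),
    }


# Made-up keys standing in for compound names from both files.
report = unmatched_keys(
    ["caffeine", "unknown_1"],
    ["caffeine", "theobromine"],
)
print(report)
```

With real data, the first argument could be the join-key value read from each spectrum and the second the corresponding column of the table.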

matchms_metadata_merge_testdata.zip

EDIT 16.11.2024: This is basically solved and is only waiting for the next matchms release. Once that is out, the package goes to bioconda and we can use the newer version of the package in this PR.

hechth commented 11 months ago

xref https://github.com/matchms/matchms/pull/547

hechth commented 10 months ago

This is waiting for a new matchms release, which will include the functionality.