Jhsmit / HDX-MS-datasets

Repository for HDX-MS datasets

Porting to R #1

Open ococrook opened 2 years ago

ococrook commented 2 years ago

Hi @Jhsmit

Are you happy for me to port the datasets to an equivalent R repository as they appear? We will store them as a special mass spectrometry object there.

I assume you're not planning to store the isotope distribution for the processed data at the moment, just the deuterium incorporations?

Olly

Jhsmit commented 2 years ago

Hi @ococrook

Yes, sure, go ahead. But keep in mind that what I'm doing here will change in the future as we move to a community standard. Also, this repository is still highly experimental; I haven't really thought in much detail yet about the structure. So suggestions / contributions are welcome :)

Indeed for the moment I'm only storing D-uptake information.

Is D-uptake information enough? I think when we incorporate isotope distributions we should move to a different file format. My personal preference for this is probably HDF5, since it is hierarchical, binary, and widely adopted.

Have you been in contact with Miklos Guttman / would you like to be involved in future (technical) discussions on the format?

Jochem

ococrook commented 2 years ago

D-uptake is useful for now and can be stored easily and be used for testing methods, so certainly useful.

For isotope distributions, within R we have a Spectra object for storing/processing such data. You can have an HDF5 file backend if you want (https://github.com/rformassspectrometry/Spectra), but it supports several possibilities.

There are other backends that already exist and I would favour using those, to integrate data with other MS data, such as .mzML or .mzMLb.

Happy to be involved if you think it's useful.

Jhsmit commented 2 years ago

OK, I'll add some additional datasets that we have collected/published as well if that helps

I agree with integrating with things that are already out there. I'm not too well initiated in the MS space, but I wasn't sure about the XML-based format since it's text-based and can have poor performance. The .mzMLb format looks interesting, didn't know about that one, and it's also HDF5-based :)
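
For illustration, the text-versus-binary concern can be put in rough numbers. mzML embeds its numeric arrays as base64 text inside the XML (optionally zlib-compressed), whereas a binary container can store the raw bytes directly. The sketch below is only a minimal comparison under stated assumptions: the intensity values are made up, it uses nothing beyond the Python standard library, and it measures size only, not XML parsing cost.

```python
import base64
import struct

# Hypothetical intensity values standing in for one isotope envelope
# (made up for this sketch, not taken from any dataset in the repository).
intensities = [0.0, 1520.5, 9874.25, 6021.0, 2310.75, 412.0, 35.5, 0.0]

# Raw binary form: little-endian 64-bit floats, 8 bytes per value.
fmt = f"<{len(intensities)}d"
raw = struct.pack(fmt, *intensities)

# Base64 text form, the kind of encoding used for arrays embedded in XML.
encoded = base64.b64encode(raw)

# Round trip: decoding the text must give back the original values.
decoded = list(struct.unpack(fmt, base64.b64decode(encoded)))

raw_size = len(raw)
text_size = len(encoded)
print(f"raw: {raw_size} bytes, base64: {text_size} bytes")
```

The roughly 4/3 size overhead is inherent to base64 and applies before any XML markup is counted; compression changes the absolute numbers but not the fact that the text form needs an extra encode/decode step.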