
bryanhanson / readJDX

Import spectroscopic data in the JCAMP-DX format

https://bryanhanson.github.io/readJDX/


Format of the output #4

Closed: rguliev closed this issue 7 years ago

rguliev commented 7 years ago

Hi, is it possible to change the format of the output? We usually use R to analyse many spectra at once, but with the current format it's not that easy to combine them:

the list element holding the spectral data has a different name for each file,
the metadata is hard to combine because it's just an array of strings.

I can suggest two options. Variant 1: a list of two elements:

metadata: either a data.frame with columns 'title', 'JCAMP-DX', 'data type', etc., or a list, though a data.frame seems easier to merge.
spc (or xy, or xydata, ... call it whatever you like, just make it the same for all files): either a matrix or a data.frame with colnames x, y.

Variant 2: a single data.frame with columns

'spc' containing the spectral data,
'title', 'JCAMP-DX', 'data type', etc. containing the metadata.
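A minimal sketch of why Variant 1 would make combining files easier. The shape used here (`metadata` as a one-row data.frame, `spc` as a data.frame with columns x and y) is the hypothetical format proposed above, not what readJDX returns. `make_spectrum()` is a made-up helper that fills such an object with dummy values, and the column `data_type` stands in for 'data type'.

```r
# Hypothetical Variant 1 output (NOT the actual readJDX format):
# list(metadata = <1-row data.frame>, spc = <data.frame with x, y>)
make_spectrum <- function(title, n = 5) {
  list(
    metadata = data.frame(title = title, data_type = "INFRARED SPECTRUM",
                          stringsAsFactors = FALSE),
    spc = data.frame(x = seq_len(n), y = rev(seq_len(n)))  # dummy values
  )
}

spectra <- list(make_spectrum("sample A"), make_spectrum("sample B"))

# Metadata: stack the one-row data.frames, giving one row per file
meta_all <- do.call(rbind, lapply(spectra, `[[`, "metadata"))

# Spectral data: long format, each point tagged with its file's title
spc_all <- do.call(rbind, lapply(spectra, function(s) {
  cbind(title = s$metadata$title, s$spc)
}))

print(meta_all)
head(spc_all)
```

Because every file uses the same element names, a single `lapply()` per element is enough; with per-file names the code would first have to discover which element holds the spectrum.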

bryanhanson commented 7 years ago

I'll think about the metadata part, but the problem is that some metadata, esp. for NMR data, is hundreds of lines long because of the manufacturer-supplied parameters. That would make a very awkward data frame. As it is now, if you want something in particular you can extract it with grep from the first list element.
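For example, the grep approach could look like the sketch below. A made-up character vector stands in for `tst[[1]]`, on the assumption that the metadata holds raw JCAMP-DX header lines of the form `##LABEL= value`. Labels starting with `##$` are the vendor-specific ones that make NMR headers so long.

```r
# Stand-in for tst[[1]]: made-up header lines in JCAMP-DX "##LABEL= value" form
metadata <- c("##TITLE= Test spectrum",
              "##JCAMP-DX= 5.01",
              "##DATA TYPE= INFRARED SPECTRUM",
              "##$VENDOR PARAM= 1")

# Keep only the line(s) with a given label; "^" anchors the match at line start
dtype <- grep("^##DATA TYPE=", metadata, value = TRUE)
print(dtype)

# All vendor-specific ("##$") lines, which can be kept or dropped as a group
vendor <- grep("^##\\$", metadata, value = TRUE)
print(vendor)
```

With a real file the equivalent call would presumably be `grep("^##TITLE=", tst[[1]], value = TRUE)`.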

To access the spectral data without knowing the names, you can just use numerical indexing:

library(readJDX)
tst <- readJDX("testfile.jdx")
spec <- tst[[2]] # element 1 is the metadata, element 2 the spectrum
str(spec) # a data frame, ready to be processed further.

Or if you want the names, you can iterate over them:

library(readJDX)
tst <- readJDX("testfile.jdx")
nms <- names(tst)
# note nms[1] is the metadata, so skip it
for (i in 2:length(nms)) {
  spec <- tst[[nms[i]]]
  str(spec) # or do something else with this spectrum
}

Will either of these approaches work for your use?

rguliev commented 7 years ago

Hm... Sorry, I haven't worked with NMR data. In that case I think a function like extract(object, parameter) would be useful. It's not that difficult to write one yourself, though, so it's up to you whether to add it. The approaches work; I just find it a bit odd for the list to contain both useful and not-so-useful names, but that's up to you as well. Can you please explain the second piece of code, i.e. for (i in 2:length(nms))? Can length(nms) ever be greater than two, or were you just showing the general approach?

I'm closing the issue since it's more of a recommendation.
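The `extract(object, parameter)` idea could look roughly like this sketch. `extract_param()` is a hypothetical name (chosen to avoid clashing with other packages' `extract()`) and is not part of readJDX. It assumes the first list element is a character vector of `##LABEL= value` lines and matches labels exactly, whereas the JCAMP-DX spec allows some variation in how labels are written (for example spacing and case), so a robust version would normalize labels first.

```r
# Hypothetical helper, not part of readJDX: return the value of a JCAMP-DX
# label such as "TITLE" from a list whose first element is the header lines.
extract_param <- function(object, parameter) {
  meta <- object[[1]]
  prefix <- paste0("##", parameter, "=")   # literal "##LABEL=" prefix
  hit <- meta[startsWith(meta, prefix)]    # exact, case-sensitive match
  if (length(hit) == 0) return(NA_character_)
  # Drop the prefix and surrounding whitespace; use the first match only
  trimws(substring(hit[1], nchar(prefix) + 1))
}

# Mock object shaped like list(metadata, spectrum); contents are made up
tst <- list(
  metadata = c("##TITLE= Test spectrum", "##DATA TYPE= INFRARED SPECTRUM"),
  spectrum = data.frame(x = 1:3, y = c(0.1, 0.5, 0.2))
)

extract_param(tst, "TITLE")      # "Test spectrum"
extract_param(tst, "DATA TYPE")  # "INFRARED SPECTRUM"
extract_param(tst, "ORIGIN")     # NA
```

Returning `NA` for a missing label lets the helper be used inside `sapply()` over many files without failing on the ones that lack a given parameter.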

bryanhanson commented 7 years ago

The long term idea for compound files is to return a list element for each individual spectrum. So if there were 8 spectra, the list would have 9 elements, 1 for the metadata and 8 for the spectra. Right now this is how NMR spectra actually work. They return a list with 3 elements: the metadata, the real data and the imaginary data. So it's set up to generalize when we get around to properly dealing with compound files.
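A sketch of that layout, using a mock list in place of a real NMR result (element names and values are made up). It also bears on the `2:length(nms)` question: looping over everything after element 1 works unchanged whether the list has two, three, or nine elements. Here `seq_along(x)[-1]` is used instead of `2:length(x)`, because `2:length(x)` counts down to `c(2, 1)` when only the metadata is present.

```r
# Mock result shaped like the NMR case: element 1 is the metadata,
# the remaining elements are spectra. Names and values are made up.
tst <- list(
  metadata  = c("##TITLE= Test NMR", "##DATA TYPE= NMR SPECTRUM"),
  real      = data.frame(x = 1:4, y = c(1, 3, 2, 0)),
  imaginary = data.frame(x = 1:4, y = c(0, -1, 1, 0))
)

# Indices of the spectral elements: everything after the metadata.
# Empty (no iterations) if the list holds only metadata.
spec_idx <- seq_along(tst)[-1]

for (i in spec_idx) {
  spec <- tst[[i]]
  cat(names(tst)[i], ": ", nrow(spec), " points, max y = ", max(spec$y), "\n",
      sep = "")
}
# real: 4 points, max y = 3
# imaginary: 4 points, max y = 1
```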

rguliev commented 7 years ago

Got it! Thank you! :)