Closed rvernica closed 4 years ago
Thanks for the suggestion. Take a look at ?matrix2SpectraObject. It requires the matrix to be in a file, which in the short run you could create by writing your data.frame to a file (one extra step). If this function seems like the missing function you wish was there, I can update it to accept a data.frame from the local environment, assuming that samples were in rows and the colnames were in fact the sample names.
Right, matrix2SpectraObject might work.
So far, I have found it useful to store the spectrum data in a Data Frame like this:
df
inte: num ...
freq: num ...
cls: Factor ...
...
So, if I have 5 spectra and 100 frequencies, the Data Frame will contain 500 observations. This came in handy when fetching the data from the database and when plotting it with ggplot.
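As a rough illustration of that long layout, here is a minimal base R sketch with simulated values. The inte, freq and cls columns follow the listing above; the id column, the sample labels and the random intensities are assumptions made up for the example.

```r
# Simulated long-format data: 5 spectra x 100 frequencies (all values are made up)
set.seed(1)
n.spec <- 5
n.freq <- 100
df <- data.frame(
  id   = factor(rep(paste0("s", 1:n.spec), each = n.freq)),  # hypothetical spectrum id
  freq = rep(seq(400, by = 1, length.out = n.freq), times = n.spec),
  inte = runif(n.spec * n.freq),
  cls  = factor(rep(c("a", "b", "a", "b", "a"), each = n.freq))
)
nrow(df)  # one row per (spectrum, frequency) pair: 500
```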
Can you send str(df) for one of these data frames that has more than one spectrum in it? Thx.
Here is an example:
> str(df)
'data.frame': 10809 obs. of 6 variables:
$ box : Factor w/ 9 levels "2551","2552",..: 1 1 1 1 1 1 1 1 1 1 ...
$ id : Factor w/ 9 levels "2017-08-02_buffer",..: 8 8 8 8 8 8 8 8 8 8 ...
$ wave : num 400 401 402 403 404 405 406 407 408 409 ...
$ inte.raw : num 910 910 898 879 872 872 878 885 873 854 ...
$ cls : Factor w/ 3 levels "buffer",..: 3 3 3 3 3 3 3 3 3 3 ...
$ inte.raw.nor: num 0.274 0.274 0.258 0.233 0.224 ...
There are 9 spectra and 1201 frequencies. The frequency is in wave and the intensity is in inte.raw. There are 3 classes and the class is stored in cls.
Just to double-check: df$wave has the wavelength repeated 9x and concatenated, and df$inte.raw is 9 concatenated spectra? And the wavelength values (each of the 9 sets) are identical? Sounds like to reconstruct, one takes the first 1201 values of wave and separates inte.raw into 9 groups of 1201 values to give 9 separate spectra plus the wavelengths. If that sounds correct, with this data set, how does one know there are 1201 values? Perhaps length(unique(df$wave))?
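The positional reconstruction described here can be sketched in base R as follows. This is only a toy example (3 spectra of 4 points, made-up values), and it assumes the rows are stored spectrum by spectrum with an identical wavelength axis in each block.

```r
# Toy long-format data: 3 spectra x 4 wavelengths, concatenated spectrum by spectrum
df <- data.frame(
  id       = rep(c("a", "b", "c"), each = 4),
  wave     = rep(c(400, 401, 402, 403), times = 3),
  inte.raw = 1:12
)

no.pts  <- length(unique(df$wave))    # points per spectrum
no.spec <- nrow(df) / no.pts          # number of spectra
stopifnot(no.spec == round(no.spec))  # rows must divide evenly

freq <- df$wave[seq_len(no.pts)]      # shared wavelength axis
# byrow = TRUE puts each consecutive run of no.pts values into its own row
spectra <- matrix(df$inte.raw, nrow = no.spec, byrow = TRUE)
dim(spectra)  # 3 rows (spectra) by 4 columns (wavelengths)
```

If the rows were ever reordered (for example by a database query without a sort), this approach would silently scramble the spectra, which is why an explicit identifier column is safer.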
Yes, to all of your questions. There has to be a unique identifier attribute for each spectrum; in this example, box and id are both unique identifiers (notice Factor w/ 9 levels). So, you can do:
> length(df[df$box == "2551", "wave"])
[1] 1201
> nrow(df[df$box == "2551", ])
[1] 1201
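Building on the identifier idea, the following sketch shows one way to regroup the intensities by box so the result does not depend on row order. The data are a made-up toy set; only the column names come from the str() output above.

```r
# Toy data with rows deliberately shuffled; box identifies each spectrum
df <- data.frame(
  box      = rep(c("2551", "2552", "2553"), each = 4),
  wave     = rep(c(400, 401, 402, 403), times = 3),
  inte.raw = 1:12
)
df <- df[sample(nrow(df)), ]

# Every spectrum should contribute the same number of points
table(df$box)

# Sort by identifier, then wavelength, so the layout no longer depends on row order
df <- df[order(df$box, df$wave), ]
by.box  <- split(df$inte.raw, df$box)  # one intensity vector per spectrum
spectra <- do.call(rbind, by.box)      # rows named by box, columns follow wave
```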
I think I would write a function to convert that particular format to something more "tidy" in terms of one row per sample. Then just convert the resulting structure into a Spectra object for direct use in ChemoSpec. Do you want me to take a stab at it? If so, can you save df as an Rdata object and attach? If you want to do it yourself, be sure to call chkSpectra on the final object, and see ?Spectra for the necessary data types.
For the end-user function, I would envision something along the lines of:
> ssp <- as.Spectra(df, name = id, freq = wave, intensity = inte.raw, group = cls)
Try this function and let me know. You'll need to change the extension to .R, as GitHub doesn't accept .R attachments: asSpectra.txt
Looks good. I tried it like this:
as.Spectra(df, freq = "wave", intensities = "inte.raw", names = "id", gr.crit = df$cls,
units = c("", ""), desc = "")
For user-friendliness, you might not require the quotes around Data Frame variables. For example, in ggplot2, you can do:
> ggplot(df, aes(x = wave, y = inte.raw)) + geom_point()
I would not use gr.crit = df$cls for two reasons. One, gr.crit needs only the unique values; you could possibly use unique(df$cls), but you still have to be careful: these are factors, plus it is evaluating >10K values rather than 3. Second, check your groups, they may not be right. The unique values there IIRC are buffer, buffer_2 and buffer_3. gr.crit is used in a grep process and hence grepping for "buffer" catches all the others. [update: just did this and yes, there is only one group and it is an integer due to taking the underlying encoded levels].
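The grep behaviour is easy to reproduce in isolation. In this small sketch the sample names are hypothetical, and the anchored pattern is just one possible workaround, not something the posted function does.

```r
# Hypothetical sample names; the class labels follow the ones recalled above
sample.names <- c("s1_buffer", "s2_buffer_2", "s3_buffer_3")

grep("buffer", sample.names)   # substring match: hits all three names
grep("buffer$", sample.names)  # anchored at end of string: hits only the first

# A factor stores integer codes, so convert to character before using the labels
cls <- factor(c("buffer", "buffer_2", "buffer_3", "buffer"))
gr.crit <- as.character(unique(cls))
```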
On not-quoting arguments: that would be the NSE world, like much of the tidyverse. To me, the time to program that is much greater than the time to type the quotes, so I'm going to leave that as "an exercise for the reader" as they used to say.
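For readers who do want to try that exercise, a minimal base R sketch of the idea might look like this. The helper get_col is hypothetical and only illustrates capturing an unquoted argument with substitute(); it is not part of ChemoSpec.

```r
# Hypothetical helper: accepts a column either bare (wave) or quoted ("wave")
get_col <- function(data, col) {
  expr <- substitute(col)  # capture the argument without evaluating it
  name <- if (is.character(expr)) expr else deparse(expr)
  if (!name %in% names(data)) stop("no column named ", name)
  data[[name]]
}

df <- data.frame(wave = 400:402, inte.raw = c(910, 910, 898))
get_col(df, wave)        # bare name
get_col(df, "inte.raw")  # quoted name still works
```

The cost of this convenience is that a variable holding a column name (nm <- "wave"; get_col(df, nm)) is treated as a column called nm, which is part of why quoting is the simpler design.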
If this were to be implemented, could you use an S3 method? Something like:
as.Spectra <- function(x, ...) {
  UseMethod("as.Spectra")
}
# Arguments are named to match what the body uses: freq, intensities and names
# are quoted column names of x; gr.crit holds the group patterns to grep for
as.Spectra.data.frame <- function(x, freq, intensities, names, gr.crit,
                                  units = c("", ""), desc = "", ...) {
  # Helper function
  isWholeNo <- function(x, tol = .Machine$double.eps^0.5) {abs(x - round(x)) < tol}
  # A few checks
  if (length(units) != 2) stop("units should have length 2")
  # Determine dimensions
  no.pts <- length(unique(x[, freq]))
  no.spec <- length(x[, freq]) / no.pts
  if (!isWholeNo(no.spec)) stop("no.spec was not an integer")
  # Now build the Spectra object
  Spectra <- vector("list", 9)
  Spectra[[1]] <- unique(x[, freq]) # frequency
  Spectra[[2]] <- matrix(x[, intensities], nrow = no.spec, byrow = TRUE) # one spectrum per row
  Spectra[[3]] <- as.character(unique(x[, names])) # names
  Spectra[[4]] <- rep(NA_character_, no.spec) # groups
  Spectra[[5]] <- rep("black", no.spec) # colors
  Spectra[[6]] <- rep(1L, no.spec) # sym
  Spectra[[7]] <- rep("a", no.spec) # alt.sym
  Spectra[[8]] <- units # units
  Spectra[[9]] <- desc # desc
  # Update groups: each gr.crit value is matched against the sample names
  for (i in seq_along(gr.crit)) {
    which <- grep(gr.crit[i], Spectra[[3]])
    if (length(which) == 0) warning("There was no match for gr.crit value ", gr.crit[i], " among the sample names.")
    Spectra[[4]][which] <- gr.crit[i]
  }
  Spectra[[4]] <- as.factor(Spectra[[4]])
  # Clean up and verify
  class(Spectra) <- "Spectra"
  names(Spectra) <- c("freq", "data", "names", "groups", "colors", "sym", "alt.sym", "units", "desc")
  chkSpectra(Spectra)
  return(Spectra)
}
With this approach, other packages could implement conversion methods to your Spectra class, making it easier to exchange NMR data between packages.
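To make the dispatch idea concrete, here is a self-contained toy sketch. The generic as_toy_spectra, the class otherpkg_nmr and its fields are all hypothetical stand-ins; nothing here is an existing ChemoSpec function.

```r
# Toy generic standing in for the proposed as.Spectra (not part of ChemoSpec)
as_toy_spectra <- function(x, ...) UseMethod("as_toy_spectra")

# Fallback: fail clearly for classes nobody has written a method for
as_toy_spectra.default <- function(x, ...) {
  stop("no conversion method for class ", class(x)[1])
}

# A method another package could supply for its own (hypothetical) class
as_toy_spectra.otherpkg_nmr <- function(x, ...) {
  structure(list(freq = x$ppm, data = x$signal), class = "toy_spectra")
}

nmr <- structure(list(ppm = c(1.0, 1.1), signal = matrix(1:4, nrow = 2)),
                 class = "otherpkg_nmr")
res <- as_toy_spectra(nmr)  # dispatches on class(nmr)
class(res)
```

The point of the design is that the generic lives in one package while each data-producing package owns the method for its own class.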
Hi Sergio… I’m traveling today; I’ll get back to you tonight. Bryan
On Oct 9, 2018, at 6:28 AM, Sergio Oller wrote:
If this were to be implemented, could you use an S3 method (https://adv-r.hadley.nz/s3.html#s3-methods)? Something like: [...]
No hurries! It was just a suggestion. Thanks a lot for your work and have a nice and safe trip!
Sergio, I think something like this is a good idea. Over the years, I've written a lot of scripts for users who have data in all sorts of formats. Many of them need a totally custom approach, but a lot of them have data frames, so the function you suggest would likely get many users most of the way. I think I would let the user disable the checks if desired -- I do that on matrix2SpectraObject because often the names come very mangled and not R-suitable, so you have to run the function, see what you have, and then make a few final adjustments.
I'm currently working on a significant re-working of the ChemoSpec internals, and your suggestion fits in well. It will likely take me about a month but it's on the to-do list. Thank you!
Sergio, do you have an example of an object from another package that you want to convert to a Spectra object? I need to test the version of the function I am writing. Thanks.
Honestly, I just have a custom package I am developing for a company. I hope to release it eventually, but it's not in my hands.
If you want feedback or a code review I'll be happy to help :smiley:
I think a key question is whether the incoming data frames will have samples in rows or samples in columns. I plan to try to write something that would handle either, but there are a lot of possibilities and I have to think it through a bit. First however, I have to get a fresh version of ChemoSpec out to CRAN.
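One simple way to approach the rows-versus-columns question is to compare the matrix dimensions with the length of the frequency axis. The helper below is a hypothetical sketch of that heuristic, not the function that was eventually written.

```r
# Hypothetical helper: return a matrix with one spectrum per row
samples_in_rows <- function(m, freq) {
  m <- as.matrix(m)
  if (ncol(m) == length(freq)) return(m)     # already samples in rows
  if (nrow(m) == length(freq)) return(t(m))  # samples in columns: transpose
  stop("neither dimension matches length(freq)")
}

freq <- c(400, 401, 402, 403)
wide <- matrix(1:12, nrow = 4)    # 4 wavelengths x 3 samples
dim(samples_in_rows(wide, freq))  # 3 spectra by 4 wavelengths
```

The heuristic is ambiguous when the number of samples equals the number of frequencies (a square matrix is returned unchanged), so a real implementation would probably need an explicit argument for that case.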
On Oct 14, 2018, at 1:02 PM, Sergio Oller wrote:
Honestly, I just have a custom package I am developing for a company. I hope to release it eventually, but it's not in my hands.
If you want feedback or a code review I'll be happy to help 😃
set.seed(123)
bands <- 20
data <- data.frame(matrix(runif(60 * bands), ncol = bands))
colnames(data) <- paste0(1:bands)
str(data)
test <- as.Spectra(data)
This doesn't work for a data frame object. Please, I need some help. Thanks.
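For a wide data frame like this one, a manual construction along the lines of the function posted earlier in this thread might look as follows. This is only a sketch: it assumes the 60 rows are samples and the 20 columns are bands, the sample names, group and unit labels are placeholders, and the nine-element layout is taken from the thread, so it may not match every ChemoSpec version.

```r
set.seed(123)
bands <- 20
data <- data.frame(matrix(runif(60 * bands), ncol = bands))
colnames(data) <- paste0(1:bands)

n <- nrow(data)  # assumed: 60 samples in rows, 20 bands in columns
spec <- list(
  freq    = as.numeric(colnames(data)),     # band labels used as the frequency axis
  data    = unname(as.matrix(data)),        # samples x frequencies
  names   = paste0("sample_", seq_len(n)),  # made-up sample names
  groups  = factor(rep("unknown", n)),      # placeholder group
  colors  = rep("black", n),
  sym     = rep(1L, n),
  alt.sym = rep("a", n),
  units   = c("band", "intensity"),         # placeholder unit labels
  desc    = "simulated data"
)
class(spec) <- "Spectra"
# With ChemoSpec loaded, chkSpectra(spec) should then be run to validate the object
```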
Closing: the wide variety of possible input formats is probably too hard to handle in a universal way. Better to use the options for importing and add to them as needed.
This is a feature request. From the docs, it seems that the only way to create a Spectra object is to have the data stored in files. If the data does not originate from files, how can one create a Spectra object? For example, if the data is coming from a database. To generalize, having a way to create Spectra objects from Data Frames might be useful.