AviranLab / dStruct

Method for identifying differential reactive regions from RNA structurome profiling data
BSD 2-Clause "Simplified" License

Issue with dStructome when using no biological replicates (reps_A = 1, reps_B = 1) #1

Open RemagenRe opened 1 week ago

RemagenRe commented 1 week ago

Dear dStruct developers,

I encountered an issue while using the dStructome function from the dStruct R package when running analyses without biological replicates (i.e., reps_A = 1 and reps_B = 1). The structure of my input data, the code I used, and the error the function returns are shown below:

> class(structure_data)
[1] "list"
> str(head(structure_data, 5))
List of 5
 $ AT5G53440.2:'data.frame': 3949 obs. of 2 variables:
  ..$ A1: num [1:3949] NA 0 NA 0 NA 0 NA 0 NA 0 ...
  ..$ B1: num [1:3949] NA 0 NA 0 NA 0 NA 0 NA 0 ...
 $ AT5G53450.1:'data.frame': 2434 obs. of 2 variables:
  ..$ A1: num [1:2434] NA 0.122 NA 0.468 NA NA 0 NA 0 NA ...
  ..$ B1: num [1:2434] NA 0 NA 0 NA NA 0.33 NA 0 NA ...
 $ AT5G53450.2:'data.frame': 2365 obs. of 2 variables:
  ..$ A1: num [1:2365] NA NA NA NA NA 0 NA 0.466 0 0.425 ...
  ..$ B1: num [1:2365] NA NA NA NA NA 0 NA 0 0 0 ...
 $ AT5G53460.1:'data.frame': 7625 obs. of 2 variables:
  ..$ A1: num [1:7625] 0 0 0 0 0 NA 0 0 0 NA ...
  ..$ B1: num [1:7625] 0 0 0 0 0 NA 0 0.392 0 NA ...
 $ AT5G53460.2:'data.frame': 6971 obs. of 2 variables:
  ..$ A1: num [1:6971] NA 0 0 NA 0 NA 0.714 NA 0 0 ...
  ..$ B1: num [1:6971] NA 0 0 NA 0 NA 0 NA 0 0.382 ...
> names(structure_data) %>% head()
[1] "AT5G53440.2" "AT5G53450.1" "AT5G53450.2" "AT5G53460.1" "AT5G53460.2" "AT5G53460.3"
> class(structure_data[[1]])
[1] "data.frame"
> head(structure_data[["AT5G53440.2"]], n = 2)
  A1 B1
1 NA NA
2  0  0

> res <- dStructome(structure_data, 1, 1, processes = 1)
Error in strsplit(x, split = "") : non-character argument

It seems that dStructome does not support inputs without biological replicates. I expected the function to process the data without errors, even if there are no biological replicates provided. Any guidance would be appreciated.

Thank you for your help!

dataMaster-Kris commented 1 day ago

Hi @RemagenRe,

You are right, dStruct requires that there be biological replicates for at least one of the groups. This is needed to compute within-group d-scores.

If possible, you should run a replicate experiment. If, for whatever reason, this is not possible, you can copy the A1 sample as a biological replicate of itself (i.e., add it as A2). dStruct should run fine even if the reactivity values in the A1 and A2 samples are equal for all nucleotides. This effectively imposes the assumption that there is no biological variation between replicate samples, i.e., within-group d = 0. You should report this assumption when you write up your analysis methods.
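A minimal sketch of the copy-as-replicate workaround, assuming the input has the same shape as in the issue (a named list of data frames with columns A1 and B1). The toy data and the helper `add_pseudo_replicate` are hypothetical, and the `dStructome` call is left as a comment because this sketch does not confirm that it resolves the reported error.

```r
# Toy stand-in for the issue's `structure_data`: a named list of data frames,
# one per transcript, each with reactivity columns A1 and B1.
structure_data <- list(
  tx1 = data.frame(A1 = c(NA, 0, 0.12, 0.47), B1 = c(NA, 0, 0, 0.33)),
  tx2 = data.frame(A1 = c(0, NA, 0.71, 0),    B1 = c(0, NA, 0, 0.38))
)

# Hypothetical helper: copy A1 into a new column A2 so that group A
# contains two (identical) samples, then fix the column order.
add_pseudo_replicate <- function(df) {
  df$A2 <- df$A1
  df[, c("A1", "A2", "B1")]
}

with_copy <- lapply(structure_data, add_pseudo_replicate)

# Group A now has two samples, so the call would declare reps_A = 2
# (not run here; requires the dStruct package):
# res <- dStructome(with_copy, reps_A = 2, reps_B = 1, processes = 1)
```

Because A2 is an exact copy of A1, any result obtained this way rests on the within-group d = 0 assumption described above.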

Another option is to simulate a replicate sample if you have a meaningful noise model for structurome profiling.
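As an illustration only, simulating a replicate could look like the sketch below. The additive Gaussian noise with standard deviation `noise_sd` is a placeholder assumption, not a validated noise model for structurome profiling, and the function name `simulate_replicate` is hypothetical.

```r
set.seed(1)  # make the simulated replicate reproducible

# Hypothetical noise model: add zero-mean Gaussian noise to each reactivity
# and clamp at zero. Positions that are NA in the input stay NA.
simulate_replicate <- function(x, noise_sd = 0.05) {
  noisy <- x + rnorm(length(x), mean = 0, sd = noise_sd)
  pmax(noisy, 0)
}

a1 <- c(NA, 0, 0.12, 0.47, NA, 0.71)  # toy A1 reactivities
a2 <- simulate_replicate(a1)          # simulated A2 replicate
```

A real analysis would replace the Gaussian term with a noise model justified for the specific probing protocol and normalization used.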

If you don't have experimentally obtained biological replicates, you may want to raise the ∆d parameter, i.e., the requirement for how much higher the between-group d-score must be than the within-group d-score. This option is not available in the version of dStruct in this repository, which is the version published in Genome Biology, but you can find it in the latest version on Bioconductor or install it from this GitHub repo.
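To make the idea of a ∆d requirement concrete, here is a simplified sketch. It assumes the d-score of a set of reactivities at one nucleotide is d = (2/π)·arctan(σ/|μ|), which is how the dStruct paper defines it as far as I understand; treat that formula, the toy data, and the threshold name `min_delta_d` as assumptions. It is not dStruct's actual procedure, which compares sets of samples and applies a statistical test.

```r
# d-score of the reactivities at one nucleotide (assumed definition).
d_score <- function(x) {
  (2 / pi) * atan(sd(x) / abs(mean(x)))
}

# Hypothetical reactivities for a 5-nucleotide region (rows = nucleotides).
region <- data.frame(
  A1 = c(0.10, 0.50, 0.30, 0.80, 0.20),
  A2 = c(0.10, 0.50, 0.30, 0.80, 0.20),  # A2 is a copy of A1
  B1 = c(0.60, 0.10, 0.90, 0.20, 0.70)
)

# Per-nucleotide d-scores within group A and between groups A and B.
d_within  <- apply(region[, c("A1", "A2")], 1, d_score)
d_between <- apply(region[, c("A1", "B1")], 1, d_score)

# Median per-nucleotide difference, compared against a hypothetical threshold.
delta_d <- median(d_between - d_within)
min_delta_d <- 0.2
passes <- delta_d >= min_delta_d
```

With a copied replicate the within-group d-scores are all zero, so the threshold alone guards against calling noise a difference; that is the motivation for setting it higher when no real replicates exist.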

RemagenRe commented 22 hours ago

Thank you very much for your detailed and prompt response!

I appreciate the guidance regarding the biological replicates. I would like to follow your suggestion on tuning the ∆d parameter, but I'm not quite sure how to proceed with this adjustment. Could you provide some additional instructions on how to set or modify the ∆d parameter?

For context, here's the code I used and the error I encountered:

> library(dStruct)
> packageVersion("dStruct")
[1] ‘1.8.0’
> res <- dStructome(structure_data,
+                   reps_A = 1,
+                   reps_B = 1,  
+                   evidence = 2.0,
+                   processes = 1)
Error in strsplit(x, split = "") : non-character argument
>