gagneurlab / FRASER-analysis

Accompanying analysis code for the FRASER manuscript
https://tinyurl.com/FRASER-paper
MIT License

Problems using FRASER 2.0 in R (optimHyperParams and FRASER functions) #6

Open: mmartinezj opened this issue 6 months ago


Hello, first of all, thank you for your great work. I have recently started using FRASER 2.0 for aberrant splicing analysis, and I still have some questions about it even after reviewing the papers and the vignette (https://bioconductor.org/packages/release/bioc/html/FRASER.html). I have been having problems with the hyperparameter optimization and with running the FRASER command in R. For context, here is some information about my data:

  1. I start my analysis from BAM files (aligned to the ENSEMBL hg38 assembly)
  2. I'm using FRASER version 1.99.3
  3. I have 31 samples in my cohort
  4. The annotation file I use to create the settings with the FRASER::FraserDataSet function looks similar to this (but with all 31 samples):

[Screenshot: annot_hg38_subset]

With 31 samples

Firstly, regarding the optimHyperParams function, I don't fully understand how to objectively choose one implementation over another (if there is an objective way to do so), or which noise value to use, for example. I am also unsure whether it is enough to run this optimization for the "jaccard" type only, or whether it must also be run for the "psi5", "psi3" and "theta" types to fit the model correctly. So far only the "PCA" implementation has worked for me; the others get stuck at the "Injecting outliers" step (with my 31 samples, the optimization reports "injecting outliers" and is still on that same step after 3 hours).

Secondly, after performing the hyperparameter optimization with the "PCA" implementation (the only one that has worked for me), the FRASER command only works with the "PCA" implementation or when I don't specify an implementation at all; if I try the other options, an error is raised. Even when it runs, it can't calculate p-values, as it states that no HGNC symbols are available. Also, annotating the HGNC symbols after running FRASER with the "PCA" implementation doesn't work.

Subset of 4 samples

Afterwards, I tested FRASER on a subset of my samples (4 samples) to check whether these errors also occur with fewer samples. When testing optimHyperParams, only the "PCA" implementation works; the other two raise an error:

PCA-BB-Decoder: [Screenshots: optim_hyper_PCABBDecoder_1, optim_hyper_PCABBDecoder_2]

AE: [Screenshots: optim_hyper_AE_1, optim_hyper_AE_2]

When running the FRASER function after the "PCA" optimHyperParams run, only the "PCA" implementation works (although it can't calculate p-values, as it states that no HGNC symbols are available).

PCA: [Screenshot: FRASER_PCA_optim_PCA]

PCA-BB-Decoder: [Screenshot: FRASER_PCABBDecoder_optim_PCA]

AE: [Screenshot: FRASER_AE_optim_PCA]

Finally, annotating the HGNC symbols after running FRASER with the "PCA" implementation doesn't work either.

[Screenshots: annotate_hgcn_symbols, annotate_hgcn_symbols_2]

Test data

After facing these problems with my data, I tried FRASER on the test data included in the package. When I run:

    fds <- createTestFraserDataSet()
    fds <- optimHyperParams(fds, type="jaccard", implementation="PCA-BB-Decoder")
    fds <- FRASER(fds, q=c("jaccard"=3), implementation="PCA-BB-Decoder")

it seems to work without any problem:

[Screenshot: optim_hyper_PCABBDecoder_with_test_data]

Using DROP pipeline

For now, I'm trying the DROP pipeline while I work on these issues in R, but I'm having trouble with DROP too (I will open a separate issue in the corresponding repo).

Session info

[Screenshots: sessioninfo_1, sessioninfo_2]

Sorry for the long issue. Thank you very much for your time and help in advance!

Miriam Martínez