MalteThodberg / CAGEfightR

Analysis of Cap Analysis of Gene Expression (CAGE) data using Bioconductor
GNU General Public License v3.0

calcShape error #19

Open emidalla opened 4 months ago

emidalla commented 4 months ago

Dear Malte, when running the following code

TSSs_quick <- calcShape(TSSs_quick, pooled=supportedCTSSs, outputColumn = 'IQR', shapeFunction = shapeIQR, lower=0.25, upper=0.75)

I get this error:

Splitting by strand...
Applying function to each cluster...
Error in if (f(x[[i]])) return(i) : missing value where TRUE/FALSE needed

I had a look at the 'calcShape' function but I was not able to understand where the missing values could be... The names of the objects are taken from your tutorial.

This is how 'TSSs_quick' is structured:

class: RangedSummarizedExperiment
dim: 151397 3
metadata(0):
assays(2): counts TPM
rownames(151397): chr1:587673-587673;+ chr1:629502-629502;+ ... chrM:16460-16460;-
  chrM:16548-16549;-
rowData names(7): score thick ... txID txType
colnames(3): PETRI_rep1 PETRI_rep2 PETRI_rep3
colData names(1): subsetTags

and this is 'supportedCTSSs':

class: RangedSummarizedExperiment
dim: 26403 3
metadata(0):
assays(2): counts TPM
rownames: NULL
rowData names(4): score support txID txType
colnames(3): PETRI_rep1 PETRI_rep2 PETRI_rep3
colData names(1): subsetTags

Please also find attached my sessionInfo.

Thank you very much for your time. Best regards, Emiliano

sessionInfo_CAGEfightR.txt

MalteThodberg commented 4 months ago

Hello again,

Interesting, I do not recall seeing that error before.

Can you run the vignette and the calcShape examples?

I cannot tell much about how the data looks just from the SummarizedExperiment overviews. Do you perhaps have a little reproducible example?

emidalla commented 4 months ago

Hello Malte,

> Can you run the vignette and the calcShape examples?

Yes, I can.

> Do you perhaps have a little reproducible example?

The test data I am working on is taken from here.

This is a snapshot of the 'TSSs_quick' object:

DataFrame with 151397 rows and 4 columns
                         score     thick                   txID   txType
                     <numeric> <IRanges>            <character> <factor>
chr1:587673-587673;+   6.38174    587673 ENST00000423796.1;EN.. promoter
chr1:629502-629502;+   6.38174    629502      ENST00000457540.1 proximal
chr1:629651-629661;+  18.46211    629651      ENST00000457540.1 promoter
chr1:629682-629700;+  12.30807    629682      ENST00000457540.1 promoter
chr1:629739-629739;+   6.38174    629739      ENST00000457540.1 promoter
...                        ...       ...                    ...      ...
chrM:16298-16298;-     6.83125     16298      ENST00000387461.2 proximal
chrM:16355-16365;-    19.13933     16365      ENST00000387461.2 proximal
chrM:16392-16410;-    18.91752     16410      ENST00000387461.2 proximal
chrM:16460-16460;-     6.38174     16460      ENST00000387461.2 proximal
chrM:16548-16549;-    18.91752     16549      ENST00000387461.2 proximal

and this one of 'supportedCTSSs'

DataFrame with 26403 rows and 4 columns
          score   support              txID   txType
      <numeric> <integer>       <character> <factor>
1      727.9145         3 ENST00000414273.1 promoter
2       62.6590         2 ENST00000414273.1 promoter
3       25.1266         2 ENST00000414273.1 promoter
4       24.1639         2 ENST00000414273.1 promoter
5       24.1639         2 ENST00000414273.1 exon    
...         ...       ...               ...      ...
26399  245.1750         3 ENST00000387461.2 promoter
26400   36.8846         2 ENST00000387461.2 promoter
26401   23.8491         2 ENST00000387461.2 proximal
26402   23.8491         2 ENST00000387461.2 proximal
26403   23.8491         2 ENST00000387461.2 proximal

EDIT: if I pass 'shapeEntropy' or 'shapeMean' as the 'shapeFunction' argument instead of 'shapeIQR', everything works.

Best, Emiliano.

MalteThodberg commented 4 months ago

Interesting! So it must be the default shapeIQR that's failing.

It's been a while since I've looked at this function: Are the TCs you calculate also based on the supportedCTSSs object?

emidalla commented 3 months ago

You nailed it, Malte! I used normal CTSSs to define the 'TSSs_quick' object, but then used the supportedCTSSs in combination with TSSs_quick to calculate the shape.
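For anyone hitting the same message, here is a minimal base-R sketch of what probably goes wrong. It assumes that an IQR-style shape statistic scans a cumulative fraction with `Position()`, which is where the `if (f(x[[i]])) return(i)` in the error comes from. `toy_iqr` is a hypothetical stand-in, not CAGEfightR's actual `shapeIQR`. If a cluster was called from all CTSSs but contains no supported CTSSs, its pooled signal sums to zero, the cumulative fraction becomes NaN, and the comparison inside `Position()` yields NA.

```r
# Hypothetical stand-in for an IQR-style shape statistic
# (illustration only, not the package's implementation).
toy_iqr <- function(x, lower = 0.25, upper = 0.75) {
  cdf <- cumsum(x) / sum(x)                     # all NaN when sum(x) == 0
  lo <- Position(function(p) p >= lower, cdf)   # first position reaching 'lower'
  hi <- Position(function(p) p >= upper, cdf)   # first position reaching 'upper'
  hi - lo
}

# A cluster with pooled signal: the statistic is computed normally.
width_ok <- toy_iqr(c(1, 5, 10, 5, 1))

# A cluster with no pooled signal: NaN >= 0.25 is NA, so if() fails.
err <- tryCatch(toy_iqr(c(0, 0, 0)),
                error = function(e) conditionMessage(e))
print(err)
```

A statistic that never tests its values in an `if()` would not fail on such a cluster in this way, which may be why the other shape functions ran through.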

I guess it is recommended to use 'supported CTSSs' also with the 'quickTSSs' function, right?

Apologies for this very stupid error and thank you very much for your time and precious help.

Best, Emiliano.

MalteThodberg commented 3 months ago

Not stupid at all.

I will leave this issue open: it might be a good idea for CAGEfightR to include a specific check for this mismatch and produce a clearer error message, rather than failing with that cryptic one.
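A rough sketch of what such a check could look like, in plain base R. `check_cluster_signal` and its arguments are hypothetical names for illustration and not part of CAGEfightR. The toy version also ignores chromosome and strand, which a real check on the cluster and CTSS ranges would have to respect.

```r
# Hypothetical pre-check: stop early if any cluster has no pooled CTSS signal.
# Simplification: a single chromosome and strand, with plain integer positions.
check_cluster_signal <- function(cl_start, cl_end, ctss_pos, ctss_score) {
  # Sum the pooled CTSS scores that fall inside each cluster
  signal <- vapply(seq_along(cl_start), function(i) {
    sum(ctss_score[ctss_pos >= cl_start[i] & ctss_pos <= cl_end[i]])
  }, numeric(1))

  empty <- which(signal == 0)
  if (length(empty) > 0) {
    stop(length(empty), " cluster(s) contain no pooled CTSS signal; ",
         "were the clusters called from a different CTSS object?",
         call. = FALSE)
  }
  invisible(TRUE)
}

# Both clusters contain a CTSS: the check passes.
ok <- check_cluster_signal(c(10, 50), c(20, 60), c(12, 55), c(3, 7))

# The second cluster contains no CTSS: the check fails with a readable message.
msg <- tryCatch(check_cluster_signal(c(10, 50), c(20, 60), 12, 3),
                error = function(e) conditionMessage(e))
print(msg)
```

Running something like this before applying the shape function would turn the NA failure into a message that points at the mismatched objects.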

emidalla commented 3 months ago

Maybe just specify in the 'Introduction to CAGEfightR' that one object must derive from the other: I thought that any RangedSummarizedExperiment would be fine but I was definitely wrong. Thank you very much again!