GoekeLab / proActiv

Estimation of Promoter Activity from RNA-Seq data
https://goekelab.github.io/proActiv/

Not summarized values across conditions #31

Closed elenichri closed 3 years ago

elenichri commented 3 years ago

@jleechung @jonathangoeke I got past the error (see issue #30) after talking to the server admin. I am now running proActiv 1.0.0, the version compatible with Bioconductor 3.12 (which is what the server runs). The problem is that I now get a warning when the proActiv function finishes calculating the result:

... ... ... ... ... ... 
Calculating junction counts
Calculating normalized read counts...
converting counts to integer mode
Calculating log2 absolute promoter activity...
Calculating gene expression...
Calculating relative promoter activity...
Calculating positions of promoters...
Warning message:
In proActiv(files = paste0("~/Alternative_Promoters/GBM_junction_TCGA/",  :
 Condition argument is invalid. 
                Please ensure a 1-1 map between each condition and each file.
                Returning results not summarized across conditions.

Do you have any idea what the problem might be? All my files are unique, so I cannot understand why I get the "not 1-1 map" warning. Gene expression ends up in the metadata slot rather than as one of the assays:

show(result)
class: SummarizedExperiment 
dim: 98780 155 
metadata(1): geneExpression
assays(4): promoterCounts normalizedPromoterCounts absolutePromoterActivity relativePromoterActivity
rownames(98780): 1 2 ... 122631 122633
rowData names(8): promoterId geneId ... promoterPosition txId
colnames(155): X00b3b72b.0ac4.4016.bfa2.0bfcd5645d4f_junctions X03ea882b.fd10.4f19.8760.e32c763d31da_junctions ...
  fc4c962b.1149.4569.a6a7.8082e32ce505_junctions fe10e7c4.6e22.4b9d.a658.ac316086125d_junctions
colData names(0):

In addition, rowData(result) does not return the summarized expression across conditions, as it should according to the vignette.

rowData(result)

DataFrame with 98780 rows and 8 columns
       promoterId             geneId seqnames     start   strand internalPromoter promoterPosition
        <integer>        <character> <factor> <integer> <factor>        <logical>        <integer>
1               1 ENSG00000000003.15    chrX  100637104        -             TRUE                2
2               2 ENSG00000000003.15    chrX  100639991        -            FALSE                1
3               3  ENSG00000000005.6    chrX  100584936        +            FALSE                1
4               4  ENSG00000000005.6    chrX  100593624        +             TRUE                2
5               5 ENSG00000000419.12    chr20  50945861        -             TRUE                2
...           ...                ...      ...       ...      ...              ...              ...
122628     122628  ENSG00000288597.1    chrX  103919047        -            FALSE                1
122629     122629  ENSG00000288598.1    chr13  41517168        +            FALSE                1
122630     122630  ENSG00000288598.1    chr13  41547948        +            FALSE                2
122631     122631  ENSG00000288600.1    chr13  41488357        +            FALSE                1
122633     122633  ENSG00000288602.1    chr8   66667596        +            FALSE                1
                                                            txId
                                                          <list>
1      ENST00000373020.9,ENST00000612152.4,ENST00000614008.4,...
2      ENST00000373020.9,ENST00000612152.4,ENST00000614008.4,...
3                            ENST00000373031.5,ENST00000485971.1
4                            ENST00000373031.5,ENST00000485971.1
5      ENST00000371588.9,ENST00000466152.5,ENST00000371582.8,...
...                                                          ...
122628 ENST00000674469.1,ENST00000674236.1,ENST00000674363.1,...
122629     ENST00000674506.1,ENST00000674416.1,ENST00000674320.1
122630     ENST00000674506.1,ENST00000674416.1,ENST00000674320.1
122631                                         ENST00000674216.1
122633                       ENST00000520044.5,ENST00000519289.

Note that I use TopHat2 SJ files. A colleague of mine ran proActiv successfully on these files with Bioconductor v3.11.1 and proActiv v0.1.0.

jleechung commented 3 years ago

Hi @elenichri, the warning indicates that the condition vector you pass in has a different length from the number of input files. Can you check whether they are the same length?

proActiv does not summarize results by condition if the length of the condition vector differs from the number of input files. There needs to be a 1-to-1 correspondence between your condition vector and your input files.
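A quick way to catch this before calling proActiv is to compare the two lengths up front. The snippet below is only a minimal sketch in base R: the file names, the condition labels, and the helper `conditionMatchesFiles` are hypothetical placeholders and not part of proActiv.

```r
# Hypothetical inputs for illustration; replace with your own paths and labels.
files <- c("sample1_junctions.bed",
           "sample2_junctions.bed",
           "sample3_junctions.bed")
condition <- c("tumor", "tumor", "normal")

# Hypothetical helper (not a proActiv function):
# TRUE only when there is exactly one condition label per input file.
conditionMatchesFiles <- function(files, condition) {
  length(condition) == length(files)
}

# Fail early with an explicit message instead of relying on the warning.
if (!conditionMatchesFiles(files, condition)) {
  stop(sprintf("Got %d condition labels for %d files; expected one label per file.",
               length(condition), length(files)))
}

# Show how many files fall into each condition.
print(table(condition))
```

If the check passes, `files` and `condition` can be passed on to proActiv as usual. If it stops, the message reports both lengths so the mismatch is easy to spot.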

elenichri commented 3 years ago

Thank you very much @jleechung. My vectors had different lengths. I fixed them and now it works. It was so simple... I somehow missed this.

jleechung commented 3 years ago

Glad it works!