Open Yatros opened 3 years ago
Hello,
I have tried to perform an additional test with the plotBoxplot() function. In this case, I have tried to plot the results from two different genes (GENE_1 located in the minus strand of chr6 and GENE_2 located in the plus strand of chr1) in the same plot.
plotBoxplot(result = resultlist[[1]], sampleName = sampleNames[1],
countWindows = countWindows,
selectedGenes = selectedGenes, showGene = c(1,2))
In this case, the plotBoxplot() function plots both genes in chromosomal order (GENE_2, then GENE_1). It plots all 12 exons of GENE_1, but it fails to plot the first exon of GENE_2.
I think that the problem may be the way in which the function indexes the exons from each gene.
If I try to specify exonRange = c(1,12) for GENE_1 when plotting it alone, the function returns the following error:
plotBoxplot(result = resultlist[[1]], sampleName = sampleNames[1],
+ countWindows = countWindows, exonRange = c(1,12),
+ selectedGenes = selectedGenes, showGene = 1)
Error in plotData[(exonRange[1] + 1):(exonRange[2] + 1), ] : subscript out of bounds
It looks as if the first exon is always indexed as "exonRange[1] + 1" instead of "exonRange[1] + 0".
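To illustrate the suspected off-by-one, here is a minimal base R sketch. It does not use the package code: plotData below is a hypothetical stand-in matrix with one row per exon, and the only thing taken from the thread is the indexing expression shown in the error message.

```r
# Hypothetical stand-in for the internal plotData: one row per exon of a 12-exon gene.
plotData <- matrix(rnorm(12 * 3), nrow = 12,
                   dimnames = list(paste0("Ex", 1:12), NULL))
exonRange <- c(1, 12)

# Shifted indexing, as in the error message: this asks for rows 2..13 of a 12-row matrix.
shifted <- tryCatch(
  plotData[(exonRange[1] + 1):(exonRange[2] + 1), ],
  error = function(e) conditionMessage(e)
)
print(shifted)

# Unshifted indexing returns all 12 exons.
unshifted <- plotData[exonRange[1]:exonRange[2], ]
print(rownames(unshifted))
```

Under the same assumption, a shifted range of c(1, 11) would select Ex2 to Ex12, which would match a plot that is always missing the first record.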
I hope that these examples show the problem that I have and that the developer of the package can explain how to perform the plots in the right way.
A suggestion: since the plotBoxplot() function only works if the bed file is sorted in chromosomal order, it would be nice to have the option of flipping the plot for those genes that are located in the minus strand. This is especially important to keep consistency across plots for publication purposes.
Please let me know if you need any additional data to reproduce the error.
Thanks again,
Hi again,
I have found a way of "fixing" the plotting problem. I added a "fake" line at the very beginning of my bed file. This additional "fake exon" overlaps partially with the first exon of my first real gene. Here you have the headers of my original bed file and of the "fixed" bed file.
Original bed file:
"Fixed" bed file:
It looks as if everything was off by one record. Now all genes are plotted correctly and the plots include all exons of each gene.
However, as I suggested previously it would be great to have the option of flipping the plots so that the genes that are located in the minus strand can be plotted in transcriptional order (i.e. Ex1, Ex2, Ex3, ...), instead of chromosomal order (i.e. Ex12, Ex11, Ex10...).
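A rough sketch of the kind of reordering I mean, in base R only. The toy table reuses some of the column names that getWindows() returns; the geneStrand lookup and the toTranscriptOrder() helper are hypothetical and not part of panelcn.mops. Whether plotBoxplot() honours the reordered rows depends on the package version.

```r
# Toy table using a subset of the countWindows column names.
countWindows <- data.frame(
  chromosome = c("chr1", "chr1", "chr6", "chr6", "chr6"),
  start      = c(100, 500, 1000, 2000, 3000),
  end        = c(200, 600, 1100, 2100, 3100),
  gene       = c("GENE_2", "GENE_2", "GENE_1", "GENE_1", "GENE_1"),
  exon       = c("Ex1", "Ex2", "Ex3", "Ex2", "Ex1"),
  stringsAsFactors = FALSE
)

# Hypothetical strand lookup; countWindows itself carries no strand column.
geneStrand <- c(GENE_1 = "-", GENE_2 = "+")

# Hypothetical helper: within each gene, sort by start, descending for minus-strand genes.
toTranscriptOrder <- function(windows, strand) {
  pieces <- lapply(unique(windows$gene), function(g) {
    w <- windows[windows$gene == g, ]
    w[order(w$start, decreasing = identical(strand[[g]], "-")), ]
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

flipped <- toTranscriptOrder(countWindows, geneStrand)
print(flipped[, c("gene", "exon", "start")])
```

In this toy example GENE_2 stays in chromosomal order, while the GENE_1 rows come out as Ex1, Ex2, Ex3.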
Current plot of GENE_1 once the bed file has been "fixed":
Now that I think about it, maybe the software was expecting a bed file with a header, but that is not what is specified in the panelcn.MOPS documentation regarding the bed file format.
Best,
Great catch. For some reason, I never plotted the first gene in my bed file. I know what the problem is and I am working on a solution.
I fixed the issue and also changed it so that if you order the countWindows differently the order will change accordingly in the plot (labels and data). So far I only committed it here and not to bioconductor. Can you please test if it solves your problem? Thanks
Hello,
I have checked the current version of the package and there are certain things that have been fixed, but it doesn't work 100% correctly, yet.
I'll explain my reasoning step by step, to see if it makes sense to you:
My bed file contains 24 genes located in autosomal chromosomes. In total there are 506 exons, one per line.
As expected, the countWindows <- getWindows(bed) command generates a data frame with 506 observations of 6 variables each ('chromosome', 'start', 'end', 'name', 'gene' and 'exon').
However, when I run resultlist <- runPanelcnMops(XandCB, countWindows = countWindows, selectedGenes = selectedGenes), the runPanelcnMops() function only recognizes 495 regions of interest (ROIs). See following picture:
I don't know why these 11 ROIs are not recognized. My guess is that they have a low read count in the original BAM file, since in most cases the missing ROI is the first or last exon of a gene. There is only one case where exons 1-4 of a single gene (GENE_8) are affected. This problem already existed in the previous version of the software: I have only ever been able to include 495 ROIs in my analyses. The createResultTable() function shows the results for the 495 ROIs identified by runPanelcnMops() whenever I perform an analysis.
But every time I try to plot a gene that has a different number of exons in the bedfile/countWindows data frame compared to the number of ROIs identified by the runPanelcnMops() function, the plotBoxplot() function returns the following error:
> plotBoxplot(result = resultlist[[1]], sampleName = sampleNames[1],
+ countWindows = countWindows,
+ selectedGenes = selectedGenes, showGene = 2)
Error in plotData[geneWindowsPaste, ] : subscript out of bounds
Additionally, the first gene of the bed file cannot be plotted even though the number of records in the bed file and the number of ROIs identified by the runPanelcnMops() function are exactly the same (n = 6; see picture above). The error looks exactly the same:
> plotBoxplot(result = resultlist[[1]], sampleName = sampleNames[1],
+ countWindows = countWindows,
+ selectedGenes = selectedGenes, showGene = 1)
Error in plotData[geneWindowsPaste, ] : subscript out of bounds
Only genes whose number of exons in the countWindows data frame matches the number of ROIs in the resultlist object are plotted. Those genes are plotted correctly, that is, including all of their exons.
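For reference, this is roughly how such a comparison can be done in base R. The window names and the roiNames vector below are made-up toy values, not data from my panel, and how the ROI names are extracted from the resultlist object is not shown here.

```r
# Hypothetical window names mapped to their gene (what was requested).
windowGene <- c(A1 = "GENE_A", A2 = "GENE_A", A3 = "GENE_A",
                B1 = "GENE_B", B2 = "GENE_B")

# Hypothetical names of the ROIs that made it into the result (A1 was dropped).
roiNames <- c("A2", "A3", "B1", "B2")

# Windows that never became ROIs.
missingWindows <- setdiff(names(windowGene), roiNames)

# Per-gene counts: requested windows vs. recognized ROIs.
expected <- table(windowGene)
observed <- table(factor(windowGene[roiNames], levels = names(expected)))
mismatched <- names(expected)[as.vector(expected) != as.vector(observed)]

print(missingWindows)
print(mismatched)
```

In this toy case the only gene with a mismatch is GENE_A, which is the kind of gene that fails to plot for me.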
In the previous version of the package, the plotBoxplot() function plotted all genes in both cases:
In both cases, all plots were always lacking one exon, though.
I haven't had time to check if the order of the countWindows works properly or not and I want to fix the current problem before digging further into other potential errors.
Let me know if you have any doubts or need further clarification of what I am doing.
I can send you my real bed file if you want to have a look at it, too.
Thanks again for your answer,
Best,
Sorry for the delay, only have time for this on weekends.
Thank you for your patience. It would be great if you could try one more time.
Hello,
First of all, thank you very much for developing panelcn.mops. It is a really helpful tool.
I am having a problem when plotting all exons of a candidate gene from a panel. The problem is basically that if the gene is in the forward strand, the first exon does not show up in the plot and if the gene is in the reverse strand, the last exon of the gene is not plotted. My bed file is sorted by chr, position, so basically the plotBoxplot() command is not plotting the first exon that appears in the bed file.
However, the results table created with the createResultTable() command shows all the exons from each candidate gene.
Here you have the code that I have used to analyze my gene panel so that you can figure out if I have made any errors:
Graphical example:
Analysis performed with +/- 31 bp flanking region for each exon. In this case Ex12 is missing from the plot:
I have continued doing tests and I have tried to sort the bed file according to transcriptional order (Ex1, Ex2, Ex3, ...) to see if the function plotBoxplot() would plot the Ex1 for the genes from the reverse strand. The analysis is performed in the right way when you visualize it with the resulttable <- createResultTable() function. This analysis was performed without the +/- 31 bp flanking region for each exon. Therefore the exon sizes and plots are slightly different, but the concept is the same one:
The plotBoxplot() function does not plot the Ex1 for the reordered exons, as expected.
Additionally, I have found a potential "bug" in the plotBoxplot() function. It reverts the X-axis labels according to the new bed file (Ex1, Ex2, Ex3, Ex4,....) , but the boxplots stay in the same order, that is in chromosomal order (Ex12, Ex11, Ex10, Ex9, ...), which would generate an erroneous plot:
Now the sample would have an Ex7-Ex10 deletion, instead of the real Ex3-Ex6 deletion that is shown in both tables.
In order for the plotBoxplot() function to work properly, the bed file needs to be sorted in chromosomal order. If not, you can get an erroneous plot, even when the analysis and the table show the correct results.
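As a minimal sketch of that sorting step in base R, assuming a bed file that has already been read into a data frame and that contains only autosomes named chr1, chr2, and so on. The column names and the values in the name column are made up for illustration.

```r
# Toy BED-like table, deliberately out of order.
bed <- data.frame(
  chrom = c("chr6", "chr1", "chr10", "chr6", "chr2"),
  start = c(3000, 500, 100, 1000, 50),
  end   = c(3100, 600, 200, 1100, 150),
  name  = c("GENE_1.Ex1", "GENE_2.Ex2", "GENE_3.Ex1", "GENE_1.Ex3", "GENE_4.Ex1"),
  stringsAsFactors = FALSE
)

# Autosomes only: strip "chr" and compare numerically so that chr2 sorts before chr10.
chromNumber <- as.integer(sub("^chr", "", bed$chrom))

# Sort by chromosome, then by start position.
bedSorted <- bed[order(chromNumber, bed$start), ]
rownames(bedSorted) <- NULL
print(bedSorted)
```

A plain alphabetical sort on the chromosome column would put chr10 before chr2, which is why the numeric conversion is used here. Sex chromosomes would need separate handling.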
Am I doing something wrong? How can I plot the first exon of each gene?
I have even tried to add the UTR region to the bed file to have a first record for each gene before the first exon, but that does not work either.
Has anybody had these issues before?
Thank you very much,
Best Regards,