Closed DarioS closed 2 years ago
Hi Dario,
Thanks for your attention to these issues, please understand that T cell ExTRECT is a very new tool and has only been public for a week, we are working hard to eliminate any bugs in the existing code and appreciate your input.
I am a bit confused by this as I can not replicate your issue, see below this quick dummy example using identical named files to those listed above, which only fails upon use of samtools as the files are empty and fails in the correct way when the dummy bai file is removed.
I also tested your suggestion for the corrected code and it seems equivalent to the code currently in the function.
bai.path and bai.path2 should both be a single value so the use of || vs | is equivalent, and as I understand it having double brackets around everything does not change the logical evaluation e.g in the situation where the index file OSCC_1-P.final.bai exists but not OSCC_1-P.final.bam.bai your suggestion leads to !(( FALSE || TRUE ))
which is identical to the current implementation !(FALSE | TRUE)
I was wondering if this issue is from trying to run the function on multiple bam files at once? Right now it is not vectorised to do that. I have added a test at the beginning to stop the function if more than one bam file is supplied.
No, just one BAM file. I was in a hurry to go to a meeting and now I notice that your brackets do wrap around both the first and second conditions correctly.
I found the real source of the problem. Our pipeline outputs BAM files to a directory called Final_bams
and I provide the absolute path to a BAM file. When the package code bai.path2 <- paste0(gsub('.bam','',bamPath),'.bai')
runs, it coverts the absolute path /scratch/hm82/OSCC/Final_bams/OSCC_1-P.final.bam.bai
into /scratch/hm82/OSCC/Finals/OSCC_1-P.final.bai
. Note how the directory name was undesirably changed as well as the intended file name suffix. It would be safer to use gsub("bam$", "bai", bamPath)
. The dollars sign ensures it only matches the end of a file name. With my tweaked pattern:
> bamPath
[1] "/scratch/hm82/OSCC/Final_bams/OSCC_1-P.final.bam"
> gsub("bam$", "bai", bamPath) # Don't match bam in directories' names.
[1] "/scratch/hm82/OSCC/Final_bams/OSCC_1-P.final.bai"
Thus, the directory name is not accidentally changed by the pattern matching and substitution. Only BAM file name will be.
Thanks, this has been fixed in the latest commit
https://github.com/McGranahanLab/TcellExTRECT/blob/c155d9fd1082621aae8eedaa2a15f4cb5900c524/R/getCovFromBam.R#L23-L25
The pipeline we use creates file names such as:
TcellExTRECT is the first bioinformatics tool in 18 months that hasn't worked with them. The corrected code is
You are missing a pair of round brackets which makes the logical evaluation order incorrect. Also, error message should use "proceeding" instead of "preceding".
https://github.com/McGranahanLab/TcellExTRECT/blob/c155d9fd1082621aae8eedaa2a15f4cb5900c524/R/getCovFromBam.R#L26