Open nick-youngblut opened 1 month ago
A minor typo in the README: annotations <- prepareAnnotation(gtf.file)
should be annotations <- prepareAnnotations(gtf.file)
Hi,
Thank you for reporting the typos in the documentation and error messages. I will have that fixed when we do our next update.
You should be able to provide fa.gz files for the genome, but on non-Windows machines you also need the FASTA index (.fai) and the compressed-file index (.gzi). Unfortunately this is not yet written in the documentation, but I will add it. Could you let me know whether you had these files and, if not, try again with them and let me know if that works?
Kind Regards, Andre Sim
You should be able to provide fa.gz files for the genome
As you see from my post above, I can't use gzip-compressed fasta input on my Ubuntu 22.04.4 system.
You specifically stated "fa.gz files". Bambu doesn't support alternative (gzip'd) fasta file extensions (e.g., .fastq.gz or .fna.gz)?
Bambu doesn't check the file extension, and since for our purposes .fa.gz, .fastq.gz and .fna.gz are all the same format, they should all work so long as you include the respective index files in the same directory. So if you are using .fna.gz, there should also be a .fna.gz.fai and a .fna.gz.gzi in the directory. If you compressed your genome with bgzip, you can generate the index files with samtools faidx.
Below are the script I used to test it, the console output (warnings removed for clarity), and the directory where the .fna.gz is stored, so you can compare:
sample <- system.file("extdata", "SGNex_A549_directRNA_replicate5_run1_chr9_1_1000000.bam", package = "bambu")
fa.file <- "./Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fna.gz"
annotations <- readRDS(system.file("extdata", "annotationGranges_txdbGrch38_91_chr9_1_1000000.rds", package = "bambu"))
se = bambu(reads = sample, annotations = annotations, genome = fa.file)
--- Start generating read class files ---
'getOption("repos")' replaces Bioconductor standard repositories, see 'help("repositories",
package = "BiocManager")' for details.
Replacement repositories:
CRAN: https://cloud.r-project.org
Detected 3 warnings across the samples during read class construction. Access warnings with metadata(bambuOutput)$warnings
--- Start extending annotations ---
WARNING - Less than 50 TRUE or FALSE read classes for NDR precision stabilization.
NDR will be approximated as: (1 - Transcript Model Prediction Score)
Using a novel discovery rate (NDR) of: 0
WARNING - No novel transcripts meet the given thresholds. Try a higher NDR.
--- Start isoform quantification ---
--- Finished running Bambu ---
> list.files()
[1] "Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fna.gz"
[2] "Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fna.gz.fai"
[3] "Homo_sapiens.GRCh38.dna_sm.primary_assembly_chr9_1_1000000.fna.gz.gzi"
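The two index files in the listing above can also be produced from within R instead of with the samtools command line. The following is only a minimal sketch under stated assumptions: it assumes the Bioconductor package Rsamtools is installed and that its bgzip() and indexFa() functions write the .fai and .gzi files next to the compressed FASTA. The helper name bgzip_and_index is hypothetical and not part of bambu.

```r
# Hypothetical helper (not part of bambu): compress a FASTA with bgzip and
# build the index files next to it.
# Assumes the Bioconductor package Rsamtools is installed.
bgzip_and_index <- function(fasta_path, dest = paste0(fasta_path, ".gz")) {
  if (!requireNamespace("Rsamtools", quietly = TRUE)) {
    stop("Rsamtools is required: BiocManager::install('Rsamtools')")
  }
  if (!file.exists(fasta_path)) {
    stop("FASTA file not found: ", fasta_path)
  }
  # Use bgzip (block gzip), not plain gzip: samtools-style FASTA indexing
  # only works on bgzip-compressed files.
  Rsamtools::bgzip(fasta_path, dest = dest, overwrite = TRUE)
  # Expected (assumption) to write <dest>.fai and, for bgzip input, <dest>.gzi
  Rsamtools::indexFa(dest)
  invisible(dest)
}
```

Calling bgzip_and_index("genome.fna") on an uncompressed file (placeholder name) should then leave genome.fna.gz together with its index files in the same directory, which is the layout described above.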
Thanks for the explanation. Maybe it would be best to add some input file checks to provide a more informative error message than "Input genome file not readable.Requires a FASTA or BSgenome name"? For instance: "Your input genome appears to be compressed; you then must provide corresponding .gz.fai and .gz.gzi files"
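A check along these lines could be sketched in base R as shown here. This is not bambu's actual code: the function name check_genome_input is hypothetical, the message wording is adapted from the suggestion above, and the sketch assumes that any path ending in .gz needs both index files regardless of operating system.

```r
# Hypothetical pre-flight check (not bambu's actual code): report missing
# index files for a compressed genome FASTA before trying to read it.
check_genome_input <- function(genome_path) {
  if (!file.exists(genome_path)) {
    stop("Input genome file not found: ", genome_path)
  }
  if (grepl("\\.gz$", genome_path)) {
    # Index files are expected next to the genome, e.g. x.fna.gz.fai
    index_files <- paste0(genome_path, c(".fai", ".gzi"))
    missing_idx <- index_files[!file.exists(index_files)]
    if (length(missing_idx) > 0) {
      stop(
        "Your input genome appears to be compressed; you must also provide ",
        "the corresponding index files: ",
        paste(basename(missing_idx), collapse = ", ")
      )
    }
  }
  invisible(TRUE)
}
```

Run before the call to bambu(), such a check would name the exact files that are missing instead of failing with the generic message.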
My code:
The error:
If I uncompress the genome fasta file, there is no error. It would be helpful if bambu supported gzip'd input, given the potentially large size of the input files.
Also, the space (or line return) is missing in:
sessionInfo