Open sparthib opened 2 months ago
Hi Sowmya,
Could you post the output of metadata(se)$readToTranscriptMaps[[1]]?
If you read in the rds_files (with readRDS) does the metadata contain the readNames and readId like in the example below that I generated from the test data?
Does your bam file still have read names, some archives and tools remove the read names in order to reduce the size of the file?
From looking at your inputs, it looks correct to me (note in the first block, if your goal is only to produce the read classes, you can set discovery & quant = FALSE to skip these steps)
*I also want to add in case you were not aware. The readToTranscriptMaps are not informative to how quantification works. Unlike other quantification tools which assign reads to transcripts to generate counts, bambu uses an expectation maximization algorithm to assign the majority of reads, so direct read to transcript assignments for all reads will not exist. There are other uses for the read to transcript map though so I would be happy to continue to troubleshoot if that is the case.
Kind Regards, Andre Sim
Hi Andre, thank you for super quick response,
My bam file contains the read IDs, so I don't think that's the issue.
This is what my metadata looks like for the final output
equalMatches compatibleMatches
1
2
3
4
5
6
Here's a snippet of my intermediate read class file
> intermediate
class: RangedSummarizedExperiment
dim: 4530587 1
metadata(3): warnings readNames readId
assays(1): counts
rownames(4530587): rc.1 rc.2 ... rc.4530586 rc.4530587
rowData names(23): chr.rc strand.rc ... txScore.noFit txScore
colnames(1): sample_name
colData names(1): name
readNames
contains the list of IDs in my bam file.
If a read is partially assigned to multiple isoforms using EM, what would be the usecase for readToTranscript mapping if I may ask? Also, by other tools, do you mean tools that use the reference transcriptome instead of genome?
Best, Sowmya
Hi Sowmya,
The intermediate file looks fine to me too, so seeing that final output is very strange. Are you able to run bambu using the test data with trackReads on to see if that works? To solve this I may need you to send me the intermediate file so that I can see if I can replicate this on my end. Would this be possible?
Regarding your final question. The reason we implemented the trackReads option in the first place was we were entering the LRGASP evaluation which required a read to transcript map as one of the submission requirements. Despite it not being useful for quantification we realized there were use cases where it was useful such as identifying the reads that are assigned to novel transcripts to then extract them and do single molecule analysis such as looking for RNA modifications. By other tools I mean tools such as (but not limited to) IsoQuant and FLAIR which also happen to be genome alignment based tools.
Kind Regards, Andre Sim
hi Andre,
Thanks for your answer. I am re-running bambu, and I see a new warning I haven't see before and the read classes end up not getting produced.
Here's my code:
read_class <- bambu(reads = paste0(bam_dir),
annotations = annotation,
genome = fa.file,
lowMemory = TRUE,
discovery = FALSE,
quant = FALSE)
saveRDS(read_class, paste0(output_dir, "read_class.rds"))
Here's the warning
[10:02:31] WARNING: src/learner.cc:553:
If you are loading a serialized model (like pickle in Python, RDS in R) generated by
older XGBoost, please export the model by calling `Booster.save_model` from that version
first, then load it back in current version. See:
https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
for more details about differences between saving model and serializing.
Here's my session info:
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.3.2 Patched (2024-02-08 r85876)
os Rocky Linux 9.2 (Blue Onyx)
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz US/Eastern
date 2024-06-01
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
abind 1.4-5 2016-07-21 [2] CRAN (R 4.3.2)
AnnotationDbi 1.64.1 2023-11-03 [2] Bioconductor
bambu * 3.4.0 2023-10-24 [1] Bioconductor
Biobase * 2.62.0 2023-10-24 [2] Bioconductor
BiocFileCache * 2.10.1 2023-10-26 [2] Bioconductor
BiocGenerics * 0.48.1 2023-11-01 [2] Bioconductor
BiocIO * 1.12.0 2023-10-24 [2] Bioconductor
BiocManager 1.30.22 2023-08-08 [2] CRAN (R 4.3.2)
BiocParallel 1.36.0 2023-10-24 [2] Bioconductor
biomaRt 2.58.2 2024-01-30 [2] Bioconductor 3.18 (R 4.3.2)
Biostrings * 2.70.2 2024-01-28 [2] Bioconductor 3.18 (R 4.3.2)
bit 4.0.5 2022-11-15 [2] CRAN (R 4.3.2)
bit64 4.0.5 2020-08-30 [2] CRAN (R 4.3.2)
bitops 1.0-7 2021-04-24 [2] CRAN (R 4.3.2)
blob 1.2.4 2023-03-17 [2] CRAN (R 4.3.2)
BSgenome * 1.70.1 2023-11-01 [2] Bioconductor
cachem 1.0.8 2023-05-01 [2] CRAN (R 4.3.2)
cli 3.6.2 2023-12-11 [2] CRAN (R 4.3.2)
codetools 0.2-19 2023-02-01 [3] CRAN (R 4.3.2)
crayon 1.5.2 2022-09-29 [2] CRAN (R 4.3.2)
curl 5.2.0 2023-12-08 [2] CRAN (R 4.3.2)
data.table 1.15.0 2024-01-30 [2] CRAN (R 4.3.2)
DBI 1.2.1 2024-01-12 [2] CRAN (R 4.3.2)
dbplyr * 2.4.0 2023-10-26 [2] CRAN (R 4.3.2)
DelayedArray 0.28.0 2023-10-24 [2] Bioconductor
digest 0.6.34 2024-01-11 [2] CRAN (R 4.3.2)
dplyr * 1.1.4 2023-11-17 [2] CRAN (R 4.3.2)
fansi 1.0.6 2023-12-08 [2] CRAN (R 4.3.2)
fastmap 1.1.1 2023-02-24 [2] CRAN (R 4.3.2)
filelock 1.0.3 2023-12-11 [2] CRAN (R 4.3.2)
generics 0.1.3 2022-07-05 [2] CRAN (R 4.3.2)
GenomeInfoDb * 1.38.5 2023-12-28 [2] Bioconductor 3.18 (R 4.3.2)
GenomeInfoDbData 1.2.11 2024-02-09 [2] Bioconductor
GenomicAlignments 1.38.2 2024-01-16 [2] Bioconductor 3.18 (R 4.3.2)
GenomicFeatures 1.54.3 2024-01-31 [2] Bioconductor 3.18 (R 4.3.2)
GenomicRanges * 1.54.1 2023-10-29 [2] Bioconductor
glue 1.7.0 2024-01-09 [2] CRAN (R 4.3.2)
hms 1.1.3 2023-03-21 [2] CRAN (R 4.3.2)
httr 1.4.7 2023-08-15 [2] CRAN (R 4.3.2)
IRanges * 2.36.0 2023-10-24 [2] Bioconductor
jsonlite 1.8.8 2023-12-04 [2] CRAN (R 4.3.2)
KEGGREST 1.42.0 2023-10-24 [2] Bioconductor
lattice 0.22-5 2023-10-24 [3] CRAN (R 4.3.2)
lifecycle 1.0.4 2023-11-07 [2] CRAN (R 4.3.2)
magrittr 2.0.3 2022-03-30 [2] CRAN (R 4.3.2)
Matrix 1.6-5 2024-01-11 [3] CRAN (R 4.3.2)
MatrixGenerics * 1.14.0 2023-10-24 [2] Bioconductor
matrixStats * 1.2.0 2023-12-11 [2] CRAN (R 4.3.2)
memoise 2.0.1 2021-11-26 [2] CRAN (R 4.3.2)
pillar 1.9.0 2023-03-22 [2] CRAN (R 4.3.2)
pkgconfig 2.0.3 2019-09-22 [2] CRAN (R 4.3.2)
png 0.1-8 2022-11-29 [2] CRAN (R 4.3.2)
prettyunits 1.2.0 2023-09-24 [2] CRAN (R 4.3.2)
progress 1.2.3 2023-12-06 [2] CRAN (R 4.3.2)
purrr 1.0.2 2023-08-10 [2] CRAN (R 4.3.2)
R6 2.5.1 2021-08-19 [2] CRAN (R 4.3.2)
rappdirs 0.3.3 2021-01-31 [2] CRAN (R 4.3.2)
Rcpp 1.0.12 2024-01-09 [2] CRAN (R 4.3.2)
RCurl 1.98-1.14 2024-01-09 [2] CRAN (R 4.3.2)
readr * 2.1.5 2024-01-10 [2] CRAN (R 4.3.2)
restfulr 0.0.15 2022-06-16 [2] CRAN (R 4.3.2)
rjson 0.2.21 2022-01-09 [2] CRAN (R 4.3.2)
rlang 1.1.3 2024-01-10 [2] CRAN (R 4.3.2)
Rsamtools 2.18.0 2023-10-24 [2] Bioconductor
RSQLite 2.3.5 2024-01-21 [2] CRAN (R 4.3.2)
rtracklayer * 1.62.0 2023-10-24 [2] Bioconductor
S4Arrays 1.2.0 2023-10-24 [2] Bioconductor
S4Vectors * 0.40.2 2023-11-23 [2] Bioconductor 3.18 (R 4.3.2)
sessioninfo * 1.2.2 2021-12-06 [2] CRAN (R 4.3.2)
SparseArray 1.2.3 2023-12-25 [2] Bioconductor 3.18 (R 4.3.2)
stringi 1.8.3 2023-12-11 [2] CRAN (R 4.3.2)
stringr 1.5.1 2023-11-14 [2] CRAN (R 4.3.2)
SummarizedExperiment * 1.32.0 2023-10-24 [2] Bioconductor
tibble 3.2.1 2023-03-20 [2] CRAN (R 4.3.2)
tidyr 1.3.1 2024-01-24 [2] CRAN (R 4.3.2)
tidyselect 1.2.0 2022-10-10 [2] CRAN (R 4.3.2)
tzdb 0.4.0 2023-05-12 [2] CRAN (R 4.3.2)
utf8 1.2.4 2023-10-22 [2] CRAN (R 4.3.2)
vctrs 0.6.5 2023-12-01 [2] CRAN (R 4.3.2)
withr 3.0.0 2024-01-16 [2] CRAN (R 4.3.2)
xgboost 1.7.7.1 2024-01-25 [2] CRAN (R 4.3.2)
XML 3.99-0.16.1 2024-01-22 [2] CRAN (R 4.3.2)
xml2 1.3.6 2023-12-04 [2] CRAN (R 4.3.2)
XVector * 0.42.0 2023-10-24 [2] Bioconductor
yaml 2.3.8 2023-12-11 [2] CRAN (R 4.3.2)
zlibbioc 1.48.0 2023-10-24 [2] Bioconductor
Let me know if you have any suggestions!
Thanks! Sowmya
Hi there,
I am using
bambu 3.4.0
for quantification. I am curious about the read to transcript assignment, so I used the function mentioned in the documentmetadata(se)$readToTranscriptMaps[[1]]
, but I don't find thereadID
column while the other columns are present. I tried the multi-sample approach where first, I generated the read class withtrackReads = TRUE
and then, performed quantification from the intermediate files.
Would be great to have more insights on this.
Thanks,
Sowmya