Closed williambakhache closed 9 months ago
Hm. I thought I fixed this in the last release cycle:
library(screenCounter)
library(Biostrings)

# Creating an example single-barcode sequencing experiment.
known.pool <- c("AGAGAGAGA", "CTCTCTCTC",
    "GTGTGTGTG", "CACACACAC")

# Adding some N's to the flanking sequence of each read.
N <- 1000
barcodes <- sprintf("CAGCTANNCGTACG%sCCAGCTCGANNTCG",
    sample(known.pool, N, replace=TRUE))
names(barcodes) <- seq_len(N)

# Writing the reads to a temporary FASTQ file.
tmp <- tempfile(fileext=".fastq")
writeXStringSet(DNAStringSet(barcodes), filepath=tmp, format="fastq")

# Counting the occurrences of each known barcode.
countSingleBarcodes(tmp, choices=known.pool,
    template="CGTACGNNNNNNNNNCCAGCTC")
## DataFrame with 4 rows and 2 columns
##       choices    counts
##   <character> <integer>
## 1   AGAGAGAGA       270
## 2   CTCTCTCTC       224
## 3   GTGTGTGTG       262
## 4   CACACACAC       244
Make sure you're running the latest version (1.2.0) from Bioconductor.
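For anyone unsure which version a shared environment provides, the check can be sketched as below. This is a minimal sketch, not part of screenCounter itself: `min.version` is a name introduced here for illustration, and the 1.2.0 threshold is simply the release mentioned above.

```r
# Sketch: report whether the installed screenCounter is at least the
# release mentioned in this thread. 'min.version' is an illustrative name.
min.version <- "1.2.0"

if (requireNamespace("screenCounter", quietly=TRUE)) {
    installed <- utils::packageVersion("screenCounter")
    if (installed < min.version) {
        # Updating is usually done with BiocManager::install("screenCounter").
        message("screenCounter ", as.character(installed),
            " is older than ", min.version, "; consider updating.")
    } else {
        message("screenCounter ", as.character(installed), " is recent enough.")
    }
} else {
    message("screenCounter is not installed in this R library.")
}
```

On a managed server such as a JupyterHub, the installed version may be tied to the Bioconductor release that the administrators deployed, so an update might need their help.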
Thank you so much! I'll check whether we have the latest version on our JupyterHub.
One last question: is it possible to extract the read ID for each barcode?
Thank you for developing this.
One last question: is it possible to extract the read ID for each barcode?
Currently not, it's all aggregated in the underlying C++ libraries.
I suppose we could report the read names associated with each barcode, but that could use an awful lot of memory for a deeply sequenced experiment. There may or may not be a better way to do what you actually want to do.
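Until something like this exists in the package, a rough workaround in plain R might look like the sketch below. It does not use screenCounter at all: it assumes the reads fit in memory as a named character vector, matches the constant flanks from the example template exactly (no mismatches, forward strand only), and the names `pattern`, `found` and `ids.by.barcode` are introduced here purely for illustration.

```r
# Simulate named reads with the same layout as the example in this thread.
set.seed(1)
known.pool <- c("AGAGAGAGA", "CTCTCTCTC", "GTGTGTGTG", "CACACACAC")
reads <- sprintf("CAGCTANNCGTACG%sCCAGCTCGANNTCG",
    sample(known.pool, 20, replace=TRUE))
names(reads) <- paste0("read", seq_along(reads))

# Constant flanks taken from the template "CGTACGNNNNNNNNNCCAGCTC";
# the capture group holds the 9-base variable region.
pattern <- "CGTACG([ACGTN]{9})CCAGCTC"
hits <- regmatches(reads, regexec(pattern, reads))
found <- vapply(hits, function(x) {
    if (length(x) == 2L) x[2] else NA_character_
}, character(1))

# Keep reads whose variable region is a known barcode, then group read names.
keep <- !is.na(found) & found %in% known.pool
ids.by.barcode <- split(names(reads)[keep],
    factor(found[keep], levels=known.pool))

# Number of read names collected per barcode.
lengths(ids.by.barcode)
```

This stores every read name, which is exactly the memory cost described above, so it is probably only practical for small or subsampled files.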
Thanks for your reply. For now I'm just using this package to quality-check my random barcode library.
In the future, I want to link a barcode with a certain genotype in that read.
Best wishes
William
Hello,
Just to let you know that this fixed it for me.
Works like a charm.
William
Ok, great.
As for the other question: when you have more clarity on the nature of the problem, make another issue and we can see what we can do. It may be possible to adapt the C++ code underlying the countCombinatorialBarcodes function so that it captures the combination of genotype with a random barcode (assuming that we're dealing with a simple SNP).
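To make the idea concrete, the pairing of a barcode with a single-base genotype can be prototyped in plain R before any C++ changes. The sketch below is only an illustration under strong assumptions: the read layout, the SNP position right after a constant `CCAGCTCGA` stretch, and the two alleles are all hypothetical, and nothing here reflects how screenCounter works internally.

```r
# Hypothetical layout: flank, 9-base barcode, constant stretch, one SNP base.
set.seed(2)
known.pool <- c("AGAGAGAGA", "CTCTCTCTC", "GTGTGTGTG", "CACACACAC")
n <- 50
barcode <- sample(known.pool, n, replace=TRUE)
snp <- sample(c("A", "G"), n, replace=TRUE)  # assumed two-allele SNP
reads <- sprintf("CAGCTACGTACG%sCCAGCTCGA%sTCG", barcode, snp)

# Two capture groups: the barcode and the base at the assumed SNP position.
pattern <- "CGTACG([ACGT]{9})CCAGCTCGA([ACGT])TCG"
m <- regmatches(reads, regexec(pattern, reads))
ok <- lengths(m) == 3L  # full match plus two captured groups
parsed <- do.call(rbind, m[ok])

# Cross-tabulate barcode against genotype.
tab <- table(barcode=parsed[, 2], genotype=parsed[, 3])
tab
```

A table like this shows whether each barcode is associated with one genotype or a mixture; real data would also need handling for mismatches and sequencing errors, which this sketch ignores.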
Hello,
Wonderful tool, it's been working very well with some of my subsetted FASTQ files.
I tried running it on my NGS data. However, I'm getting this error:
Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error:
Error in eval(expr, envir, enclos): unknown base 'N'
I suspect it has to do with some N bases in my sequences. Interestingly, a smaller FASTQ file with similar sequences works.
Let me know of any thoughts on how to fix this. Thanks for developing this tool.
William
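One way to test the suspicion about N bases is to count them directly in the FASTQ file. The sketch below uses only base R and assumes an uncompressed file with standard four-line records; it builds a tiny example file so that it runs on its own, and the variable names are illustrative. Comparing the result between the failing file and the smaller working one may show whether N's fall inside the region matched by the template.

```r
# Write a tiny FASTQ file by hand so the sketch is self-contained.
tmp <- tempfile(fileext=".fastq")
seqs <- c("ACGTACGT", "ACGNACGT", "NNGTACGT")
fq <- as.vector(rbind(
    paste0("@read", seq_along(seqs)),  # header line
    seqs,                              # sequence line
    "+",                               # separator line (recycled per record)
    strrep("I", nchar(seqs))           # dummy quality line
))
writeLines(fq, tmp)

# Parse assuming exactly four lines per record.
lines <- readLines(tmp)
read.ids <- sub("^@", "", lines[seq(1, length(lines), by=4)])
read.seqs <- lines[seq(2, length(lines), by=4)]

# Flag reads containing N and count the N's in each read.
has.n <- grepl("N", read.seqs, fixed=TRUE)
n.per.read <- lengths(regmatches(read.seqs,
    gregexpr("N", read.seqs, fixed=TRUE)))

read.ids[has.n]
```

`readLines()` loads the whole file, so for a full sequencing run it would be safer to read from a connection in chunks or to inspect a subsample.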