Closed physnano closed 1 year ago
Hi Will,
To understand the issue, it would help me if you could share the bam file, the genome fasta file, the rf-count-genome command you ran, and the output rc file(s). If you can create a minimal case (for example by extracting a small region of the bam file and by processing it with the same rf-count-genome command), that's also ok.
One question in the meantime: when you say "empty" you mean literally empty? Or that the whole region has all the counts at 0? If the former, just share with me the output rc files and i'll see what's going on.
Thanks! Danny
rf-count-genome:
--processors 12 \
--count-mutations \
--fasta ${refPath} \
--primary-only \
--sorted \
--overwrite \
--min-quality 4 \
--median-quality 4 \
--count-mutations \
--only-mut "T2C" \
--output-dir ${outDir} \
${inPath}
The genome reference is GRCh38.p13.genome.fa
(https://www.gencodegenes.org/human/release_38.html)
I have shared a minimal .bam/.bami dataset directly. Re-running rf-count-genome on this minimal example seems to produce an .rc file of identical size suggesting the issue may actually be with rf-count-genome or perhaps some my passed options? This is nanopore data so suggestions as to the --min-quality and --median-quality parameters are greatly appreciated.
Minimal .bam files for Issue #41
On Sat, Aug 26, 2023 at 7:54 PM Danny Incarnato @.***> wrote:
Hi Will,
To understand the issue, it would help me if you could share the bam file, the genome fasta file, the rf-count-genome command you ran, and the output rc file(s). If you can create a minimal case (for example by extracting a small region of the bam file and by processing it with the same rf-count-genome command), that's also ok.
One question in the meantime: when you say "empty" you mean literally empty? Or that the whole region has all the counts at 0? If the former, just share with me the output rc files and i'll see what's going on.
Thanks! Danny
— Reply to this email directly, view it on GitHub https://github.com/dincarnato/RNAFramework/issues/41#issuecomment-1694555094, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACYFWLHGFYGYZFXB7MZJ4IDXXKZFHANCNFSM6AAAAAA374Y3FQ . You are receiving this because you authored the thread.Message ID: @.***>
Hi Will,
sorry I cannot see any attachment. Can you share the bam file in some other way, maybe via wetransfer.com? Concerning the size of the RC file, it has nothing to do with the size of the bam, but it has to do with the size of the genome.
Also, you didn't address my question: when you say "empty" you mean literally empty? Or that the whole region has all the counts at 0?
Sorry, I resent a minimal BAM file via google drive, let me know if you can see it. As for the extracted RC file I think its literally empty (I cannot even extract the region or see zeros)
Hi Will,
everything seems to work for me. See below:
[danny@rna-compute nanopore]$ rf-rctools extract -a file.bed BC04_align-sorted_7sk-ds.plus.rc -o int_anno.plus.rc -ow
[danny@rna-compute nanopore]$ rf-rctools view int_anno.plus.rc
chr6_52995621-52995948
ATGTGAGGGCGATCTGGCTGCGACATCTGTCACCCCATTGATCGCCAGGGTTGATTCGGCTGATCTGGCTGGCTAGGCGGGTGTCCCCTTCCTCCCTCACCGCTCCATGTGCGTCCCTCCCGAAGCTGCGCGCTCGGTCGAAGAGGACGACCATCCCCGATAGAGGAGGACCGGTCTTCGGTCAAGGGTATACGAGTAGCTGCGCTCCCCTGCTAGAACCTCCAAACAAGCTCTCAAGGTCCATTTGTAGGAGAACGTAGGGTAGTCAAGCTTCCAAGACTCCAGACACATCCAAATGAGGCGCTGCATGTGGCAGTCTGCCTTTCTT
0,102,20,106,32,106,51,25,67,101,50,50,56,48,161,216,89,131,115,162,153,67,92,63,62,71,101,230,93,72,38,123,75,75,111,96,103,295,86,40,74,82,44,94,135,182,253,98,80,83,501,206,39,75,138,288,229,140,93,110,203,44,53,50,31,94,56,65,168,96,79,95,117,247,219,176,107,137,706,137,134,144,86,91,63,87,268,61,78,103,30,42,99,67,42,23,56,11,72,56,303,112,39,76,28,45,99,151,62,141,124,91,57,54,36,35,32,50,28,59,441,52,79,89,49,71,106,77,78,47,53,49,34,205,168,119,111,388,66,65,87,89,30,124,54,38,56,74,68,97,83,66,74,149,33,40,242,579,86,120,115,114,44,115,52,34,47,45,46,40,69,148,110,83,146,54,168,101,81,335,79,66,39,147,258,178,170,81,260,99,181,193,151,103,89,51,754,212,69,109,178,97,158,101,69,112,76,90,312,98,191,163,135,183,151,43,64,88,37,35,67,40,67,54,92,106,54,85,84,72,73,84,71,122,46,149,140,68,71,85,35,37,69,129,83,102,65,164,218,30,45,49,46,98,113,73,113,156,210,88,67,82,259,209,136,220,68,132,147,275,114,143,142,69,100,88,78,38,45,30,105,72,265,125,63,56,72,69,113,57,71,51,69,36,44,46,224,66,47,34,42,91,68,53,150,64,152,59,215,70,101,185,69,100,159,81,77,29,109,55,76,108,222,41,23,10,10,1
4611,4681,4822,5059,5181,5198,5294,5353,5366,5374,5427,5434,5440,5447,5456,5481,5488,5492,5493,5508,5518,5546,5548,5550,5557,5558,5559,5559,5565,5569,5569,5570,5572,5577,5578,5579,5580,5581,5593,5594,5594,5595,5597,5598,5598,5598,5600,5665,5669,5674,5674,5680,5689,5691,5691,5693,5702,5709,5711,5712,5714,5724,5724,5724,5727,5735,5750,5750,5750,5763,5833,5839,5839,5841,5846,5856,5858,5863,5891,5897,5897,5901,5904,5910,5912,5919,5927,5933,5934,5937,5941,5944,5945,5946,5949,5951,5952,5954,5954,5954,5956,5964,5966,5968,5970,5972,5979,5980,5984,5988,6005,6006,6009,6010,6012,6014,6015,6015,6016,6016,6021,6022,6023,6024,6031,6031,6031,6035,6037,6041,6041,6045,6045,6045,6046,6052,6052,6052,6056,6160,6161,6165,6169,6169,6174,6176,6176,6177,6181,6183,6184,6186,6188,6189,6191,6191,6191,6194,6198,6198,6199,6207,6219,6220,6250,6251,6251,6259,6262,6262,6262,6263,6271,6271,6271,6273,6273,6274,6277,6298,6298,6298,6298,6299,6299,6324,6326,6326,6326,6336,6336,6337,6339,6339,6338,6337,6338,6338,6339,6336,6337,6337,6337,6338,6337,6337,6338,6339,6339,6339,6339,6340,6340,6339,6341,6342,6342,6342,6342,6342,6342,6342,6345,6345,6345,6344,6345,6345,6345,6345,6344,6344,6345,6346,6347,6347,6347,6351,6351,6352,6353,6354,6354,6352,6353,6353,6352,6352,6352,6352,6352,6352,6352,6352,6352,6352,6352,6350,6350,6348,6347,6345,6344,6344,6342,6345,6345,6345,6345,6342,6342,6342,6338,6336,6335,6335,6335,6335,6335,6334,6333,6331,6331,6327,6326,6327,6327,6325,6324,6324,6323,6323,6321,6312,6311,6310,6306,6306,6306,6304,6300,6292,6279,6275,6268,6262,6253,6243,6229,6221,6199,6170,6140,6125,6116,6091,6064,6045,6026,6003,5972,5912,5884,5850,5711,5286,5157,4805
I suspect that the issue is in the formatting of your BED file. 2 things to check:
dos2unix
Linux utility to convert line endingsPlease see attached a test BED file. Let me know if this works.
Best, Danny
Hi Danny,
Thanks, was definitely a BED file formatting issue. Question related to the output though: I ran rf-count-genome
with the --only_mut "T2C"
option, therefore shouldn't I only be seeing mutation counts at T/U locations within the int_anno.plus.rc file? Or am I misinterpreting the -count or view output?
Best,
Will
Yes Will, you are theoretically correct, however is your experiment stranded or not? I believe you are passing the bam files to rf-count-genome without specifying the strandedness. Is it fair to assume that the R1 will always have the same sequence as the RNA that originated it, and the R2 will always be the reverse complement?
If that is the case, you must specify the strandedness of your file as second-strand
. Please refer to the manual, and below:
String | Strandedness |
---|---|
:u or :unstranded | The information on the genomic strand that originated the transcript is not preserved |
:f or :first or :first-strand | Reads align to the strand complementary to the one that originated the transcript |
:s or :second or :second-strand | Reads align to the same strand that originated the transcript |
So, given the command line you specified before, your bam file should be specified as:
rf-count-genome \
--processors 12 \
--count-mutations \
--fasta ${refPath} \
--primary-only \
--sorted \
--overwrite \
--min-quality 4 \
--median-quality 4 \
--count-mutations \
--only-mut "T2C" \
--output-dir ${outDir} \
BC04_align-sorted_7sk-ds.bam:second
Alternatively, if the R1 is always the reverse-complement of the RNA and the R2 is always the RNA strand, you should use the first-strand
mode.
If your libraries are unstranded, the only way to do what you aim for is to perform the analysis transcriptome-level, and then to use the regular rf-count
.
Please let me know if this helps! All the best,
Danny
Thanks Danny for your help, this all makes sense now.
Best,
Will
Hi Danny,
I have used
rf-genome-count
to generate an .rc file (which seems to have been successful as the .rc file is fairly large ~25Gb).However when I try to extract a particular region with either a BED file or a GTF the extracted .rc file appears empty.
rf-rctools extract -a rna_targets.bed BC04_align-sorted.plus.rc -o int_anno.plus.rc -ow
This is further confirmed with the (empty) output from
rf-rctools view int_anno.plus.rc > int_anno.txt
.The BED file I am using is simply:
chr6 52995621 52995948
Is more information required in the BED file?
I definitely have reads in this region as I can confirm with visual inspection in IGV.
Best,
Will