Open lpantano opened 5 years ago
The cDNA_introns.t2g.txt
file provided in the tutorial does have both intons and exons in the same txt file. You can take a look at the beginning and end of the file by doing:
$ wget https://github.com/BUStools/getting_started/releases/download/velocity_tutorial/cDNA_introns.t2g.txt.gz
$ gunzip cDNA_introns.t2g.txt.gz
$ head cDNA_introns.t2g.txt
ENST00000456328.1 ENSG00000223972 DDX11L1
ENST00000450305.2 ENSG00000223972 DDX11L1
ENST00000488147.3 ENSG00000227232 WASH7P
ENST00000619216.4 ENSG00000278267 MIR6859-1
ENST00000473358.5 ENSG00000243485 MIR1302-2HG
ENST00000469289.6 ENSG00000243485 MIR1302-2HG
ENST00000607096.7 ENSG00000284332 MIR1302-2
ENST00000417324.8 ENSG00000237613 FAM138A
ENST00000461467.9 ENSG00000237613 FAM138A
ENST00000606857.10 ENSG00000268020 OR4G4P
$ tail cDNA_introns.t2g.txt
ENST00000463325.1056070-I ENSG00000184319 RPL23AP82
ENST00000463325.1056071-I ENSG00000184319 RPL23AP82
ENST00000494075.1056072-I ENSG00000184319 RPL23AP82
ENST00000494075.1056073-I ENSG00000184319 RPL23AP82
ENST00000494075.1056074-I ENSG00000184319 RPL23AP82
ENST00000423888.1056075-I ENSG00000184319 RPL23AP82
ENST00000423888.1056076-I ENSG00000184319 RPL23AP82
ENST00000423888.1056077-I ENSG00000184319 RPL23AP82
ENST00000480246.1056078-I ENSG00000184319 RPL23AP82
ENST00000480246.1056079-I ENSG00000184319 RPL23AP82
Thanks, so this command should be ok?:
bustools count -o genes/ --genecounts -g cDNA_introns.t2g.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus
Yes.
Just so you know, there is small issue right now in that bustools will not create the subfolder if you give it a multi folder output (e.g. if you want the output on ./sample2/genecounts
) so you need to make sure the folder exist in that case. If you'd like to do that, this is how I'd make sure the folder gets created prior to running the command:
mkdir -p ./sample2/genecounts
bustools count \
--output ./sample2/genecounts/genes \
--genecounts \
--genemap /references/homo_sapiens-ensembl-96/transcripts_to_genes.txt \
--ecmap ./sample2/matrix.ec \
--txnames ./sample2/transcripts.txt \
./sample2/output.corrected.sorted.bus
Thank you!
Sadly I am having a segmentation fault (with the conda and source code compiled by myself) only for that command, this is the debug output, I don't know if that helps.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x000000000043f9e0 in std::vector<int, std::allocator<int> >::begin (this=0x2aabd9af73f8) at /usr/include/c++/4.8.2/bits/stl_vector.h:548
548 { return const_iterator(this->_M_impl._M_start); }
(gdb) bt
#0 0x000000000043f9e0 in std::vector<int, std::allocator<int> >::begin (this=0x2aabd9af73f8) at /usr/include/c++/4.8.2/bits/stl_vector.h:548
#1 0x000000000045cb53 in intersect_genes_of_ecs (ecs=std::vector of length 1, capacity 100 = {...}, ec2genes=std::vector of length 12317869, capacity 16777216 = {...},
glist=std::vector of length 0, capacity 100) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/Common.cpp:174
#2 0x0000000000450ddb in __lambda1::operator() (__closure=0x7fffffffbdb0, v=std::vector of length 6345915, capacity 6400000 = {...})
at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_count.cpp:177
#3 0x0000000000451a87 in bustools_count (opt=...) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_count.cpp:241
#4 0x000000000043dd35 in main (argc=12, argv=0x7fffffffd868) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_main.cpp:1072
The output folder has an empty MTX file.
the other count
commands work, so is not this step specifically.
I will try the test data to make sure is not my sample, but I really want to have this working with my data, let me know if I can give you any more information.
If it is the cpp library version, can you tell me the one you have that is working so I can try to compile versus that particular version?
thanks!
Please make sure that all of the files you're giving bustools exist (that the path is correct) and that the bus file is not empty.
If the issue persists please try again with test data.
Every time we've seen seg faults its because either we were giving it empty files, or paths to files that didn't exist.
On Tue, Jul 23, 2019, 10:35 Lorena Pantano notifications@github.com wrote:
Thank you!
Sadly I am having a segmentation fault (with the conda and source code compiled by myself) only for that command, this is the debug output, I don't know if that helps.
[Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault. 0x000000000043f9e0 in std::vector<int, std::allocator
>::begin (this=0x2aabd9af73f8) at /usr/include/c++/4.8.2/bits/stl_vector.h:548 548 { return const_iterator(this->_M_impl._M_start); } (gdb) bt 0 0x000000000043f9e0 in std::vector<int, std::allocator
>::begin (this=0x2aabd9af73f8) at /usr/include/c++/4.8.2/bits/stl_vector.h:548 1 0x000000000045cb53 in intersect_genes_of_ecs (ecs=std::vector of length 1, capacity 100 = {...}, ec2genes=std::vector of length 12317869, capacity 16777216 = {...},
glist=std::vector of length 0, capacity 100) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/Common.cpp:174
2 0x0000000000450ddb in lambda1::operator() (closure=0x7fffffffbdb0, v=std::vector of length 6345915, capacity 6400000 = {...})
at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_count.cpp:177
3 0x0000000000451a87 in bustools_count (opt=...) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_count.cpp:241
4 0x000000000043dd35 in main (argc=12, argv=0x7fffffffd868) at /home/lpantano/om2-pilm/conda3/envs/bustools/share/bustools/src/bustools_main.cpp:1072
The output folder has an empty MTX file.
the other count commands work, so is not this step specifically.
I will try the test data to make sure is not my sample, but I really want to have this working with my data, let me know if I can give you any more information.
If it is the cpp library version, can you tell me the one you have that is working so I can try to compile versus that particular version?
thanks!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/BUStools/bustools/issues/11?email_source=notifications&email_token=AC7MY4FFG6TOVIQZLZMFIHLQA46MFA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2T33AQ#issuecomment-514309506, or mute the thread https://github.com/notifications/unsubscribe-auth/AC7MY4DY2GYRRHUCAWFHV73QA46MFANCNFSM4IFLDGNQ .
Thank you.
files exist and they are not empty:
4.7G matrix.ec
1.5G output.correct.sort.bus
30M transcripts.txt
58M cDNA_introns.t2g.txt
output folder exists. this is the command I used:
bustools count -o genes/genes --genecounts -g cDNA_introns.t2g.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus
it takes a while to give the error, like 20 min or so. There is an empty file at genes/genes.mtx
.
I tried the with the example data: bus_output_06
and the same error appeared. I used only a pair of files when aligning to speed up computation:
06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL
73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz
I tried in a different cluster machines, and the same seg.fault appeared:
Program received signal SIGSEGV, Segmentation fault.
0x000055555557682b in intersect_genes_of_ecs(std::vector<int, std::allocator<int> > const&, std::vector<std::vector<int, std::allocator<int> >, std::allocator<std::vector<int, std::allocator<int> > > > const&, std::vector<int, std::allocator<int> >&) ()
I know it is a difficult error to catch up, I am happy to help in any way possible.
cheers
If you can provide all files and the bustools binary you are using we will try to reproduce it and identify a fix
On Wed, 24 Jul 2019 at 08:25, Lorena Pantano notifications@github.com wrote:
Thank you.
files exist and they are not empty:
4.7G matrix.ec 1.5G output.correct.sort.bus 30M transcripts.txt 58M cDNA_introns.t2g.txt
output folder exists. this is the command I used:
bustools count -o genes/genes --genecounts -g cDNA_introns.t2g.txt -e matrix.ec -t transcripts.txt output.correct.sort.bus
it takes a while to give the error, like 20 min or so. There is an empty file at genes/genes.mtx.
I tried the with the example data: bus_output_06 and the same error appeared. I used only a pair of files when aligning to speed up computation:
06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL 73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz
I tried in a different cluster machines, and the same seg.fault appeared:
Program received signal SIGSEGV, Segmentation fault. 0x000055555557682b in intersect_genes_of_ecs(std::vector<int, std::allocator
> const&, std::vector<std::vector<int, std::allocator >, std::allocator<std::vector<int, std::allocator > > > const&, std::vector<int, std::allocator >&) () I know it is a difficult error to catch up, I am happy to help in any way possible.
cheers
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/BUStools/bustools/issues/11?email_source=notifications&email_token=AC7MY4BJM7TATBH7JEQLIUTQBBX5XA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD2WWHRI#issuecomment-514679749, or mute the thread https://github.com/notifications/unsubscribe-auth/AC7MY4AWYB3XD3GP4QWLITDQBBX5XANCNFSM4IFLDGNQ .
Thank you! Here are the files:
wget https://lpantano-debug.s3.amazonaws.com/bustools/bustools
wget https://lpantano-debug.s3.amazonaws.com/bustools/cDNA_introns.t2g.txt
wget https://lpantano-debug.s3.amazonaws.com/bustools/matrix.ec
wget https://lpantano-debug.s3.amazonaws.com/bustools/output.correct.sort.bus
wget https://lpantano-debug.s3.amazonaws.com/bustools/transcripts.txt
I installed bustools
with conda
.
thanks so much
How did you generate this bus file? I converted the 2.6GB bus file you provided into text to inspect it and it expanded to 205GB.
Each entry that should correspond to a UMI has a length of 2573 charachters, and it repeats every 32 characters. I have attached the first 5 lines of the file for you to understand.
If you can provide the detailed steps you used to generate this file we can look further into the issue.
This is for the the velocity tutorial, I just took on of the samples:
kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4
06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz
06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz
bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort
-o output.correct.sort.bus -t 4 -
Is this the information you need? I just followed the steps in the tutorial.
Thanks for helping with this.
On July 26, 2019 at 1:10:45 AM, Eduardo Beltrame (notifications@github.com) wrote:
How did you generate this bus file? I converted the 2.6GB bus file you provided into text to inspect it and it expanded to 205GB.
Each entry that should correspond to a UMI has a length of 2573 charachters, and it repeats every 32 characters. I have attached the first 5 lines of the file for you to understand.
If you can provide the detailed steps you used to generate this file we can look further into the issue.
nuclear_rna_first_5_lines.txt https://github.com/BUStools/bustools/files/3434254/nuclear_rna_first_5_lines.txt
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BUStools/bustools/issues/11?email_source=notifications&email_token=AAML6HHNZUSIGRUK2DXKBALQBKBNLA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD23QNCI#issuecomment-515311241, or mute the thread https://github.com/notifications/unsubscribe-auth/AAML6HAZCKJOS4D5TBC546TQBKBNLANCNFSM4IFLDGNQ .
The fastq files were done by:
bamtofastq --reads-per-fastq=500000000 10X_17_029.bam ./06
Can I convert the BUS file to text? I have my sample as well and I could see if that happens with my sample as well. Or I can give it to your.
Thanks
On July 26, 2019 at 9:00:29 AM, Lorena Pantano (lorena.pantano@gmail.com) wrote:
This is for the the velocity tutorial, I just took on of the samples:
kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4
06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz
06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz
bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort
-o output.correct.sort.bus -t 4 -
Is this the information you need? I just followed the steps in the tutorial.
Thanks for helping with this.
On July 26, 2019 at 1:10:45 AM, Eduardo Beltrame (notifications@github.com) wrote:
How did you generate this bus file? I converted the 2.6GB bus file you provided into text to inspect it and it expanded to 205GB.
Each entry that should correspond to a UMI has a length of 2573 charachters, and it repeats every 32 characters. I have attached the first 5 lines of the file for you to understand.
If you can provide the detailed steps you used to generate this file we can look further into the issue.
nuclear_rna_first_5_lines.txt https://github.com/BUStools/bustools/files/3434254/nuclear_rna_first_5_lines.txt
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/BUStools/bustools/issues/11?email_source=notifications&email_token=AAML6HHNZUSIGRUK2DXKBALQBKBNLA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD23QNCI#issuecomment-515311241, or mute the thread https://github.com/notifications/unsubscribe-auth/AAML6HAZCKJOS4D5TBC546TQBKBNLANCNFSM4IFLDGNQ .
Yes you can convert it to text by doing bustools text -o bus_output_filename.txt bus_input_file.bus
On Fri, 26 Jul 2019 at 06:07, Lorena Pantano notifications@github.com wrote:
The fastq files were done by:
bamtofastq --reads-per-fastq=500000000 10X_17_029.bam ./06
Can I convert the BUS file to text? I have my sample as well and I could see if that happens with my sample as well. Or I can give it to your.
Thanks
On July 26, 2019 at 9:00:29 AM, Lorena Pantano (lorena.pantano@gmail.com) wrote:
This is for the the velocity tutorial, I just took on of the samples:
kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort -o output.correct.sort.bus -t 4 -
Is this the information you need? I just followed the steps in the tutorial.
Thanks for helping with this.
On July 26, 2019 at 1:10:45 AM, Eduardo Beltrame (notifications@github.com ) wrote:
How did you generate this bus file? I converted the 2.6GB bus file you provided into text to inspect it and it expanded to 205GB.
Each entry that should correspond to a UMI has a length of 2573 charachters, and it repeats every 32 characters. I have attached the first 5 lines of the file for you to understand.
If you can provide the detailed steps you used to generate this file we can look further into the issue.
nuclear_rna_first_5_lines.txt < https://github.com/BUStools/bustools/files/3434254/nuclear_rna_first_5_lines.txt
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub < https://github.com/BUStools/bustools/issues/11?email_source=notifications&email_token=AAML6HHNZUSIGRUK2DXKBALQBKBNLA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD23QNCI#issuecomment-515311241
, or mute the thread < https://github.com/notifications/unsubscribe-auth/AAML6HAZCKJOS4D5TBC546TQBKBNLANCNFSM4IFLDGNQ
.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/BUStools/bustools/issues/11?email_source=notifications&email_token=AC7MY4GLJ364NNXCRVAZGV3QBLZH7A5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD24QZHQ#issuecomment-515443870, or mute the thread https://github.com/notifications/unsubscribe-auth/AC7MY4FGUUXK7SSKSJUVNUTQBLZH7ANCNFSM4IFLDGNQ .
yes, the bus file from my sample looks the same than that. Used the same command I posted before to produce the bus file, with kallisto.
Ok, can you confirm that the fastq files you generated using
bamtofastq
look correct? If you can paste here the output of head it
would be good.
If the fastqs look correct then it might have been an issue with the files
given to kallisto, if you can also post the commands you use to call
kallisto bus
and the files used that might solve the mystery
On Fri, 26 Jul 2019 at 10:57, Lorena Pantano notifications@github.com wrote:
yes, the bus file from my sample looks the same than that. Used the same command I posted before to produce the bus file, with kallisto.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/BUStools/bustools/issues/11?email_source=notifications&email_token=AC7MY4F7TQT46GJOG52FOSTQBM3H7A5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD25JQGI#issuecomment-515545113, or mute the thread https://github.com/notifications/unsubscribe-auth/AC7MY4CBIETH6LOXRGEKXA3QBM3H7ANCNFSM4IFLDGNQ .
Sure.
These are the commands to create the bus file from the bam files:
bamtofastq --reads-per-fastq=500000000 10X_17_029.bam ./06
kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz
bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort -o output.correct.sort.bus -t 4 -
The other sample I have is not generated with bamtofastq, they were generated with cellranger mkfasta
and the bus file is the same.
R1:
@D00456:228:HL73JBCXY:2:1111:15821:53600 1:N:0:0
CTAGCCCTAACCCTAACCCTAACCCT
+
.GAGA<GGGGAGGGGGGGGAAA.A<G
@D00456:228:HL73JBCXY:2:1213:6403:92501 1:N:0:0
AACCCTAACCCTAACCCTAACCCTAA
+
GGGGAGGGGGAGGAGGGGGG..AGAG
@D00456:228:HL73JBCXY:2:2106:9266:5974 1:N:0:0
ACGTCAAAGCAGCCTCCACAATGTGG
R2:
@D00456:228:HL73JBCXY:2:1111:15821:53600 3:N:0:0
GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGCTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGATAGGGTTCGGTCTAGGGATAGG
+
AGGAAA..GG.<<.G<A.G<G..<G<G<GAGG..<AGAG<..<<A..<<GG.<<GGA<..<<<..<...<..<<.<<AA...<...<.<...<<....
@D00456:228:HL73JBCXY:2:1213:6403:92501 3:N:0:0
GGTTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGAGTAGGGATGGGGAGCGGGGAGGGGAGAGG
+
GA<G.AAA<GGGAG<AAGGGI<..<GGGGGAAAG..<.<.<.<AGAG.GGG.<.GAG..G<<<...<<....<<......<.<.<.......<..<..
@D00456:228:HL73JBCXY:2:2106:9266:5974 3:N:0:0
GCTAAGAGACAGCAAATACACATGAACAGAAAGAAGAGGTCAAAGAAAAGGCTGACGGCAAGTTAACGAAAAGAAAAATGGTGAATGATACCCGGTGC
In a last note, the velocity tutorial works if I use bustools capture
and bustools count
to get the two type of matrices. This is only happening when I want just to do bustools count
with the matrix.ec
file firectly since I want to quantify all reads into genes. (https://www.kallistobus.tools/velocity_tutorial.html)
thanks!
Thanks, this all looks good. One last question - are you by any chance running this on a 32 bit machine?
On Fri, Jul 26, 2019, 13:16 Lorena Pantano notifications@github.com wrote:
Sure.
These are the commands to create the bus file from the bam files:
bamtofastq --reads-per-fastq=500000000 10X_17_029.bam ./06
kallisto bus -i cDNA_introns.idx -o bus_output_06/ -x 10xv2 -t 4 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R1_001.fastq.gz 06/10X_17_029_MissingLibrary_1_HL73JBCXY/bamtofastq_S1_L002_R2_001.fastq.gz
bustools correct -w ../../10xv2_whitelist.txt -p output.bus | bustools sort -o output.correct.sort.bus -t 4 -
The other sample I have is not generated with bamtofastq, they were generated with cellranger mkfasta and the bus file is the same.
R1:
@D00456:228:HL73JBCXY:2:1111:15821:53600 1:N:0:0 CTAGCCCTAACCCTAACCCTAACCCT + .GAGA<GGGGAGGGGGGGGAAA.A<G @D00456:228:HL73JBCXY:2:1213:6403:92501 1:N:0:0 AACCCTAACCCTAACCCTAACCCTAA + GGGGAGGGGGAGGAGGGGGG..AGAG @D00456:228:HL73JBCXY:2:2106:9266:5974 1:N:0:0 ACGTCAAAGCAGCCTCCACAATGTGG
R2:
@D00456:228:HL73JBCXY:2:1111:15821:53600 3:N:0:0 GGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGCTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTGGGGATAGGGTTCGGTCTAGGGATAGG + AGGAAA..GG.<<.G<A.G<G..<G<G<GAGG..<AGAG<..<<A..<<GG.<<GGA<..<<<..<...<..<<.<<AA...<...<.<...<<.... @D00456:228:HL73JBCXY:2:1213:6403:92501 3:N:0:0 GGTTAGGGTTAGGGTTAGGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGAGTAGGGATGGGGAGCGGGGAGGGGAGAGG + GA<G.AAA<GGGAG<AAGGGI<..<GGGGGAAAG..<.<.<.<AGAG.GGG.<.GAG..G<<<...<<....<<......<.<.<.......<..<.. @D00456:228:HL73JBCXY:2:2106:9266:5974 3:N:0:0 GCTAAGAGACAGCAAATACACATGAACAGAAAGAAGAGGTCAAAGAAAAGGCTGACGGCAAGTTAACGAAAAGAAAAATGGTGAATGATACCCGGTGC
In a last note, the velocity tutorial works if I use bustools capture and bustools count to get the two type of matrices. This is only happening when I want just to do bustools count with the matrix.ec file firectly since I want to quantify all reads into genes. ( https://www.kallistobus.tools/velocity_tutorial.html)
thanks!
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/BUStools/bustools/issues/11?email_source=notifications&email_token=AC7MY4BGCD4VK3OFTQIRICLQBNLRDA5CNFSM4IFLDGN2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD25TMEY#issuecomment-515585555, or mute the thread https://github.com/notifications/unsubscribe-auth/AC7MY4EK3CRRD5ULBB7ELXDQBNLRDANCNFSM4IFLDGNQ .
thank you for confirming.
nope, 64 linux machines.
thanks!
Ok we're very puzzled as to how the bus file with the 2573 characters in the UMI have been made, and ran out of ideas.
Can you subsample 100k reads from R1 and R2 so that we can try to reproduce the problematic bus file? seqtk is an easy way to subsample them: https://github.com/lh3/seqtk
Thanks. I’ll do that on Monday.
I checked the BUS file from kallisto and looks like this:
CTGTTTACATACGCTA TAATAAGGAA 1746067 1 CTTAACTAGCTCTCGG ATCGTCCTGG 1746067 1 CTTTGCGAGCGATGAC TCCAGGGTGA 1746067 1 CTTTGCGGTACTCGCG TTAAATCGTG 1746067 1 CTTTGCGGTGTTCGAT CCCGCACGGT 1746067 1 CTTTGCGGTGTTCGAT CCCGCACGGT 1746067 1 GAAGCAGAGGCTCATT TGTTTTGCAC 1746067 1 GAAGCAGAGGCTCATT GTCTTTGATG 1746067 1 GAATGAACATGCTAGT AAATGAACAA 1746067 1 GACAGAGCAATCGAAA CACGGATGTC 1746067 1
But after bustools correct and sort it looks like this (it seems the problem is the correct | sort step):
AAAAAAAAAAAAAAAA AAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAA AAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAA AAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACC AAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAA AAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAA CAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAA AAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAA AAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAA AACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAA AAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAA AAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACC AAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAA AAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCAAAAAAAAAAAAAAAAAAAAAAAACAAAAACCA 0 3275751424
Hi,
I uploaded the BUS file before sorting and correction that it seems right. Do you still want the FASTQ files?
https://lpantano-debug.s3.amazonaws.com/bustools/output.bus
As I posted last week, the weird bus appears after the correct command.
Thanks!
Hi,
I am trying the tool for the quantification of nuclear RNA, after my recent tweets with Lior Patcher.
Every commands in the Velocity tutorial seems to work, but I want only the quantification by gene taking into account reads in introns and exons, so I am really don't want the outputs in the velocity tutorial.
I imagine that something like this is the solution after doing
correct
andsort
:but I want to make sure that t2g.txt is the correct. I noticed the annotation for introns is
TRANSCRIPT.N-I
meanwhile the exons isTRANSCRIPT.N
. If thet2g.txt
file is lie this:It would recognized introns and exons, or I need to add the intron annotation to the file?
Thanks!