caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

chunk_barcoded_bam.py issue #29

Closed: JohnApp-scRNA closed this issue 3 years ago

JohnApp-scRNA commented 3 years ago

Hi there! When I try to run mgatk on our cluster, I get the following printed to the error file repeated around 20-30 times:

"/systempath..../chunk_barcoded_bam.py", line 59, in
faux_umi = barcode_id[0:16] + umi_id + fauxdon[(int(barcode_id[17:]) - 1)]
ValueError: invalid literal for int() with base 10: ''

and then after the repeats of the above, the final lines of the error file say:

Error in checkGrep(grep(".A.txt", files)) :
Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep
Execution halted
ValueError: invalid literal for int() with base 10: ''
ValueError: invalid literal for int() with base 10: ''

Any help would be great. Thanks :)

caleblareau commented 3 years ago

What was the command that you executed, and what do the first 5 lines of the BAM file look like? My best guess is that the barcode isn't following the 10x specification.
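
One way to check this guess is to pull the CB tag out of the first few alignment records and test whether it ends in a `-<number>` suffix. The sketch below assumes the records are available as SAM text (for example from `samtools view file.bam | head -n 5`). The helper names `cb_tag` and `has_channel_suffix` and the regular expression are illustrative assumptions, not part of mgatk, and the example record is abbreviated from the one posted later in this thread.

```python
import re

# Assumed barcode shape: 16 bases, a dash, then a channel number.
SUFFIXED_BARCODE = re.compile(r"^[ACGT]{16}-\d+$")

def cb_tag(sam_line, tag="CB"):
    """Return the value of a Z-typed tag from one SAM text record, or None."""
    prefix = tag + ":Z:"
    # Optional fields come after the 11 mandatory SAM columns.
    for field in sam_line.split()[11:]:
        if field.startswith(prefix):
            return field[len(prefix):]
    return None

def has_channel_suffix(barcode):
    """True if the barcode looks like ACGT...-1 (hypothetical check)."""
    return barcode is not None and SUFFIXED_BARCODE.match(barcode) is not None

# SEQ and QUAL are shortened here to keep the example readable.
record = ("SRR9990646.5071207 0 chr1 14249 0 91M * 0 0 ACGT FFFF "
          "NH:i:5 CR:Z:ACCTGTCTCAGGACGA CB:Z:ACCTGTCTCAGGACGA UB:Z:CACTTTTATATC")

barcode = cb_tag(record)
print(barcode, has_channel_suffix(barcode))
```

A barcode that fails this check would be consistent with the ValueError in the first post.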

(In reply to JohnApp-scRNA's post of Oct 16, 2020, 5:45 AM.)

hj2017china commented 3 years ago

Hello professor, I also encountered the same problem! The error message from mgatk:

Error in checkGrep(grep(".A.txt", files)) :
Improper folder specification; file missing / extra file present. See documentation
Calls: importMito -> checkGrep

My code:

/home/han/CellRanger-ATAC/cellranger-atac-1.2.0/cellranger-atac count \
--fastqs fastq --id test1 --sample test1 \
--reference /home/han/CellRanger-ATAC/database/refdata-cellranger-atac-hg19-1.2.0 \
--localcores 12

mgatk tenx -i /home/han/test1/outs/possorted_bam.bam \
-n test1 -o test1_mgatk -c 12 \
-bt CB -b /home/han/test1/outs/filtered_peak_bc_matrix/barcodes.tsv

Looking forward to your reply! Thanks!

JohnApp-scRNA commented 3 years ago

Hi again. The data should be standard 10x with chemistry version 3. These are the first 5 lines of my bam file:

SRR9990638.11729483 272 chr1 12023 0 91M * 0 0 AGCAACTGCTGGCCTGTGCCAGGGTGCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCAT FJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJF<JJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFAF<A NH:i:8 HI:i:7 nM:i:0 AS:i:89 CR:Z:CCCTTAGCACTCCTGT UR:Z:TATCGCATTACA sS:Z:CCCTTAGCACTCCTGTTATCGCATTACA sQ:Z:AAFFFJJJJJJJJJJJJJJJJJJJJJJJ sM:i:0 CB:Z:CCCTTAGCACTCCTGT UB:Z:TATCGCATTACA

SRR9990645.3944242 256 chr1 14117 0 91M * 0 0 CCCAACACCAGCAATTGTGCCAAGGGCCATTAGGCTCTCAGCATGACTATTTTTAGAGACCCCGTGTCTGTCACTGAAACCTTTTTTGTGG <7-FFFJ--<AFJF<7-7-7-7J---7AA7FJFJ-FAFF7-77FF7---7F-7AFJJ-7F<<<J<<AJFFJJFFF<7F--A7-FJJJ-F<7 NH:i:5 HI:i:2 nM:i:0 AS:i:89 CR:Z:ACCTGTCTCAGGACGA UR:Z:CACTTTTATATC sS:Z:ACCTGTCTCAGGACGACACTTTTATATC sQ:Z:AAFFFJJFJJJJJJJJJJJJJFJJJJJJ sM:i:0 CB:Z:ACCTGTCTCAGGACGA UB:Z:CACTTTTATATC

SRR9990645.3945299 256 chr1 14117 0 91M * 0 0 CCCAACACCAGCAATTGTGCCAAGGGCCATTAGGCTCTCAGCATGACTATTTTTAGAGACCCCGTGTCTGTCACTGAAACCTTTTTTGTGG -<<FFJJFJJJ-AFAF-7--7AAJ-<AJ<FJJ--F<-A-A<-<FJFJFJF<-FJJF-AF-A<--<A7AFFJFJJJ-<JAJFFJFJJJJJFF NH:i:5 HI:i:2 nM:i:0 AS:i:89 CR:Z:ACCTGTCTCAGGACGA UR:Z:CACTTTTATATC sS:Z:ACCTGTCTCAGGACGACACTTTTATATC sQ:Z:AAFFFJJJJJJJJJJJJJJJJJJJJJJJ sM:i:0 CB:Z:ACCTGTCTCAGGACGA UB:Z:CACTTTTATATC

SRR9990646.5071207 0 chr1 14249 0 91M * 0 0 GCCCTTCTCTCCTCCCTCTCATCCCAGAGAAACAGGTCAGCTGGGAGCTTCTGCCCCCACTGCCTAGGGACCAACAGGGGCAGGAGGCAGT <<<F-FFFFFJJFJFAJFJJJJJJJJJ<JJFJJJJ7JAFJJJJJJFFJJJJJJJFJJJAFJJJJFJJFJJJJFJJJFAJJJJJJJJFJFJJ NH:i:5 HI:i:1 nM:i:0 AS:i:89 CR:Z:ACCTGTCTCAGGACGA UR:Z:CACTTTTATATC sS:Z:ACCTGTCTCAGGACGACACTTTTATATC sQ:Z:AAFFFJJJJJJJJJJJJJJJJJJJJJJJ sM:i:0 CB:Z:ACCTGTCTCAGGACGA UB:Z:CACTTTTATATC

SRR9990636.6938383 16 chr1 14356 1 46S105M * 0 0 TTTGTTTTTTTTTTTTGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATCCTGCACAGCTAGAGATCCTTTATTAAAAGCACACTGTTGGTTTCTGCTCAGTTCTTTATTGATTGGTGTGCCGTTTTCTCTGGAAGCCTCTTAAGAA 7-----<------7-7-A--<-A<7FF7FAAJJAAFFFAFFAA7A<-A7AA--<<-F7--FFA7FJAF7-FA-A7-7A777<FAA7-<7A7AJJJFFA7A7-F<F-<7-<JFFFFA7-JFFA7F<<F7-7<J<FFFFFAFF7FAAF-<<-- NH:i:4 HI:i:1 nM:i:2 AS:i:99 CR:Z:GTAGGTTGTCTTGAAC UR:Z:CTTCGCTAGGGT sS:Z:GTAGGTTGTCTTGAACCTTCGCTAGGGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTCCTCCTCCTTCCCTCCCCCCTCCCCCCTTTCCCTTCTCTTTCGCCCCCCCTCTTCCTTTTTTCG sQ:Z:AAFFFFJJJJJJJJJJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJAFAJJJJJJJ<AJJJJ<JJJJA<AA--<7-<<-------))----<----7----7A)-))-----------<-----A77-77A---7-----7- sM:i:0 CB:Z:GTAGGTTGTCTTGAAC UB:Z:CTTCGCTAGGGT

And the command that I ran was:

mgatk tenx -i $filepath/Aligned.sortedByCoord.out.bam -n Voigt -o $THEBAM/Voigt_mgatk -c 16 -ub UB -bt CB -b "$BARCODES"

Hope this is informative. Thanks, John

caleblareau commented 3 years ago

The CB tag doesn't have the -1 suffix that 10x uses to delineate the channel analyzed, and that suffix is required for some downstream logic.
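
This explains the traceback in the first post: line 59 of chunk_barcoded_bam.py slices `barcode_id[17:]` and passes the result to `int()`. For a 16-character barcode that slice is empty, and `int('')` raises the reported ValueError. A minimal reproduction follows, where `channel_index` is a hypothetical helper that isolates only that slice. The `fauxdon` lookup is left out because its contents are not shown in this thread.

```python
def channel_index(barcode_id):
    """Mimic the int(barcode_id[17:]) - 1 step quoted in the traceback."""
    # Characters 0-15 hold the barcode, 16 the dash, 17 onward the channel number.
    return int(barcode_id[17:]) - 1

# Suffix present: the "1" after the dash is parsed.
print(channel_index("CCCTTAGCACTCCTGT-1"))

# Barcode without a suffix, as in the BAM records posted above.
try:
    channel_index("CCCTTAGCACTCCTGT")
except ValueError as err:
    print("ValueError:", err)
```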

(In reply to JohnApp-scRNA's post of Oct 28, 2020, 3:58 AM.)

JohnApp-scRNA commented 3 years ago

Ah, thanks for clarifying. We used STARsolo to perform the alignment, which discards lane information. Is there a possible workaround that would allow mgatk to work with STARsolo output?

caleblareau commented 3 years ago

I can look into it, but currently your best option would be to use the bcall mode, which will be slightly less computationally efficient (since it’ll create intermediate files per single cell) but will work right away with your use case.
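
A different workaround, not confirmed anywhere in this thread, would be to add the missing suffix to the CB tags before running the tenx mode. The sketch below rewrites SAM text one record at a time. The function name `add_cb_suffix` and the default `-1` suffix are assumptions for illustration, the barcodes file passed with -b would presumably need the same suffix, and whether mgatk then runs cleanly on STARsolo output is untested here.

```python
def add_cb_suffix(sam_line, tag="CB", suffix="-1"):
    """Append a suffix to a Z-typed barcode tag in one tab-separated SAM record."""
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    prefix = tag + ":Z:"
    fields = sam_line.rstrip("\n").split("\t")
    for i, field in enumerate(fields):
        # Only touch optional fields (index 11 onward) that have no dash yet.
        if i >= 11 and field.startswith(prefix) and "-" not in field[len(prefix):]:
            fields[i] = field + suffix
    return "\t".join(fields)

# Abbreviated record: SEQ and QUAL are shortened for readability.
record = "\t".join(["read1", "0", "chr1", "14249", "0", "91M", "*", "0", "0",
                    "ACGT", "FFFF", "CB:Z:ACCTGTCTCAGGACGA", "UB:Z:CACTTTTATATC"])
print(add_cb_suffix(record).split("\t")[11])
```

In practice a function like this could sit in a pipe between `samtools view -h` and `samtools view -b`, but the bcall mode suggested above avoids modifying the BAM at all.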

(In reply to JohnApp-scRNA's post of Oct 28, 2020, 11:57 AM.)