Hi @JBreunig,
can you share the command you used, the first lines of your TE BED file (using head), and the output of ls -Rlht on the results folder?
Here you go...thanks again:
$ head Mm10TEannotation.bed
chr1 3000001 3002128 chr1|3000001|3002128|L1_Mus3:L1:LINE|- 12955 -
chr1 3003153 3003994 chr1|3003153|3003994|L1Md_F:L1:LINE|- 1216 -
chr1 3003994 3004054 chr1|3003994|3004054|L1_Mus3:L1:LINE|- 234 -
chr1 3004041 3004206 chr1|3004041|3004206|L1_Rod:L1:LINE|+ 3685 +
chr1 3004271 3005001 chr1|3004271|3005001|L1_Rod:L1:LINE|+ 3685 +
chr1 3005002 3005439 chr1|3005002|3005439|L1_Rod:L1:LINE|+ 1280 +
chr1 3005461 3005548 chr1|3005461|3005548|Lx9:L1:LINE|+ 4853 +
chr1 3005571 3006764 chr1|3005571|3006764|Lx9:L1:LINE|+ 4853 +
chr1 3007015 3007268 chr1|3007015|3007268|L1M4:L1:LINE|- 438 -
chr1 3008117 3008483 chr1|3008117|3008483|L1_Mur2:L1:LINE|- 1590 -
$ ls -Rlht
.:
total 61G
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 0 Oct 11 20:45 Tester_allcounts_final.txt
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 0 Oct 11 20:45 Tester_subftes_2.txt
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 0 Oct 11 20:45 Tester_genes_2.txt
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 0 Oct 11 20:45 Tester_locustes_2.txt
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 0 Oct 11 20:45 Tester_genes.txt
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 0 Oct 11 20:45 Tester_locustes.txt
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 0 Oct 11 20:45 Tester_subftes.txt
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 5.7M Oct 11 20:45 Tester_final.bam.bai
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 16G Oct 11 20:41 Tester_final.bam
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 16G Oct 11 19:59 Tester_full_sorted.bam
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 16G Oct 11 19:54 Tester_full.bam
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 2.1K Oct 11 19:53 Tester_teannotated.bam
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 0 Oct 11 19:53 Tester_selectedtes.bed
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 592 Oct 11 19:53 Tester_nogenes_overlappingtes.bam.bai
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 0 Oct 11 19:53 Tester_nogenes_overlappingtes.bed
-rw-rw-r-- 1 jjbtr32643970xadmin jjbtr32643970xadmin 2.1K Oct 11 19:53 Tester_nogenes_overlappingtes.bam
Can you share the first lines of the output of samtools view on your BAM file?
Here you go:
$ samtools view Aligned.sortedByCoord.out.bam | head
A00319:434:H27L5DRX2:1:2205:32190:11898 16 1 3001665 255 90M 0 0 ATATTGTGTGAATTTTGTTTGGTCGTGGAATACTTTGGTTTCTCCATCTATGGTAATTGAGAGTTTGGCCGGGTATAGTAGCCTGGGCTG F,,,,F,F,F:,F,,,F::,,F,FF,FF::,FF:,,FF,:FFFFFF:F,:,FF:F,,FFFF:F:,,FFFFFFF,F,FF::FFF,FFFF:F NH:i:1 HI:i:1 CR:Z:TGGTGATTCTTGAGCA UR:Z:TTTACATTTCCG GX:Z:- GN:Z:- CB:Z:TGGTGATTCTTGAGCA UB:Z:TTTACATTTCCG
A00319:434:H27L5DRX2:2:2236:3215:27978 16 1 3015473 255 90M 0 0 GTTACTTCACTCAGGATGATACCCTCCAGGTCCATCCATTTGCCTAGGAATCTCATAAATTCATTTTTTAATAGCTGAGTAGTATTCCAT FFFFFFFF:FFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 CR:Z:TCTAACTTCTGTCGCT UR:Z:ATATAAGAATCC GX:Z:- GN:Z:- CB:Z:TCTAACTTCTGTCGCT UB:Z:ATATAAGAATCC
A00319:434:H27L5DRX2:1:2258:30572:15452 16 1 3016298 0 90M 0 0 ACTTTCTCCTCTGTAAGTTTCAGTGTCTCTGGTTTTATGTGGAGTTCCTTAATCCACTTAGATTTGACCTTAGTACAAGGAGATAGGAAT F::FFFFFFFFFFFF:FFF:FFFFFFFFF:FFFFF:,FFFFFFF,FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:9 HI:i:1 CR:Z:TGGGCTGCAGCTACAT UR:Z:CCTCTGAATCCT GX:Z:- GN:Z:- CB:Z:TGGGCTGCAGCTACAT UB:Z:CCTCTGAATCCT
A00319:434:H27L5DRX2:2:2144:5665:32565 16 1 3018672 1 90M 0 0 TTTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTT ::FFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 CR:Z:ACTTTGTAGAAGGTAG UR:Z:GAGCATGGCTAT GX:Z:- GN:Z:- CB:Z:ACTTTGTAGAAGGTAG UB:Z:GAGCATGGCTAT
A00319:434:H27L5DRX2:2:2112:6506:30138 16 1 3018677 1 90M 0 0 TTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTGTT FFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFF,F:FFFF:FFFFFFFFFFFFFFFF:FF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 CR:Z:CATACTTCAACCGATT UR:Z:CTTAAGTTTCTA GX:Z:- GN:Z:- CB:Z:CATACTTCAACCGATT UB:Z:CTTAAGTTTCTA
A00319:434:H27L5DRX2:2:2102:32380:16297 16 1 3019419 0 90M 0 0 TACTTTGGTTTCTCCATCTATGGTTATTGAGAGGTTGGCTGGGTATAGTAGCCTGGGCTGGAATTTGTGTTCTCTTAGTGTCTGTATAAC ,FFF:,:::,:F,FFF:F:F,:FF,FFFFFFFF,F:::FF:F::F,:F,,FF:::::F:F:FFF,F:FFFFFFFF,:F,F:F,:F:FF,F NH:i:8 HI:i:1 CR:Z:CTCTGGTCATTAAGCC UR:Z:GAATCATTGGCA GX:Z:- GN:Z:- CB:Z:CTCTGGTCATTAAGCC UB:Z:GAATCATTGGCA
A00319:434:H27L5DRX2:2:2153:18530:36401 16 1 3020095 3 90M 0 0 TTTGTTCATTTCCATCACCTGTTTGGATGTGTTTTCCTGTTTTTCTATACGGACTTCTACCTGTTTGGTTGTGTTTTCCTGTTTTTCTTT FFFFFFFFFFFF::F:FFFFF,F,,FFFFFFFF,FFFFFFFFFFF,FFF,FFFFFFF:FFF:F:F,FFFFFFF:FFFFFFFFF:FFFFFF NH:i:2 HI:i:1 CR:Z:GCAGCCACACTACACA UR:Z:CGTTGGGATCAT GX:Z:- GN:Z:- CB:Z:GCAGCCACACTACACA UB:Z:CGTTGGGATCAT
A00319:434:H27L5DRX2:2:2102:32081:3129 0 1 3025098 255 88M2S 0 0 ACCCTCCAGTGGAAAAAAGACAGCATTGTCAACAAAGGGTGTGGGCACAACTGGTGGTTATCATCATGAAGAATGCAAATTGATCCATTC :,F,,FF,F,,:FF,:F:FFF::::::,FF,FF,FF,F,,F:,::F,F,FF,:F,F,,,,,F,F,,,FFFF,::,F:,F,F,FFFF,,FF NH:i:1 HI:i:1 CR:Z:TGACCACAGGCATGGT UR:Z:CCGATTTAATGG GX:Z:- GN:Z:- CB:Z:TGTCCACAGGCATGGT UB:Z:CCGATTTAATGG
A00319:434:H27L5DRX2:2:2119:26133:23171 0 1 3038147 255 90M 0 0 AGATAACTGTGCACCTCCCTGAAAGAGGAGAGCTTGCCTGCAGAGACTGCTCTGACCCCTGAAACTCAGGGAAGAGAGCTAGTCTCCCTG FF:FFFFFFFFFFFFFFFFFFFFFFFFFFF:FFF:FFF,FFF:FFFFFFFFFFFFFFFF:FFFFFF:FFFFF,:,FF:F,FF:FF:,F:F NH:i:1 HI:i:1 CR:Z:TGCTGAACACCAGCGT UR:Z:GCATTGTTCTCC GX:Z:- GN:Z:- CB:Z:TGCTGAACACCAGCGT UB:Z:GCATTGTTCTCC
A00319:434:H27L5DRX2:2:2236:1488:5791 0 1 3038346 255 24S66M 0 0 GTTCATTTCAGCTTTTCACACCTCTGTACTAACAGGAACCAAGACCACTCACCATCACCAGAACCCAGCACACCCACTTCGCCCAGTCCA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 CR:Z:CTTACCGAGAGCAACC UR:Z:AAATTTTCTATA GX:Z:- GN:Z:- CB:Z:CTTACCGAGAGCAACC UB:Z:AAATTTTCTATA
OK, I see the issue: the chromosome names in your BAM file don't match those in the BED file. For example, the BED file uses "chr1" whereas the BAM uses "1".
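A quick way to catch this kind of mismatch before launching SoloTE is to compare the first-column contig names of the BED file with the @SQ SN: names in the BAM header. The sketch below assumes the header has been saved as text with samtools view -H; the function names are hypothetical and not part of SoloTE.

```python
def bam_contigs(header_lines):
    """Contig names from the @SQ lines of a header printed by `samtools view -H`."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def bed_contigs(bed_lines):
    """Contig names from the first column of a BED file (comment/track lines skipped)."""
    names = set()
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        names.add(line.split()[0])
    return names


def missing_from_bam(bed_lines, header_lines):
    """BED contigs with no matching @SQ entry; a non-empty result signals a naming mismatch."""
    return sorted(bed_contigs(bed_lines) - bam_contigs(header_lines))
```

For the files in this thread, running it on the original Mm10TEannotation.bed and the STARsolo header should report the chr-prefixed names, since the BAM only knows "1", "2", and so on.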
First, delete all files from the SoloTE output directory you were using, and then use
sed 's/^chr//' BEDfile > NEWBEDfile
to create a BED file whose chromosome names match those used in the alignment.
Afterwards, run SoloTE with this new BED file.
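For anyone who prefers to do the renaming in Python, here is a rough equivalent of the sed command above: it strips a leading "chr" from the first column only and, like the sed version, leaves the name column (chr1|...) unchanged. One assumption is labeled in the code: UCSC-style mm10 files call the mitochondrial genome chrM, while the BAM header in this thread lists MT, so the sketch maps M to MT. Names of unplaced scaffolds may also differ between UCSC- and Ensembl-style references and are not handled here. The helper names are hypothetical.

```python
# Rough Python equivalent of: sed 's/^chr//' BEDfile > NEWBEDfile
# Assumption: UCSC-style "chrM" should become Ensembl-style "MT" (as listed in the BAM header).
RENAME = {"M": "MT"}


def strip_chr(line, rename=RENAME):
    """Drop a leading 'chr' from the first (tab-separated) column of one BED line."""
    if not line.strip() or line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    chrom = rename.get(chrom, chrom)
    return chrom + sep + rest


def convert_bed(src_path, dst_path):
    """Write a renamed copy of a BED file; the name column (e.g. chr1|...) is left untouched."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(strip_chr(line))
```

Passing rename={} reproduces the plain sed behavior exactly.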
That makes sense, but the same error happened even after deleting the output directories and pointing SoloTE to the new BED file. Could it be the second 'chr' in the BED file?
1 3000001 3002128 chr1|3000001|3002128|L1_Mus3:L1:LINE|- 12955 -
Shall I try: sed 's/^chr//; s/chr//' Mm10TEannotation.bed > Mm10TEannotationV3.bed
Edit: that didn't fix it either.
Looking at the BAM lines you shared, there is an additional issue: reads that are not associated with a gene carry the tag GN:Z:- instead of having no GN tag at all (which is what the BAM files we used while developing SoloTE looked like).
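To make the difference concrete: optional SAM fields have the form TAG:TYPE:VALUE, so a read without a gene can either lack GN entirely or carry GN:Z:- as in the STARsolo output above. A tolerant check treats both as "no gene". This is only an illustration on tab-separated SAM text lines, not SoloTE's actual code, and the function names are hypothetical.

```python
def sam_tags(sam_line):
    """Parse the optional TAG:TYPE:VALUE fields (columns 12 onward) of a SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def has_gene(sam_line):
    """True if the read is assigned to a gene; a missing GN tag and GN:Z:- both mean no gene."""
    gene = sam_tags(sam_line).get("GN")
    return gene not in (None, "", "-")
```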
This is similar to issue #3, and we are working on a fix. Could you share your alignment protocol? It would help us expand the usability and compatibility of our tool. In #3, there was a mention of using Cumulus's STARsolo pipeline.
Yes, I'm also using STARsolo (but not through Cumulus). I use a custom reference based on mm10 that includes a handful of transgenes we added. Happy to share the BAM or anything else if it helps.
Thanks for the quick reply.
And yes, I would really appreciate it if you could share the BAM file for chromosome 1 only, which should hopefully be enough for validation (we are now working on a fix for this issue).
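One way to produce such a file is a samtools region query, which needs a coordinate-sorted, indexed BAM (STARsolo's Aligned.sortedByCoord.out.bam qualifies once it has a .bai index). The small Python wrapper below only assembles and runs the standard samtools index and samtools view -b -o commands; the contig is "1" because this BAM uses Ensembl-style names, and the output file name is a placeholder.

```python
import subprocess


def subset_cmds(bam, contig, out_bam):
    """samtools commands that copy the reads on one contig into a new, indexed BAM."""
    return [
        ["samtools", "index", bam],                              # region queries need a .bai index
        ["samtools", "view", "-b", "-o", out_bam, bam, contig],  # BAM output, reads on `contig` only
        ["samtools", "index", out_bam],                          # index the subset as well
    ]


def subset_bam(bam, contig, out_bam):
    """Run the commands in order, stopping at the first failure."""
    for cmd in subset_cmds(bam, contig, out_bam):
        subprocess.run(cmd, check=True)
```

For example, subset_bam("Aligned.sortedByCoord.out.bam", "1", "chr1_only.bam"); the first indexing step can be dropped if the .bai file already exists.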
Just tried to run and got a new error:
$ python /mnt/Sabrent2TBRefsCR/SoloTE/SoloTE_v1/SoloTE_pipeline.py /mnt/12TBNew0821/AnatDecabitine/star_out/PBS2/Aligned.sortedByCoord.out.bam 48 Tester /mnt/Sabrent2TBRefsCR/SoloTE/Mm10TEannotationV2.bed
SoloTE started at 10:15:11
samtools found!
bedtools found!
['1', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '2', '3', '4', '5', '6', '7', '8', '9', 'MT', 'X', 'Y', 'JH584299.1', 'GL456233.1', 'JH584301.1', 'GL456211.1', 'GL456350.1', 'JH584293.1', 'GL456221.1', 'JH584297.1', 'JH584296.1', 'GL456354.1', 'JH584294.1', 'JH584298.1', 'JH584300.1', 'GL456219.1', 'GL456210.1', 'JH584303.1', 'JH584302.1', 'GL456212.1', 'JH584304.1', 'GL456379.1', 'GL456216.1', 'GL456393.1', 'GL456366.1', 'GL456367.1', 'GL456239.1', 'GL456213.1', 'GL456383.1', 'GL456385.1', 'GL456360.1', 'GL456378.1', 'GL456389.1', 'GL456372.1', 'GL456370.1', 'GL456381.1', 'GL456387.1', 'GL456390.1', 'GL456394.1', 'GL456392.1', 'GL456382.1', 'GL456359.1', 'GL456396.1', 'GL456368.1', 'JH584292.1', 'JH584295.1', 'postWPREV5gestalt', 'BFP2', 'mTom', 'mGFP', 'YAPRELApA', 'DNETV5Celltag']
['@CO\tuser command line: /mnt/Sabrent2TBRefsCR/STARlatest0822/STAR ', 'quantMode GeneCounts ', 'soloType CB_UMI_Simple ', 'soloCBwhitelist /mnt/Sabrent2TBRefsCR/STAR/3M-february-2018.txt ', 'soloCBlen 16 ', 'soloUMIstart 17 ', 'soloUMIlen 12 ', 'soloBarcodeReadLength 1 ', 'soloMultiMappers EM ', 'soloFeatures Gene Velocyto ', 'soloUMIfiltering MultiGeneUMI ', 'soloCBmatchWLtype 1MM_multi_pseudocounts ', 'outSAMtype BAM SortedByCoordinate ', 'outSAMattributes NH HI CR UR CB UB GX GN ', 'outSAMmultNmax 1 ', 'runThreadN 32 ', 'genomeDir /mnt/Sabrent2TBRefsCR/WorkingMouseRefKBtools01012020/WPRE_V5Gestalt_static_DoxMinTm_0621 ', 'sjdbGTFfile /mnt/Sabrent2TBRefsCR/WorkingMouseRefKBtools01012020/WPRE_V5Gestalt_static_DoxMinTm_0621/tmp.gtf ', 'readFilesCommand zcat ', 'readFilesPrefix /mnt/12TBNew0821/AnatDecabitine/SS-15340', '01', '14', '2022/FASTQ/ ', 'readFilesIn PBS-CTRL-GEX_S2_L001_R2_001.fastq.gz,PBS-CTRL-GEX_S2_L002_R2_001.fastq.gz PBS-CTRL-GEX_S2_L001_R1_001.fastq.gz,PBS-CTRL-GEX_S2_L002_R1_001.fastq.gz ', 'outTmpDir=/mnt/Sabrent4TBMsTum/AY_7319_Dox_Suc_Etv5_07_19/STARtmp ', 'outFileNamePrefix star_out/PBS2/']
1
outSAMattributes NH HI CR UR CB UB GX GN
CB and UB tags present in BAM file
Traceback (most recent call last):
  File "/mnt/Sabrent2TBRefsCR/SoloTE/SoloTE_v1/SoloTE_pipeline.py", line 81, in
    outprefix = sys.argv[5]
IndexError: list index out of range
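The log above shows SoloTE reading the STAR "@CO user command line" header and confirming that CB and UB are among the outSAMattributes. A comparable check can be sketched as follows; it is written for this thread, not taken from SoloTE, and assumes the recorded STAR command uses "--option value" syntax. Splitting on " --" (with the leading space) rather than on bare "--" keeps paths that contain double dashes intact; the readFilesPrefix entry in the log appears to have been broken apart that way.

```python
def star_option(header_lines, option):
    """Values of a STAR option recorded in the '@CO user command line' header, or None."""
    for line in header_lines:
        if line.startswith("@CO") and "user command line:" in line:
            # Split only on " --" so that paths containing "--" stay in one piece.
            for chunk in line.split(" --")[1:]:
                name, _, value = chunk.strip().partition(" ")
                if name == option:
                    return value.split()
    return None


def has_cell_and_umi_tags(header_lines):
    """True if STAR was asked to write both the CB (cell) and UB (UMI) tags."""
    attrs = star_option(header_lines, "outSAMattributes") or []
    return "CB" in attrs and "UB" in attrs
```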
Edit: looking at the code, it seems there is a new command-line argument for outprefix, so I added one and it is running now.
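The IndexError in the traceback above is what Python raises when a script reads sys.argv[5] but fewer arguments were given. A defensive alternative is to declare the positional arguments with argparse, which prints a usage message instead. This is a sketch only: apart from outprefix (the name in the traceback), the argument names are hypothetical labels for the values in the command above, and the real SoloTE argument order may differ, so check the pipeline's own usage instructions.

```python
import argparse


def build_parser():
    """Hypothetical argument layout; only 'outprefix' is confirmed by the traceback."""
    p = argparse.ArgumentParser(prog="SoloTE_pipeline.py")
    p.add_argument("bam", help="coordinate-sorted STARsolo BAM")
    p.add_argument("threads", type=int, help="number of threads")
    p.add_argument("name", help="sample name used for output files")
    p.add_argument("te_bed", help="TE annotation BED with chromosome names matching the BAM")
    p.add_argument("outprefix", help="output prefix")
    return p
```

With one argument missing, build_parser().parse_args() exits with a "the following arguments are required: outprefix" message rather than an IndexError.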
@JBreunig, you are correct. We detected a bug when running it, so we modified the command-line arguments.
Please let us know whether you are able to run the pipeline successfully, so that we can make an official release of this updated version.
Everything looks good through to processing in Seurat...thanks!
You're more than welcome!
I tried running the commands and it failed after about an hour. Might you have any suggestions on troubleshooting from the output below? (Samtools 1.16.1; Bedtools v2.30.0; R 4.2.1)