bvaldebenitom / SoloTE

GNU General Public License v3.0

SoloTE Pipeline Not Working #28

Closed Sandman-1 closed 3 months ago

Sandman-1 commented 1 year ago

Good morning! I hope this finds you well.

I am currently trying to use SoloTE version 1.08 to analyze transposable element expression in a dataset of 6 samples where I merged their bam files using mergeBams (https://github.com/furlan-lab/mergebams). Unfortunately, I am unable to get the pipeline to work because it's throwing errors at several steps and resulting in empty intermediate files. The code for my command is below. I am using a bed file of TEs from the mm10 genome generated via the first step in which we convert the rmsk file to bed format.

Code: (base) hebbale@DESKTOP-OMJPGV6:/mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM$ python SoloTE_pipeline.py --threads 8 --bam sorted_out.bam --teannotation mm10_TEs.bed --outputprefix GEX_TE_out --outputdir soloTE_Results

Output:
SoloTE started at 11:51:43
samtools found!
bedtools found!
BAM file generated using CellRanger
SoloTE v1.08 started!
SoloTE Home directory /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM
SoloTE executed from /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM
Results will be stored in /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/soloTE_Results
Input BAM file: /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/sorted_out.bam
Input TE BED file: /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/mm10_TEs.bed
Currently working in temporary directory: /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/soloTE_Results/GEX_TE_out_SoloTE_temp
samtools view --threads 8 -d GN -U GEX_TE_out_nogenes_oldtag.bam -O BAM -o temp.bam /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/sorted_out.bam
samtools view -d GN:- -o GEX_TE_out_nogenes_newtag.bam -U GEX_TE_out_genes.bam -O BAM temp.bam
samtools cat --threads 8 -o GEX_TE_out_nogenes.bam GEX_TE_out_nogenes_oldtag.bam GEX_TE_out_nogenes_newtag.bam
samtools view --threads 8 -O BAM -o GEX_TE_out_nogenes_overlappingtes.bam -L /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/mm10_TEs.bed GEX_TE_out_nogenes.bam
[bed_read] Parse error reading "/mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/mm10_TEs.bed" at line 1
samtools view: Could not read file "/mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/mm10_TEs.bed"
samtools index GEX_TE_out_nogenes_overlappingtes.bam
[E::hts_open_format] Failed to open file "GEX_TE_out_nogenes_overlappingtes.bam" : No such file or directory
samtools index: failed to open "GEX_TE_out_nogenes_overlappingtes.bam": No such file or directory
bedtools bamtobed -i GEX_TE_out_nogenes_overlappingtes.bam -split > GEX_TE_out_nogenes_overlappingtes.bed
[E::hts_open_format_impl] Failed to open file GEX_TE_out_nogenes_overlappingtes.bam
Failed to open BAM file GEX_TE_out_nogenes_overlappingtes.bam
bedtools intersect -a /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/mm10_TEs.bed -b GEX_TE_out_nogenes_overlappingtes.bed -u > GEX_TE_out_selectedtes.bed
Error: unable to open file or unable to determine types for file /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/mm10_TEs.bed

At this point, I pause the pipeline because I don't think it will progress farther. Any help would be greatly appreciated!

bvaldebenitom commented 1 year ago

Hi @Sandman-1 !

Could you share the output of head mm10_TEs.bed ?

Sandman-1 commented 1 year ago

Sure! Here it is. Command is head -n 10 mm10_TEs.bed

14 chr1 3008500 14|chr1|3008500|+::RLTR26B_MM|+ 607 + 7 chr1 3011641 7|chr1|3011641|-::RLTR25A|+ 607 + 64 chr1 3011970 64|chr1|3011970|-::RLTR25A|+ 607 + 3 chr1 3026805 3|chr1|3026805|+::ERVB7_1-LTR_MM|+ 608 + 21 chr1 3028234 21|chr1|3028234|-::RLTR14|+ 608 + 45 chr1 3030069 45|chr1|3030069|-::RLTR13E|+ 608 + 43 chr1 3031358 43|chr1|3031358|-::IAPLTR1a_Mm|+ 608 + 36 chr1 3035552 36|chr1|3035552|-::ERVB7_3-LTR_MM|+ 608 + 2 chr1 3044869 2|chr1|3044869|+::RLTR1D|+ 608 + 58 chr1 3046212 58|chr1|3046212|-::RLTR13C1|+ 608 +

Sandman-1 commented 1 year ago

Ah, that formatted poorly. My bad.

14 chr1 3008500 14|chr1|3008500|+::RLTR26B_MM|+ 607 +
7 chr1 3011641 7|chr1|3011641|-::RLTR25A|+ 607 +
64 chr1 3011970 64|chr1|3011970|-::RLTR25A|+ 607 +
3 chr1 3026805 3|chr1|3026805|+::ERVB7_1-LTR_MM|+ 608 +
21 chr1 3028234 21|chr1|3028234|-::RLTR14|+ 608 +
45 chr1 3030069 45|chr1|3030069|-::RLTR13E|+ 608 +
43 chr1 3031358 43|chr1|3031358|-::IAPLTR1a_Mm|+ 608 +
36 chr1 3035552 36|chr1|3035552|-::ERVB7_3-LTR_MM|+ 608 +
2 chr1 3044869 2|chr1|3044869|+::RLTR1D|+ 608 +
58 chr1 3046212 58|chr1|3046212|-::RLTR13C1|+ 608 +

bvaldebenitom commented 1 year ago

It seems the BED file is not being generated properly.

Did you get the RepeatMasker file from http://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/rmsk.txt.gz ?

That file is formatted slightly differently from the standard RepeatMasker output file. For mm10, you can get the standard file from here: https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.out.gz

We have now packaged an automated script with SoloTE, SoloTE_RepeatMasker_to_BED.py, which you can run as python SoloTE_RepeatMasker_to_BED.py -g mm10 to generate the proper mm10 TE BED file.

You can download the new SoloTE release 1.09 here, which contains some speed improvements, along with the aforementioned helper script.
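The parse error at line 1 is consistent with the BED listing posted earlier, where the second column holds the chromosome name instead of a start coordinate. A minimal sketch of a sanity check for the first three BED columns follows. It assumes a tab-delimited file, and check_bed_line is a hypothetical helper for illustration, not part of SoloTE.

```python
def check_bed_line(line):
    """Return a list of problems found in one BED record (empty list: looks fine)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return ["fewer than 3 tab-separated fields"]
    chrom, start, end = fields[0], fields[1], fields[2]
    problems = []
    # BED expects: column 1 = chromosome, column 2 = start, column 3 = end.
    if not start.isdigit():
        problems.append(f"column 2 is not an integer start: {start!r}")
    if not end.isdigit():
        problems.append(f"column 3 is not an integer end: {end!r}")
    if start.isdigit() and end.isdigit() and int(start) >= int(end):
        problems.append("start is not smaller than end")
    return problems


# One record in the shape of the malformed file, one in the shape of the regenerated file.
bad = "14\tchr1\t3008500\t14|chr1|3008500|+::RLTR26B_MM|+\t607\t+"
good = "chr1\t3000001\t3002128\tchr1|3000001|3002128|L1_Mus3:L1:LINE|10.5|-\t10.5\t-"
print(check_bed_line(bad))
print(check_bed_line(good))
```

Running the check over the first few lines of a BED file before starting the pipeline would flag the shifted columns immediately.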

Sandman-1 commented 1 year ago

Hello! Hope you are doing well, and thank you for the updated pipeline!

Unfortunately, I ran into the following error when the program was calculating counts across each chromosome.

Traceback (most recent call last):
  File "/home/hebbale/.local/lib/python3.10/site-packages/pandas/core/indexes/range.py", line 414, in get_loc
    return self._range.index(new_key)
ValueError: 4 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/SoloTE_pipeline.py", line 217, in <module>
    tecounts2.loc[tecounts2[4].isnull(),4] = tecounts2.loc[tecounts2[4].isnull(),1]
  File "/home/hebbale/.local/lib/python3.10/site-packages/pandas/core/frame.py", line 3896, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/hebbale/.local/lib/python3.10/site-packages/pandas/core/indexes/range.py", line 416, in get_loc
    raise KeyError(key) from err
KeyError: 4

Perhaps it is also worth noting these messages prior to calculating counts, even though they didn't interrupt the program.

[bam_translate] RG tag "2618_M:0:1:HCNGVDSX3:2" on read "A00330:89:HCNGVDSX3:2:1419:2058:9565" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "370_C:0:1:HCNGVDSX3:2" on read "A00330:89:HCNGVDSX3:2:1304:10574:33959" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "2645_C:0:1:HCNGVDSX3:2" on read "A00330:89:HCNGVDSX3:2:2552:4110:33536" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "2607_M:0:1:HCNGVDSX3:2" on read "A00330:89:HCNGVDSX3:2:1420:32642:6668" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.
[bam_translate] RG tag "2606_M:0:1:HCNGVDSX3:2" on read "A00330:89:HCNGVDSX3:2:1168:22806:4163" encountered with no corresponding entry in header, tag lost. Unknown tags are only reported once per input file for each tag ID.

Any thoughts on this matter would be greatly appreciated, as I did also merge several bam files together using that software linked above. Thank you so much!

bvaldebenitom commented 1 year ago

Hi again @Sandman-1 !

I'm not sure whether these messages regarding the BAM file are actually interfering with the pipeline. The error still seems to be related to the BED file. However, it now looks similar to issue #30.

Could you share the last SoloTE command you ran, and the first lines of the BED file you used for that?

Sandman-1 commented 1 year ago

Sure thing! The command is as follows: python SoloTE_pipeline.py --threads 8 --bam sorted_out.bam --teannotation mm10_rmsk.bed --outputprefix GEX_TE_out --outputdir soloTE_Results

The first ten lines of the BED file are as follows:
chr1 3000001 3002128 chr1|3000001|3002128|L1_Mus3:L1:LINE|10.5|- 10.5 -
chr1 3003153 3003994 chr1|3003153|3003994|L1Md_F:L1:LINE|26.8|- 26.8 -
chr1 3003994 3004054 chr1|3003994|3004054|L1_Mus3:L1:LINE|27.9|- 27.9 -
chr1 3004041 3004206 chr1|3004041|3004206|L1_Rod:L1:LINE|19.9|+ 19.9 +
chr1 3004271 3005001 chr1|3004271|3005001|L1_Rod:L1:LINE|19.9|+ 19.9 +
chr1 3005002 3005439 chr1|3005002|3005439|L1_Rod:L1:LINE|22.1|+ 22.1 +
chr1 3005461 3005548 chr1|3005461|3005548|Lx9:L1:LINE|22.6|+ 22.6 +
chr1 3005571 3006764 chr1|3005571|3006764|Lx9:L1:LINE|22.6|+ 22.6 +
chr1 3007015 3007268 chr1|3007015|3007268|L1M4:L1:LINE|28.9|- 28.9 -
chr1 3008117 3008483 chr1|3008117|3008483|L1_Mur2:L1:LINE|14.8|- 14.8 -

bvaldebenitom commented 1 year ago

Thanks! BED file looks ok.

Let's check the BAM file with samtools view sorted_out.bam|head. Basically, we want to verify that column 3 of the BAM file (the reference sequence name) uses the same naming as column 1 of the BED file, for example chr1 in both rather than chr1 in one and 1 in the other.
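The comparison described here can also be scripted. The sketch below works on plain text lines, such as the output of samtools view and the contents of the BED file, and reports reference names that appear in the alignments but not in the annotation. The function names are hypothetical and tab-delimited input is assumed.

```python
def bed_chroms(bed_lines):
    """Collect the chromosome names found in column 1 of BED records."""
    skip = ("#", "track", "browser")
    return {
        line.split("\t")[0]
        for line in bed_lines
        if line.strip() and not line.startswith(skip)
    }


def sam_chroms(sam_lines):
    """Collect the reference names found in column 3 of SAM records."""
    chroms = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.split("\t")
        if len(fields) > 2 and fields[2] != "*":  # "*" marks an unmapped read
            chroms.add(fields[2])
    return chroms


def names_missing_from_bed(bed_lines, sam_lines):
    """Reference names used by alignments that never occur in the BED file."""
    return sam_chroms(sam_lines) - bed_chroms(bed_lines)


bed = ["chr1\t3000001\t3002128\tL1_Mus3\t10.5\t-"]
sam_ucsc = ["read1\t0\tchr1\t3000045\t255\t84M7S\t*\t0\t0\tACGT\tFFFF"]
sam_ensembl = ["read2\t0\t1\t3054830\t255\t4M\t*\t0\t0\tACGT\tFFFF"]
print(names_missing_from_bed(bed, sam_ucsc))     # empty set: naming agrees
print(names_missing_from_bed(bed, sam_ensembl))  # contains "1": naming differs
```

An empty result only shows that the names agree; it does not guarantee that any read actually overlaps a TE interval.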

Additionally, since you are using the same command as before, it is likely that SoloTE recognized the files from the previous unsuccessful run, and didn't generate them again. For this, please run the following command to delete the directory: rm -Rf /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/soloTE_Results

Once this is done, re-run SoloTE and report back.

Sandman-1 commented 1 year ago

Oh wow, thank you for responding so fast! Here is the output of the samtools view command.

A00330:89:HCNGVDSX3:2:1419:2058:9565 0 chr1 3000045 255 84M7S 0 0 GTCTTCTTTGAAAGCCTGATAGAACTCTGCACTAAACCCATCTGGTCCTGGGCTTTTTTTTTTTTTTTTTTTTTTTTTTGGGTGAAAATTA FFF:FFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFFFFFFFFF:FF:FFFFFFFFFFFFFFF:FFFFFFFF:,,,F:,::,,,,,,,,, NH:i:1 HI:i:1 AS:i:74 nM:i:4 RG:Z:2618_M:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:CACATGCCAAATGCCC CY:Z:FFFFFFFFFFFFFFFF UR:Z:GCAATCATATCG UY:Z:FFFFFFFFFFFF UB:Z:GCAATCATATCG CB:Z:2618-M_CACATGCCAAATGCCC-1
A00330:89:HCNGVDSX3:2:1304:10574:33959 16 chr1 3000093 1 39S38M14S 0 0 TATTTGTGTTTTTTTTTTTTTTTTTTTTTTTTTTTTTGGTGGGGTTTTTTTTTTTTTTTTTTTTTTTTTTGGGTGGGTTTTTGCTTTGCCG F,F,,:,,F:,,:FFFFFFFFFFF,:FFFFF,FFFF:,,,FF,,,,,::FF,F,:F,FF:FF:FFFF,,F,,:,,,F::,F,,,,,,,,FF NH:i:4 HI:i:2 AS:i:35 nM:i:1 pa:i:37 RG:Z:370_C:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:TAAGTGCTCCGCCTAT CY:Z:FFFFFFFFFFFFFFFF UR:Z:ATCTATAATAAC UY:Z::FFFFFFFFFFF UB:Z:ATCTATAATAAC CB:Z:370-C_TAAGTGCTCCGCCTAT-1
A00330:89:HCNGVDSX3:2:1548:8097:5087 16 chr1 3000255 255 91M 0 0 CTGGTTTTTTTTTAGTATAGCCTTTCATAGTAGAATCTGATGATGTTTTTGATATCCTCATGTTCTGTTGTTATGTCTCCTTTTTCATTTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF NH:i:1 HI:i:1 AS:i:85 nM:i:2 RG:Z:2643_C:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:GCTTAAATCTTGGACG CY:Z:FFFFFFFFFFFFFFFF UR:Z:GGAAAGATCCCC UY:Z:FFFFFFFFFFFF UB:Z:GGAAAGATCCCC CB:Z:2643-C_GCTTAAATCTTGGACG-1
A00330:89:HCNGVDSX3:2:2153:6696:32252 16 chr1 3000255 255 91M 0 0 CTGGTTTTTTTTTAGTATAGCCTTTCATAGTAGAATCTGATGATGTTTTTGATATCCTCATGTTCTGTTGTTATGTCTCCTTTTTCATTTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:85 nM:i:2 RG:Z:2643_C:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:GCTTAAATCTTGGACG CY:Z:FFFFFFFFFFFFFFFF UR:Z:GGAAAGATCCCC UY:Z:FFFFFFFFFFFF UB:Z:GGAAAGATCCCC CB:Z:2643-C_GCTTAAATCTTGGACG-1
A00330:89:HCNGVDSX3:2:2552:4110:33536 16 chr1 3000353 255 91M 0 0 GTTAATTATAGTACAGTCCCTATGCCCTCTAGTTAGTCTGGCTAAGGGTTTATCTATCTTGTTGACTTTCTCAAAGAACCAGCTACTAGTT FFFFF:FFFFFFFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:87 nM:i:1 RG:Z:2645_C:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:CACTTAAAGCCTCTCG CY:Z:FFFFFFFFFFFFFFFF UR:Z:CATTTGCCCCAT UY:Z:FFFFFFFFFFFF UB:Z:CATTTGCCCCAT CB:Z:2645-C_CACTTAAAGCCTCTCG-1
A00330:89:HCNGVDSX3:2:1420:32642:6668 16 chr1 3000955 255 61M30S 0 0 TTATGAGAAATTGTTTTGTAGATATCTATTAAATTCATTTGTTTCATAACTTCGGTTAGTGCCCATTTACTCTGCGTTGTTACAACTGCTT FF::FF,,FF,FFFF,F,FF,FFFF,F:::,,FF,:F,,FF:,F,FFFF,FF:FFFFF:F,:,,F,::,,FFFF,FF,,,,F:,:FFF,,: NH:i:1 HI:i:1 AS:i:54 nM:i:3 ts:i:30 RG:Z:2607_M:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:ACTGAAACATAATTGC CY:Z:FF:,,FFFFFFF,F,, UR:Z:AAACACCATACC UY:Z:F,FFFF:FFFF: UB:Z:AAACACCATACC CB:Z:2607-M_ACTGAAACATAATTGC-1
A00330:89:HCNGVDSX3:2:1613:20618:36010 16 chr1 3001676 255 91M 0 0 ATTTTGTTTGGTCATGGAATACTTTGGTTTCTCCATCTATGGTAATTGAGAGTTTGGCTGGGTATAGCAACCTGGGCTGGCACTTCTGCTT FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:89 nM:i:0 RG:Z:2645_C:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:AAGCGAATCGAGGAAC CY:Z:FFFFFFFFFFFFFFFF UR:Z:TTTTCAGAGATA UY:Z:FFFFFFFFFFFF UB:Z:TTTTCAGAGATA CB:Z:2645-C_AAGCGAATCGAGGAAC-1
A00330:89:HCNGVDSX3:2:2631:2663:7639 16 chr1 3001711 255 91M 0 0 TCTATGGTAATTGAGAGTTTGGCTGGGTATAGCAACCTGGGCTGGCACTTCTGCTTTCTTAGGGTCTGTATAACATCTGTCCAGGATCTTC FFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF:FFFFFFFF NH:i:1 HI:i:1 AS:i:89 nM:i:0 RG:Z:2645_C:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:AAGCGAATCGAGGAAC CY:Z:FFFFFFFFFFFFFF:F UR:Z:TTTTCAGAGATA UY:Z:FFFFFFFFFFFF UB:Z:TTTTCAGAGATA CB:Z:2645-C_AAGCGAATCGAGGAAC-1
A00330:89:HCNGVDSX3:2:2478:28592:4131 16 chr1 3001721 255 91M 0 0 TTGAGAGTTTGGCTGGGTATAGCAACCTGGGCTGGCACTTCTGCTTTCTTAGGGTCTGTATAACATCTGTCCAGGATCTTCTGGCTTTCAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:89 nM:i:0 RG:Z:2645_C:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:AAGCGAATCGAGGAAC CY:Z:FFFFFFFFFFFFFF:F UR:Z:TTTTCAGAGATA UY:Z:FFFFFFFFFFFF UB:Z:TTTTCAGAGATA CB:Z:2645-C_AAGCGAATCGAGGAAC-1
A00330:89:HCNGVDSX3:2:2513:6451:8594 16 chr1 3001721 255 91M 0 0 TTGAGAGTTTGGCTGGGTATAGCAACCTGGGCTGGCACTTCTGCTTTCTTAGGGTCTGTATAACATCTGTCCAGGATCTTCTGGCTTTCAT FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:1 HI:i:1 AS:i:89 nM:i:0 RG:Z:2645_C:0:1:HCNGVDSX3:2 RE:A:I xf:i:0 CR:Z:AAGCGAATCGAGGAAC CY:Z:FFFFFFFFFFFFFFFF UR:Z:TTTTCAGAGATA UY:Z:FFFFFFFFFFFF UB:Z:TTTTCAGAGATA CB:Z:2645-C_AAGCGAATCGAGGAAC-1

Additionally, I actually deleted the previous folder with SoloTE results before rerunning the command using version 1.09.

bvaldebenitom commented 1 year ago

You are welcome!

BAM file also looks ok. What is the output of ls -lht GEX_TE_out_SoloTE_temp ?

If possible, can you share the file GEX_TE_out_allcounts.txt from the SoloTE_temp directory?

Sandman-1 commented 1 year ago

Yay, glad to know my bam is not the problem.

Output of ls command:
total 73G
-rwxrwxrwx 1 hebbale hebbale 3.6G Sep 19 06:48 GEX_TE_out_allcounts.txt
-rwxrwxrwx 1 hebbale hebbale 204M Sep 19 06:47 GEX_TE_out_countpercell_chr9.counts
-rwxrwxrwx 1 hebbale hebbale 70M Sep 19 06:41 GEX_TE_out_countpercell_chrM.counts
-rwxrwxrwx 1 hebbale hebbale 160M Sep 19 06:25 GEX_TE_out_countpercell_chr8.counts
-rwxrwxrwx 1 hebbale hebbale 81M Sep 19 06:13 GEX_TE_out_countpercell_chrX.counts
-rwxrwxrwx 1 hebbale hebbale 187M Sep 19 06:05 GEX_TE_out_countpercell_chr6.counts
-rwxrwxrwx 1 hebbale hebbale 182M Sep 19 05:55 GEX_TE_out_countpercell_chr17.counts
-rwxrwxrwx 1 hebbale hebbale 228M Sep 19 05:15 GEX_TE_out_countpercell_chr5.counts
-rwxrwxrwx 1 hebbale hebbale 4.7K Sep 19 05:07 GEX_TE_out_countpercell_JH584295.1.counts
-rwxrwxrwx 1 hebbale hebbale 36 Sep 19 05:07 GEX_TE_out_countpercell_JH584292.1.counts
-rwxrwxrwx 1 hebbale hebbale 366K Sep 19 05:07 GEX_TE_out_countpercell_GL456216.1.counts
-rwxrwxrwx 1 hebbale hebbale 752K Sep 19 05:07 GEX_TE_out_countpercell_JH584304.1.counts
-rwxrwxrwx 1 hebbale hebbale 22K Sep 19 05:05 GEX_TE_out_countpercell_GL456212.1.counts
-rwxrwxrwx 1 hebbale hebbale 24K Sep 19 05:05 GEX_TE_out_countpercell_GL456210.1.counts
-rwxrwxrwx 1 hebbale hebbale 468 Sep 19 05:05 GEX_TE_out_countpercell_JH584298.1.counts
-rwxrwxrwx 1 hebbale hebbale 14K Sep 19 05:05 GEX_TE_out_countpercell_JH584294.1.counts
-rwxrwxrwx 1 hebbale hebbale 108 Sep 19 05:05 GEX_TE_out_countpercell_GL456354.1.counts
-rwxrwxrwx 1 hebbale hebbale 156 Sep 19 05:05 GEX_TE_out_countpercell_JH584296.1.counts
-rwxrwxrwx 1 hebbale hebbale 78 Sep 19 05:05 GEX_TE_out_countpercell_JH584297.1.counts
-rwxrwxrwx 1 hebbale hebbale 55K Sep 19 05:05 GEX_TE_out_countpercell_GL456221.1.counts
-rwxrwxrwx 1 hebbale hebbale 3.6K Sep 19 05:05 GEX_TE_out_countpercell_JH584293.1.counts
-rwxrwxrwx 1 hebbale hebbale 25K Sep 19 05:05 GEX_TE_out_countpercell_GL456350.1.counts
-rwxrwxrwx 1 hebbale hebbale 32K Sep 19 05:05 GEX_TE_out_countpercell_GL456211.1.counts
-rwxrwxrwx 1 hebbale hebbale 509K Sep 19 05:05 GEX_TE_out_countpercell_GL456233.1.counts
-rwxrwxrwx 1 hebbale hebbale 3.3K Sep 19 05:05 GEX_TE_out_countpercell_JH584299.1.counts
-rwxrwxrwx 1 hebbale hebbale 3.3M Sep 19 05:05 GEX_TE_out_countpercell_chrY.counts
-rwxrwxrwx 1 hebbale hebbale 218M Sep 19 04:59 GEX_TE_out_countpercell_chr7.counts
-rwxrwxrwx 1 hebbale hebbale 268M Sep 19 04:48 GEX_TE_out_countpercell_chr2.counts
-rwxrwxrwx 1 hebbale hebbale 246M Sep 19 04:26 GEX_TE_out_countpercell_chr4.counts
-rwxrwxrwx 1 hebbale hebbale 133M Sep 19 02:34 GEX_TE_out_countpercell_chr19.counts
-rwxrwxrwx 1 hebbale hebbale 180M Sep 19 02:10 GEX_TE_out_countpercell_chr3.counts
-rwxrwxrwx 1 hebbale hebbale 110M Sep 18 23:31 GEX_TE_out_countpercell_chr18.counts
-rwxrwxrwx 1 hebbale hebbale 227M Sep 18 23:18 GEX_TE_out_countpercell_chr1.counts
-rwxrwxrwx 1 hebbale hebbale 242M Sep 18 22:20 GEX_TE_out_countpercell_chr11.counts
-rwxrwxrwx 1 hebbale hebbale 158M Sep 18 21:17 GEX_TE_out_countpercell_chr12.counts
-rwxrwxrwx 1 hebbale hebbale 173M Sep 18 21:01 GEX_TE_out_countpercell_chr10.counts
-rwxrwxrwx 1 hebbale hebbale 138M Sep 18 21:00 GEX_TE_out_countpercell_chr14.counts
-rwxrwxrwx 1 hebbale hebbale 144M Sep 14 10:12 GEX_TE_out_countpercell_chr13.counts
-rwxrwxrwx 1 hebbale hebbale 132M Sep 14 10:03 GEX_TE_out_countpercell_chr16.counts
-rwxrwxrwx 1 hebbale hebbale 131M Sep 14 09:59 GEX_TE_out_countpercell_chr15.counts
-rwxrwxrwx 1 hebbale hebbale 15M Sep 14 05:04 GEX_TE_out_final.bam.bai
-rwxrwxrwx 1 hebbale hebbale 51G Sep 13 23:56 GEX_TE_out_final.bam
-rwxrwxrwx 1 hebbale hebbale 1.9K Sep 13 18:08 GEX_TE_out_teannotated.bam
-rwxrwxrwx 1 hebbale hebbale 1.9K Sep 13 18:08 temp_annotated_te.bam
-rwxrwxrwx 1 hebbale hebbale 0 Sep 13 17:58 GEX_TE_out_selectedtes.bed
-rwxrwxrwx 1 hebbale hebbale 8.0G Sep 13 17:58 GEX_TE_out_nogenes_overlappingtes.bed
-rwxrwxrwx 1 hebbale hebbale 6.8M Sep 13 13:46 GEX_TE_out_nogenes_overlappingtes.bam.bai
-rwxrwxrwx 1 hebbale hebbale 6.6G Sep 13 13:31 GEX_TE_out_nogenes_overlappingtes.bam

My allcounts.txt file is a bit big (3.6 GB). Let me check issue 30 to see how the other person uploaded theirs.

Sandman-1 commented 1 year ago

Okie dokie, I ran the tail command to see the bottom 100 lines of the allcounts file! Here it is:

tail -n 100 soloTE_Results/GEX_TE_out_SoloTE_temp/GEX_TE_out_allcounts.txt
Zwilch 2645-C_GAGATAAGTAACCACA-1 1
Zwilch 2645-C_GAGCCTTCAAACTGCC-1 1
Zwilch 2645-C_GAGTAATAGACAAACG-1 1
Zwilch 2645-C_GATGCATTCCTAAATG-1 1
Zwilch 2645-C_GCAAACTTCCTCATGC-1 1
Zwilch 2645-C_GCCATTACATGTCGCG-1 1
Zwilch 2645-C_GCGATATTCCTGTTCA-1 1
Zwilch 2645-C_GCGATTTAGACACCGC-1 1
Zwilch 2645-C_GCGATTTAGATTCCTT-1 1
Zwilch 2645-C_GCGCCTTGTTTAAAGC-1 1
Zwilch 2645-C_GCTCATTGTGGGAACA-1 1
Zwilch 2645-C_GCTGACCAGGAGTCGG-1 1
Zwilch 2645-C_GCTTGACCAAACCCTA-1 1
Zwilch 2645-C_GGAACCACATTGCAGC-1 1
Zwilch 2645-C_GGAGCAAGTAACGGGA-1 1
Zwilch 2645-C_GGCAAGCCACCTAATG-1 1
Zwilch 2645-C_GGCTAGTGTTAAATGC-1 1
Zwilch 2645-C_GGGCATTGTGCATTAG-1 1
Zwilch 2645-C_GGGCGAATCTCACATT-1 1
Zwilch 2645-C_GGGTATTTCACTAAGC-1 1
Zwilch 2645-C_GGGTTACGTTCTTTAG-1 1
Zwilch 2645-C_GGTACTAGTAATCACG-1 1
Zwilch 2645-C_GGTGTGACAGTTTACG-1 1
Zwilch 2645-C_GTAAAGCCAGGCGAGT-1 1
Zwilch 2645-C_GTAAGCTTCACGCCAA-1 1
Zwilch 2645-C_GTATTGCAGTAAGGGC-1 1
Zwilch 2645-C_GTCAAACTCGCGACAC-1 1
Zwilch 2645-C_GTGCGCAGTTTGACCT-1 1
Zwilch 2645-C_GTGCTCAAGGTCATTA-1 1
Zwilch 2645-C_TACAGCTAGGAAGCAC-1 1
Zwilch 2645-C_TACCAAATCCGGTATG-1 1
Zwilch 2645-C_TAGCCTCTCAGCACGC-1 1
Zwilch 2645-C_TAGCGGCTCATAATCG-1 1
Zwilch 2645-C_TAGCGGCTCCTAAATG-1 1
Zwilch 2645-C_TAGGCTGTCGCGACAC-1 1
Zwilch 2645-C_TATGACATCCTAAGGT-1 1
Zwilch 2645-C_TCAATCGCACCAGGTT-1 2
Zwilch 2645-C_TCATTTGGTTGTCATC-1 1
Zwilch 2645-C_TCCAGGTCAAGCTTTG-1 1
Zwilch 2645-C_TCCTGTTCATTATGGT-1 1
Zwilch 2645-C_TCGGTTTGTTAGGACC-1 1
Zwilch 2645-C_TCTAACTTCCTTGTTG-1 1
Zwilch 2645-C_TGCTGGATCCGCCTCA-1 9
Zwilch 2645-C_TGCTTAAAGCACGATT-1 1
Zwilch 2645-C_TGGGCATGTATTGGTG-1 1
Zwilch 2645-C_TTAGGATGTATTGGAT-1 1
Zwilch 2645-C_TTGCAGCCACCGGTAT-1 1
Zwilch 2645-C_TTTCTTGCAACCCTCC-1 1
Zwilch 2645-C_TTTGGCTGTCATTACC-1 1
Zwilch 370-C_AACTGTTCACCGTTCC-1 1
Zwilch 370-C_AAGACCAAGTCATTTC-1 1
Zwilch 370-C_AAGCCTCCAATCCCTT-1 1
Zwilch 370-C_AAGCGGGTCTACCTCA-1 1
Zwilch 370-C_ACCCGCTGTTATGTGG-1 1
Zwilch 370-C_ACGAAGTCATTGCGAC-1 1
Zwilch 370-C_ACGACAAAGCAGCTAT-1 2
Zwilch 370-C_AGCAAATAGCTGAGGG-1 1
Zwilch 370-C_AGCTGCTCACCTATAG-1 1
Zwilch 370-C_AGCTTAATCTTGGACG-1 1
Zwilch 370-C_AGGAAACGTTCGCTCA-1 1
Zwilch 370-C_AGGATTGAGCTATGAC-1 1
Zwilch 370-C_AGGTTACTCGGCCAGT-1 1
Zwilch 370-C_AGTCGCATCCTAATTC-1 1
Zwilch 370-C_AGTTACATCATGAAGG-1 1
Zwilch 370-C_ATCCTTAGTTAAGGTT-1 1
Zwilch 370-C_ATGTCCACAACTAGGG-1 2
Zwilch 370-C_CAAACATGTGTGCAAC-1 1
Zwilch 370-C_CAGATTCAGGAAACTG-1 2
Zwilch 370-C_CCATAATCAGGTTAAA-1 1
Zwilch 370-C_CGAATATGTCCTTTAA-1 1
Zwilch 370-C_CGACCTGCAGCTTAGC-1 1
Zwilch 370-C_CGTGCACAGTTGTCAA-1 1
Zwilch 370-C_CTAATCCGTACGTTTC-1 1
Zwilch 370-C_CTATGGCCACAGAACG-1 1
Zwilch 370-C_CTCACACTCTTTGTAC-1 1
Zwilch 370-C_CTTCTCAAGTATTGGC-1 1
Zwilch 370-C_GACCTAGTCACTAAGC-1 2
Zwilch 370-C_GCAGGCAAGAACCTAC-1 1
Zwilch 370-C_GCCATGATCGCTAAAC-1 1
Zwilch 370-C_GCCTCGACAACAGCCT-1 1
Zwilch 370-C_GCTATCCTCTGCAAAC-1 1
Zwilch 370-C_GGGCTAACACCCTCAC-1 1
Zwilch 370-C_GGTACTTAGAATCGCT-1 1
Zwilch 370-C_TACGGTTAGGCTGTGC-1 1
Zwilch 370-C_TCAGGTTAGGACACTT-1 1
Zwilch 370-C_TCAGGTTAGTAGCGGG-1 1
Zwilch 370-C_TCATACTTCATGCTTT-1 1
Zwilch 370-C_TCATACTTCTTAGCCC-1 1
Zwilch 370-C_TCATTACTCATGTGGT-1 1
Zwilch 370-C_TCCTGGTTCATTTGCT-1 1
Zwilch 370-C_TCCTGGTTCCTCAGTC-1 1
Zwilch 370-C_TCGACAAGTTGCAGTA-1 1
Zwilch 370-C_TGCTTTAGTTAAGGCC-1 1
Zwilch 370-C_TTAGGATGTGTGTGGT-1 1
Zwilch 370-C_TTGGAGGCACCTACGG-1 1
Zwilch 370-C_TTGTTGTTCTCACATT-1 1
Zwilch 370-C_TTTACGAAGGGCCATC-1 2
Zwilch 370-C_TTTGACCGTATTTGCC-1 1
Zwilch 370-C_TTTGGCTGTAACCAGC-1 1
Zwilch 370-C_TTTGTCTAGTTGTCAA-1 1

bvaldebenitom commented 1 year ago

I can now see an issue with GEX_TE_out_selectedtes.bed: it is empty (0 bytes in your listing). Since no TEs end up being selected, the later counting step has no TE annotation to work with, which causes the error seen before. There is no need to share the allcounts.txt file for now.
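A pre-flight check along the following lines would surface an empty intermediate file before the counting step fails with a less obvious KeyError. This is only a sketch: require_nonempty is a hypothetical helper, not something SoloTE provides, and the wording of the error message is an assumption about the likely cause.

```python
import os


def require_nonempty(path):
    """Return `path` unchanged, or raise if the file is missing or has zero bytes."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} does not exist")
    if os.path.getsize(path) == 0:
        raise ValueError(
            f"{path} is empty: no TE interval was selected, "
            "so there is nothing to annotate downstream"
        )
    return path
```

Calling require_nonempty on GEX_TE_out_selectedtes.bed right after the bedtools intersect step would stop the run at the point where the problem first appears.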

Can you run the following command? bedtools intersect -a /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/mm10_rmsk.bed -b /mnt/c/Users/hebbale/Downloads/TCF7L2/TKO_Multiome/GEX_BAM/soloTE_Results/GEX_TE_out_SoloTE_temp/GEX_TE_out_nogenes_overlappingtes.bed -u| wc -l

(please adjust the paths accordingly, but based on your previous posts that seems to be the path to each file)
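For a small subset of records, the same question (how many TE intervals overlap at least one read interval) can be cross-checked without bedtools. The sketch below mimics the counting behaviour of bedtools intersect -u piped to wc -l under simple assumptions: intervals are (chrom, start, end) tuples in BED-style 0-based, half-open coordinates, any overlap of at least one base counts, and count_overlapping is a hypothetical name. It compares every pair on a chromosome, so it is meant for a few thousand lines, not for the full files.

```python
from collections import defaultdict


def count_overlapping(a_intervals, b_intervals):
    """Count A intervals that overlap at least one B interval on the same chromosome."""
    b_by_chrom = defaultdict(list)
    for chrom, start, end in b_intervals:
        b_by_chrom[chrom].append((start, end))

    count = 0
    for chrom, start, end in a_intervals:
        # Half-open intervals overlap when each one starts before the other ends.
        if any(b_start < end and start < b_end
               for b_start, b_end in b_by_chrom.get(chrom, [])):
            count += 1  # each A interval is counted once, like -u
    return count


tes = [("chr1", 3000001, 3002128), ("chr1", 3003153, 3003994)]
reads = [("chr1", 3000044, 3000128), ("chr1", 3000254, 3000345)]
print(count_overlapping(tes, reads))  # → 1
```

If a subset like this gives a non-zero count while the full bedtools command prints 0, the files themselves (delimiters, line endings, sort order) become the next thing to inspect.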

Sandman-1 commented 1 year ago

bedtools intersect -a mm10_rmsk.bed -b soloTE_Results/GEX_TE_out_SoloTE_temp/GEX_TE_out_nogenes_overlappingtes.bed -u| wc -l
0

bvaldebenitom commented 1 year ago

This is odd, as it would indicate that there are no TEs in your dataset.

By any chance, are the data / BAM files you are using public? If so, you could share their accession code so I can run some more tests. Alternatively, you could share the data privately (e.g. via OneDrive or similar). Even a subset would help further investigate: for example, samtools view BAM_file chr1 restricts the output to alignments on chromosome 1.

fw262 commented 1 year ago

Hello,

I believe the issue here may be related to running SoloTE on a 10x Genomics Multiome GEX BAM file. I am also trying to run SoloTE on the GEX output of Multiome, and it is not working. I noticed that the Multiome GEX BAM has one additional column compared to the standard 3' GEX BAM file.

Multiome GEX: A01844:83:H5TJVDSX7:2:1517:21468:12273 16 chr1 3000815 255 15S136M * 0 0 TTTTTTTTTTTTTTTTCAGCCTTAGTCCGTAGTTATCTGAAATGATGCATGGGAAAATTTCAATATTTTTGTATCTGTTGAGGACTTTTTGTGAGTGACTATATGGTCAATTTTGGAGGATTTGGTACTGAGAAGAAGGTATATATCCTTT FFFF::FF:FF,FFFF,FF::FFFFF,:FFF:FFFFFFFFFF,:FFFF:F,:FF:::FFF,FFFFFFFFFFFFF:FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFF,FFFFFFFFFFFFFFFFFFFF:FFFFF NH:i:1 HI:i:1 AS:i:132 nM:i:1 t1:i:38 RG:Z:Sample1_4279:0:1:H5TJVDSX7:2 RE:A:I xf:i:0 CR:Z:GGTAACCGTTCGCTCA CY:Z:FFFF:,FFFFFFF:FF CB:Z:GGTAACCGTTCGCTCA-1 UR:Z:CAGTTTTGGCTT UY:Z:F:FFFF:FFF,: UB:Z:CAGTTTTGGCTT

3' GEX: A01102:492:H5TH5DSX5:2:2129:25183:19053 0 1 3054830 255 61M99413N25M15S * 0 0 AACACACACACACACACACACACACACACACACACACACACACACACACACACACACACAAACACAAAAACAACACACAAAAACACCCAAAAAAAAAAACA F,FFFFFFFF:FFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFF:F:FFF,F,F,FFFFFFF,F:FFFFFF,F,,:F:,F:,F,::,:F,,,F:F NH:i:1 HI:i:1 AS:i:72 nM:i:5 RG:Z:sample-TBI_1:0:1:H5TH5DSX5:2 RE:A:I xf:i:0 CR:Z:CCATCACCAAATACGA CY:Z::FFF::F,,FF,F::F CB:Z:CCATCACCAAATACGA-1 UR:Z:TATACCTTACCC UY:Z::F:FFFF,FFFF UB:Z:TATACCTTACCC

Do you know if this is related? Have you tested SoloTE on Multiome GEX bam files?

Thanks, Michael

bvaldebenitom commented 1 year ago

Hi @fw262 ,

I appreciate the input. In principle, the additional tag should not be causing an issue, as we apply samtools to keep only the cell and UMI-related tags (CB and UB).
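To illustrate why an extra optional field should be harmless once tags are filtered, here is a sketch that keeps only the CB and UB tags of a SAM record. It is not SoloTE's implementation (which, as noted, relies on samtools); keep_tags is a hypothetical helper operating on one tab-delimited SAM text line.

```python
def keep_tags(sam_line, keep=("CB", "UB")):
    """Drop every optional SAM field whose two-letter tag is not in `keep`."""
    fields = sam_line.rstrip("\n").split("\t")
    mandatory, optional = fields[:11], fields[11:]  # SAM has 11 mandatory columns
    kept = [field for field in optional if field.split(":", 1)[0] in keep]
    return "\t".join(mandatory + kept)


record = "\t".join([
    "read1", "16", "chr1", "3000815", "255", "4M", "*", "0", "0", "ACGT", "FFFF",
    "NH:i:1", "t1:i:38", "CB:Z:GGTAACCGTTCGCTCA-1", "UB:Z:CAGTTTTGGCTT",
])
print(keep_tags(record).split("\t")[11:])
# → ['CB:Z:GGTAACCGTTCGCTCA-1', 'UB:Z:CAGTTTTGGCTT']
```

After filtering, a Multiome record and a 3' GEX record carry the same set of optional fields, so a tag such as t1 no longer distinguishes them.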

I will run some tests on the publicly available multiome 10X data and report back.

Thanks for the patience.

wangmasonic20 commented 10 months ago

I tried the updated version, but still no luck.

bvaldebenitom commented 10 months ago

Hi @wangmasonic20,

what version of samtools are you using?

wangmasonic20 commented 10 months ago

Hi @wangmasonic20, what version of samtools are you using?

samtools 1.16.1 is the version

wangmasonic20 commented 10 months ago

Hi,

Thanks for getting back to me. I think it was an issue with the installation. I am using a university system, so the packages they have installed sometimes have issues.

bvaldebenitom commented 10 months ago

You're welcome!

Does it work properly now?

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 10 days with no activity.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.