bvaldebenitom / SoloTE

GNU General Public License v3.0

SoloTE indexer error #31

Closed frentzeperis closed 3 months ago

frentzeperis commented 1 year ago

Good afternoon, I hope you are well!

I am using SoloTE 1.09 to analyze TE expression from a murine BAM file that was aligned to mm10. I am trying to get the code running for the first subject before moving on to the others. It runs for a while, but at the end I got a few errors and the temp files were all still there. I would greatly appreciate any help!

Code: python SoloTE/SoloTE_pipeline.py --threads 1 --bam possorted_genome_bam.bam --teannotation mm10_rmsk.bed --outputprefix sub1-test --outputdir /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/

Output: SoloTE started at 12:28:50 [OK] samtools found! [OK] bedtools found! SoloTE v1.09 started! SoloTE Home directory /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/SoloTE SoloTE executed from /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te Results will be stored in /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te Input BAM file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam Input TE BED file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed Currently working in temporary directory: /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp samtools view -@ 1 -O BAM -o sub1-test_nogenes_overlappingtes.bam -L /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")' /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam samtools index sub1-test_nogenes_overlappingtes.bam sub1-test_nogenes_overlappingtes.bed exists in output folder. Skipping this step sub1-test_selectedtes.bed exists in output folder. 
Skipping this step python /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/SoloTE/annotateBAM.py sub1-test_nogenes_overlappingtes.bam sub1-test_selectedtes.bed temp_annotated_te.bam 1 samtools sort -@ 1 -O BAM -o sub1-test_teannotated.bam temp_annotated_te.bam samtools merge --threads 1 -o - /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam sub1-test_teannotated.bam|samtools view -@ 1 -O BAM -o sub1-test_final.bam -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-" && [GN]!="-"' --keep-tag GN,CB,UB samtools index sub1-test_final.bam Counts for chromosome chr1 are being generated in process: 19911 Counts for chromosome chr10 are being generated in process: 19911 Counts for chromosome chr11 are being generated in process: 19911 Counts for chromosome chr12 are being generated in process: 19911 Counts for chromosome chr13 are being generated in process: 19911 Counts for chromosome chr14 are being generated in process: 19911 Counts for chromosome chr15 are being generated in process: 19911 Counts for chromosome chr16 are being generated in process: 19911 Counts for chromosome chr17 are being generated in process: 19911 Counts for chromosome chr18 are being generated in process: 19911 Counts for chromosome chr19 are being generated in process: 19911 Counts for chromosome chr2 are being generated in process: 19911 Counts for chromosome chr3 are being generated in process: 19911 Counts for chromosome chr4 are being generated in process: 19911 Counts for chromosome chr5 are being generated in process: 19911 Counts for chromosome chr6 are being generated in process: 19911 Counts for chromosome chr7 are being generated in process: 19911 Counts for chromosome chr8 are being generated in process: 19911 Counts for chromosome chr9 are being generated in process: 19911 Counts for chromosome chrM are being generated in process: 19911 Counts for chromosome chrX are being generated in process: 19911 Counts for 
chromosome chrY are being generated in process: 19911 Counts for chromosome GL456233.1 are being generated in process: 19911 Counts for chromosome GL456211.1 are being generated in process: 19911 Counts for chromosome GL456350.1 are being generated in process: 19911 Counts for chromosome JH584293.1 are being generated in process: 19911 Counts for chromosome GL456221.1 are being generated in process: 19911 Counts for chromosome JH584297.1 are being generated in process: 19911 Counts for chromosome JH584296.1 are being generated in process: 19911 Counts for chromosome JH584294.1 are being generated in process: 19911 Counts for chromosome JH584298.1 are being generated in process: 19911 Counts for chromosome GL456210.1 are being generated in process: 19911 Counts for chromosome GL456212.1 are being generated in process: 19911 Counts for chromosome JH584304.1 are being generated in process: 19911 Counts for chromosome GL456216.1 are being generated in process: 19911 Counts for chromosome JH584295.1 are being generated in process: 19911 Traceback (most recent call last): File "/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3800, in get_loc return self._engine.get_loc(casted_key) File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item KeyError: 4

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/SoloTE/SoloTE_pipeline.py", line 217, in <module>
    tecounts2.loc[tecounts2[4].isnull(),4] = tecounts2.loc[tecounts2[4].isnull(),1]
  File "/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/python3.9/site-packages/pandas/core/frame.py", line 3805, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3802, in get_loc
    raise KeyError(key) from err
KeyError: 4

bvaldebenitom commented 1 year ago

Hi @frentzeperis !

Do you have a sub1-test_allcounts.txt file in /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp ? If so, please share the output of head sub1-test_allcounts.txt.

Additionally, can you share the output of the following commands?

ls -lht /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp
head /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed
samtools view /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam|head

The output of those commands will help further diagnose these issues. Thanks!

frentzeperis commented 1 year ago

Thanks so much!

The output of head sub1-test_allcounts.txt is below:

1600012P17Rik;Pappa2 AGACCCGCACGCCACA-1 1
1700007P06Rik AGTCATGCAACGTATC-1 1
1700007P06Rik CCAAGCGGTCGAAACG-1 1
1700016C15Rik AAACCCACATCGCTAA-1 2
1700016C15Rik AAACGAACACTCCACT-1 1
1700016C15Rik AAACGCTTCTAACGGT-1 1
1700016C15Rik AAAGGTAGTGATAGTA-1 1
1700016C15Rik AAATGGATCCGATGCG-1 2
1700016C15Rik AAATGGATCCTACCAC-1 1
1700016C15Rik AACAGGGCAAAGGCGT-1 2

ls -lht /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/sub1-test_SoloTE_temp

total 16342632
-rw-r--r--@ 1 frederikarentzeperis staff 591M Sep 15 13:57 sub1-test_allcounts.txt
-rw-r--r-- 1 frederikarentzeperis staff 14K Sep 15 13:57 sub1-test_countpercell_JH584295.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 57K Sep 15 13:57 sub1-test_countpercell_GL456216.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 115K Sep 15 13:57 sub1-test_countpercell_JH584304.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 7.3K Sep 15 13:57 sub1-test_countpercell_GL456212.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 2.4K Sep 15 13:57 sub1-test_countpercell_GL456210.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 32B Sep 15 13:57 sub1-test_countpercell_JH584298.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 83B Sep 15 13:57 sub1-test_countpercell_JH584294.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 64B Sep 15 13:57 sub1-test_countpercell_JH584296.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 32B Sep 15 13:57 sub1-test_countpercell_JH584297.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 3.0K Sep 15 13:57 sub1-test_countpercell_GL456221.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 289B Sep 15 13:57 sub1-test_countpercell_JH584293.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 568B Sep 15 13:57 sub1-test_countpercell_GL456350.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 3.6K Sep 15 13:57 sub1-test_countpercell_GL456211.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 57K Sep 15 13:57 sub1-test_countpercell_GL456233.1.counts
-rw-r--r-- 1 frederikarentzeperis staff 183K Sep 15 13:57 sub1-test_countpercell_chrY.counts
-rw-r--r-- 1 frederikarentzeperis staff 16M Sep 15 13:57 sub1-test_countpercell_chrX.counts
-rw-r--r-- 1 frederikarentzeperis staff 6.7M Sep 15 13:56 sub1-test_countpercell_chrM.counts
-rw-r--r-- 1 frederikarentzeperis staff 33M Sep 15 13:54 sub1-test_countpercell_chr9.counts
-rw-r--r-- 1 frederikarentzeperis staff 33M Sep 15 13:51 sub1-test_countpercell_chr8.counts
-rw-r--r-- 1 frederikarentzeperis staff 45M Sep 15 13:49 sub1-test_countpercell_chr7.counts
-rw-r--r-- 1 frederikarentzeperis staff 30M Sep 15 13:44 sub1-test_countpercell_chr6.counts
-rw-r--r-- 1 frederikarentzeperis staff 37M Sep 15 13:42 sub1-test_countpercell_chr5.counts
-rw-r--r-- 1 frederikarentzeperis staff 33M Sep 15 13:39 sub1-test_countpercell_chr4.counts
-rw-r--r-- 1 frederikarentzeperis staff 38M Sep 15 13:36 sub1-test_countpercell_chr3.counts
-rw-r--r-- 1 frederikarentzeperis staff 39M Sep 15 13:33 sub1-test_countpercell_chr2.counts
-rw-r--r-- 1 frederikarentzeperis staff 27M Sep 15 13:30 sub1-test_countpercell_chr19.counts
-rw-r--r-- 1 frederikarentzeperis staff 16M Sep 15 13:27 sub1-test_countpercell_chr18.counts
-rw-r--r-- 1 frederikarentzeperis staff 33M Sep 15 13:26 sub1-test_countpercell_chr17.counts
-rw-r--r-- 1 frederikarentzeperis staff 15M Sep 15 13:23 sub1-test_countpercell_chr16.counts
-rw-r--r-- 1 frederikarentzeperis staff 24M Sep 15 13:22 sub1-test_countpercell_chr15.counts
-rw-r--r-- 1 frederikarentzeperis staff 20M Sep 15 13:20 sub1-test_countpercell_chr14.counts
-rw-r--r-- 1 frederikarentzeperis staff 20M Sep 15 13:18 sub1-test_countpercell_chr13.counts
-rw-r--r-- 1 frederikarentzeperis staff 19M Sep 15 13:17 sub1-test_countpercell_chr12.counts
-rw-r--r-- 1 frederikarentzeperis staff 46M Sep 15 13:15 sub1-test_countpercell_chr11.counts
-rw-r--r-- 1 frederikarentzeperis staff 29M Sep 15 13:11 sub1-test_countpercell_chr10.counts
-rw-r--r-- 1 frederikarentzeperis staff 30M Sep 15 13:09 sub1-test_countpercell_chr1.counts
-rw-r--r-- 1 frederikarentzeperis staff 1.9M Sep 15 13:07 sub1-test_final.bam.bai
-rw-r--r-- 1 frederikarentzeperis staff 5.7G Sep 15 13:06 sub1-test_final.bam
-rw-r--r-- 1 frederikarentzeperis staff 1.7K Sep 15 12:42 sub1-test_teannotated.bam
-rw-r--r-- 1 frederikarentzeperis staff 1.7K Sep 15 12:42 temp_annotated_te.bam
-rw-r--r-- 1 frederikarentzeperis staff 2.9M Sep 15 12:42 sub1-test_nogenes_overlappingtes.bam.bai
-rw-r--r-- 1 frederikarentzeperis staff 972M Sep 15 12:42 sub1-test_nogenes_overlappingtes.bam
-rw-r--r-- 1 frederikarentzeperis staff 0B Sep 15 10:26 sub1-test_selectedtes.bed
-rw-r--r-- 1 frederikarentzeperis staff 0B Sep 15 10:26 sub1-test_nogenes_overlappingtes.bed

head /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/mm10_rmsk.bed

chr1 3000001 3002128 chr1|3000001|3002128|L1_Mus3:L1:LINE|10.5|- 10.5 -
chr1 3003153 3003994 chr1|3003153|3003994|L1Md_F:L1:LINE|26.8|- 26.8 -
chr1 3003994 3004054 chr1|3003994|3004054|L1_Mus3:L1:LINE|27.9|- 27.9 -
chr1 3004041 3004206 chr1|3004041|3004206|L1_Rod:L1:LINE|19.9|+ 19.9 +
chr1 3004271 3005001 chr1|3004271|3005001|L1_Rod:L1:LINE|19.9|+ 19.9 +
chr1 3005002 3005439 chr1|3005002|3005439|L1_Rod:L1:LINE|22.1|+ 22.1 +
chr1 3005461 3005548 chr1|3005461|3005548|Lx9:L1:LINE|22.6|+ 22.6 +
chr1 3005571 3006764 chr1|3005571|3006764|Lx9:L1:LINE|22.6|+ 22.6 +
chr1 3007015 3007268 chr1|3007015|3007268|L1M4:L1:LINE|28.9|- 28.9 -
chr1 3008117 3008483 chr1|3008117|3008483|L1_Mur2:L1:LINE|14.8|- 14.8 -

samtools view /Users/frederikarentzeperis/Documents/Merad/2023-fall/test-te/possorted_genome_bam.bam|head A01185:247:HY53WDRXY:2:1101:6723:12383 16 chr1 3016338 0 92M 0 0 GGAGTTCCTTAATCCACTTAGATTTGACCTTAGTACAAGGAGATAGGAATGGATCAATTCGCATTCTTCTACATGATAACAGCCAGTTGTGC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:5 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:ATGATCGCACCGGAAA CY:Z:FFFFFFFFFFFFFFFF CB:Z:ATGATCGCACCGGAAA-1 UR:Z:TTGCATTATCTC UY:Z:FFFFFFFFFFFF UB:Z:TTGCATTATCTC A01185:247:HY53WDRXY:2:2236:9173:35728 16 chr1 3016344 0 92M 0 0 CCTTAATCCACTTAGATTTGACCTTAGTACAAGGAGATAGGAATGGATCAATTCGCATTCTTCTACATGATAACAGCCAGTTGTGCCAGCAC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:CCTGTTGTCCTTCTAA CY:Z:FFFFFFFF:FFFFFFF CB:Z:CCTGTTGTCCTTCTAA-1 UR:Z:CTCTCATTAACT UY:Z:FFFFFFFFFFFF UB:Z:CTCTCATTAACT A01185:247:HY53WDRXY:1:1169:14163:18349 16 chr1 3018673 1 92M 0 0 TTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTG FFF:FFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFF:FF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:1 RE:A:I xf:i:0 CR:Z:AAAGAACGTTGACTAC CY:Z:FFFFFFFFFFFF:FFF CB:Z:AAAGAACGTTGACTAC-1 UR:Z:TATTACTTAGCT UY:Z:FFFFFFFFF,:F UB:Z:TATTACTTAGCT A01185:247:HY53WDRXY:1:1256:20636:20541 16 chr1 3018673 1 92M 0 0 TTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:1 RE:A:I xf:i:0 CR:Z:TTAGGGTGTTCGAACT CY:Z:FFFFFFFFFFFFFFFF CB:Z:TTAGGGTGTTCGAACT-1 UR:Z:CTCTTTTTGGGT UY:Z:FFFFFFFFFFFF UB:Z:CTCTTTTTGGGT A01185:247:HY53WDRXY:2:2102:26793:1799 16 chr1 3018673 1 92M 0 0 
TTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTG FFFFFFFFF:FFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFF:FFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:GACTTCCTCGCAATGT CY:Z:FFFFFF:FFFFFF::F CB:Z:GACTTCCTCGCAATGT-1 UR:Z:TGGGCCTATCCT UY:Z:FFFFFFFFFFFF UB:Z:TGGGCCTATCCT A01185:247:HY53WDRXY:1:1254:26069:24283 16 chr1 3018673 1 92M 0 0 TTTGTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTG FFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:1 RE:A:I xf:i:0 CR:Z:TGATGCAAGACAGCTG CY:Z:FFFFFFFFFFFFFFFF CB:Z:TGATGCAAGACAGCTG-1 UR:Z:TAAAGTACTCAC UY:Z::FFFFFFFFFFF UB:Z:TAAAGTACTCAC A01185:247:HY53WDRXY:2:1233:19750:10238 16 chr1 3018676 1 92M 0 0 GTTTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTGTTT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:CGTAATGAGAGTCACG CY:Z:FFFFFFFFFFFFFFFF CB:Z:CGTAATGAGAGTCACG-1 UR:Z:AGCGTTTTTGCA UY:Z:FFF:FFFFFFFF UB:Z:AGCGTTTTTGCA A01185:247:HY53WDRXY:2:1266:5141:17315 16 chr1 3018678 1 92M 0 0 TTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTGTTTCC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:GGACGTCCACACGCCA CY:Z:FFFFFFFFFFFFFFFF CB:Z:GGACGTCCACACGCCA-1 UR:Z:CACTCTTCGAAC UY:Z:FFFFFF:FFFFF UB:Z:CACTCTTCGAAC A01185:247:HY53WDRXY:2:1267:3125:36558 16 chr1 3018678 1 92M 0 0 TTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTGTTTCC FFFFFFF:FFFFFFFF:FFFFF:FFFFFFFFFFFF,FFFFFFFFFFFFFFFFF:FFFFFF:FFFFFFFFF,FFFFFFFFFFFFFFFF:FFFF 
NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:2 RE:A:I xf:i:0 CR:Z:ACGTACACATACTGTG CY:Z:FFFFFFFFFFFFFFFF CB:Z:ACGTACACATACTGTG-1 UR:Z:CTCTCCCCTGCT UY:Z:FFFFFFFFFFFF UB:Z:CTCTCCCCTGCT A01185:247:HY53WDRXY:1:2106:10294:13996 16 chr1 3018678 1 92M 0 0 TTTAGGATAAAATGTTCTGTAGATATCTGTCAAGTCCATTTGTTTCATCACTTCTGTTAGTTTCACTGTGTCCCTGTTTAGTTTCTGTTTCC FFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFF NH:i:3 HI:i:1 AS:i:90 nM:i:0 RG:Z:MIME26_AL1-3_0:0:1:HY53WDRXY:1 RE:A:I xf:i:0 CR:Z:GTTGCTCTCAAGAAAC CY:Z:FFFFFFFFFFFFFFFF CB:Z:GTTGCTCTCAAGAAAC-1 UR:Z:GACACATACACA UY:Z:FFFFFFFFFFFF UB:Z:GACACATACACA

bvaldebenitom commented 1 year ago

Looks like input files are in order, and most of the results are being generated. However, the files

sub1-test_selectedtes.bed
sub1-test_nogenes_overlappingtes.bed

appear to have been created before the file sub1-test_nogenes_overlappingtes.bam (10:26 versus 12:42 in the listing above). This is not the standard behaviour: the BAM file is normally created first, and the BED files are then derived from it.

This results in:

sub1-test_nogenes_overlappingtes.bed exists in output folder. Skipping this step
sub1-test_selectedtes.bed exists in output folder. Skipping this step

Since these files are empty, no TEs are annotated. Can you check the output of grep -c sub1-test_allcounts.txt? If the result is 0, it will confirm this issue.
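Note that grep needs a search pattern before the file name: when given a single argument, it treats that argument as the pattern and waits for input on standard input, which can look like a command that never finishes. The pattern intended above is not shown in the thread. As a rough Python equivalent of a `grep -c PATTERN FILE` count, the sketch below streams the file line by line, so a file of several hundred megabytes is never loaded whole. The function name and the `"SoloTE"` marker in the usage note are assumptions for illustration, not something confirmed by the thread.

```python
def count_matching_lines(path, needle):
    """Count lines in `path` that contain the substring `needle`.

    Hypothetical helper, roughly equivalent to `grep -c -F NEEDLE PATH`.
    The file is read one line at a time to keep memory use small.
    """
    count = 0
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line:
                count += 1
    return count
```

For example, `count_matching_lines("sub1-test_allcounts.txt", "SoloTE")` would count TE rows only if TE identifiers in that file carry a `SoloTE` marker, which is an assumption here; substitute whatever pattern was actually intended.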

Did you experience any interruption during the pipeline execution? Could you try deleting the temp directory, and running the pipeline again?
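Before rerunning, it can help to confirm which intermediates are stale. The following is a minimal sketch, assuming only the file layout visible in the listing above; the helper names are hypothetical and not part of SoloTE. It reports zero-byte BED/BAM files in a temp directory and checks whether a derived file predates the file it should have been built from.

```python
from pathlib import Path


def find_empty_files(temp_dir, suffixes=(".bed", ".bam")):
    """Return zero-byte files with the given suffixes, sorted by path."""
    temp_dir = Path(temp_dir)
    return sorted(
        p for p in temp_dir.iterdir()
        if p.is_file() and p.suffix in suffixes and p.stat().st_size == 0
    )


def is_older_than(derived, source):
    """True if `derived` was last modified before `source`.

    A derived file that is older than its source was probably left
    over from an earlier run rather than produced by the current one.
    """
    return Path(derived).stat().st_mtime < Path(source).stat().st_mtime
```

Applied to the temp directory listed above, `find_empty_files` would be expected to return the two 0B BED files, and `is_older_than` would flag each of them against sub1-test_nogenes_overlappingtes.bam.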

frentzeperis commented 1 year ago

I had no interruptions. I was wondering the same thing because it seemed weird, so I regenerated the initial BED file and ran everything again. I am still getting errors. I tried to run grep but it just runs forever (I have been running it for close to an hour and it is still going). Here is the output of my second run.

Code: python SoloTE/SoloTE_pipeline.py --threads 1 --bam MIME26_AL1-3_0_v1/possorted_genome_bam.bam --teannotation rmsk.bed --outputprefix TE --outputdir MIME26_AL1-3_0_v1

Output: SoloTE started at 15:51:48 [OK] samtools found! [OK] bedtools found! SoloTE v1.09 started! SoloTE Home directory /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE SoloTE executed from /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te Results will be stored in /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1 Input BAM file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam Input TE BED file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed Currently working in temporary directory: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_temp samtools view -@ 1 -O BAM -o TE_nogenes_overlappingtes.bam -L /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")' /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam samtools index TE_nogenes_overlappingtes.bam bedtools bamtobed -i TE_nogenes_overlappingtes.bam -split > TE_nogenes_overlappingtes.bed bedtools intersect -a /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed -b TE_nogenes_overlappingtes.bed -u > TE_selectedtes.bed python /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/annotateBAM.py TE_nogenes_overlappingtes.bam TE_selectedtes.bed temp_annotated_te.bam 1 samtools sort -@ 1 -O BAM -o TE_teannotated.bam temp_annotated_te.bam [bam_sort_core] merging from 17 files and 1 in-memory blocks... 
samtools merge --threads 1 -o - /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam TE_teannotated.bam|samtools view -@ 1 -O BAM -o TE_final.bam -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-" && [GN]!="-"' --keep-tag GN,CB,UB samtools index TE_final.bam Counts for chromosome chr1 are being generated in process: 22550 Counts for chromosome chr10 are being generated in process: 22550 Counts for chromosome chr11 are being generated in process: 22550 Counts for chromosome chr12 are being generated in process: 22550 Counts for chromosome chr13 are being generated in process: 22550 Counts for chromosome chr14 are being generated in process: 22550 Counts for chromosome chr15 are being generated in process: 22550 Counts for chromosome chr16 are being generated in process: 22550 Counts for chromosome chr17 are being generated in process: 22550 Counts for chromosome chr18 are being generated in process: 22550 Counts for chromosome chr19 are being generated in process: 22550 Counts for chromosome chr2 are being generated in process: 22550 Counts for chromosome chr3 are being generated in process: 22550 Counts for chromosome chr4 are being generated in process: 22550 Counts for chromosome chr5 are being generated in process: 22550 Counts for chromosome chr6 are being generated in process: 22550 Counts for chromosome chr7 are being generated in process: 22550 Counts for chromosome chr8 are being generated in process: 22550 Counts for chromosome chr9 are being generated in process: 22550 Counts for chromosome chrM are being generated in process: 22550 Counts for chromosome chrX are being generated in process: 22550 Counts for chromosome chrY are being generated in process: 22550 Counts for chromosome GL456233.1 are being generated in process: 22550 Counts for chromosome GL456211.1 are being generated in process: 22550 Counts for chromosome GL456350.1 are being generated in process: 22550 Counts for 
chromosome JH584293.1 are being generated in process: 22550 Counts for chromosome GL456221.1 are being generated in process: 22550 Counts for chromosome JH584297.1 are being generated in process: 22550 Counts for chromosome JH584296.1 are being generated in process: 22550 Counts for chromosome JH584294.1 are being generated in process: 22550 Counts for chromosome JH584298.1 are being generated in process: 22550 Counts for chromosome GL456210.1 are being generated in process: 22550 Counts for chromosome GL456212.1 are being generated in process: 22550 Counts for chromosome JH584304.1 are being generated in process: 22550 Counts for chromosome GL456216.1 are being generated in process: 22550 Counts for chromosome JH584295.1 are being generated in process: 22550 Creating final results directory /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_output was created Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_legacytes.txt TE_legacytes_MATRIX dyld[23347]: Library not loaded: @rpath/libreadline.6.2.dylib Referenced from: <185433D7-8B40-31AA-8BD9-465D23C57257> /Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/libR.dylib Reason: tried: '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/bin/exec/../../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/bin/exec/../../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/libreadline.6.2.dylib' (no such file) Traceback (most recent call last): File 
"/Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/SoloTE_pipeline.py", line 272, in os.rename(mtx_outname,finaldir+"/"+mtx_outname) FileNotFoundError: [Errno 2] No such file or directory: 'TE_legacytes_MATRIX' -> '/Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_output/TE_legacytes_MATRIX'

bvaldebenitom commented 1 year ago

It seems there is an error with R. Can you run the following commands?

R --version
conda env export > fk_issue31_environment.yml

The second command should create the file fk_issue31_environment.yml. Send it to me so I can further inspect. Additionally, if you are able to share the _allcounts.txt file generated now, that would also be helpful.
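In the logs above, the startup checks report only samtools and bedtools, and the R failure surfaces much later, when Rscript is called to build the matrices. A quick pre-flight check can catch this before a long run. The sketch below is an assumption-laden illustration (the `tool_runs` helper is hypothetical and not part of SoloTE): it reports whether a command is on PATH and exits with status 0.

```python
import shutil
import subprocess


def tool_runs(command, timeout=60):
    """Return True if command[0] is found and the command exits with status 0.

    `command` is a list such as ["Rscript", "--version"]. A missing
    executable, a crash at load time, or a timeout all yield False.
    """
    if shutil.which(command[0]) is None:
        return False
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

Calling `tool_runs(["Rscript", "--version"])` inside the activated conda environment would be expected to return False when R aborts on a missing shared library, as in the dyld message above.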

frentzeperis commented 1 year ago

When I type R --version I am getting the following: Library not loaded: @rpath/libreadline.6.2.dylib Referenced from: <185433D7-8B40-31AA-8BD9-465D23C57257> /Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/libR.dylib Reason: tried: '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/bin/exec/../../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/bin/exec/../../../libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/R/lib/libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/lib/libreadline.6.2.dylib' (no such file), '/Users/frederikarentzeperis/opt/anaconda3/envs/soloTE/libreadline.6.2.dylib' (no such file) zsh: abort R --version

zsh: no matches found: dyld[27033]:

Here is the fk_issue31_environment.yml fk_issue31_environment.yml.zip

The _allcounts file is too big to upload even after compression (1.52 GB before and 235 MB after compression).

frentzeperis commented 1 year ago

I made another conda environment and reinstalled everything; I am not sure what broke R in the last environment. I think it ran this time, thanks for helping me.

code: python SoloTE/SoloTE_pipeline.py --threads 5 --bam MIME26_AL1-3_0_v1/possorted_genome_bam.bam --teannotation rmsk.bed --outputprefix TE --outputdir MIME26_AL1-3_0_v1

output: SoloTE started at 20:37:43 [OK] samtools found! [OK] bedtools found! SoloTE v1.09 started! SoloTE Home directory /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE SoloTE executed from /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te Results will be stored in /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1 Input BAM file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam Input TE BED file: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed Currently working in temporary directory: /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_temp samtools view -@ 5 -O BAM -o TE_nogenes_overlappingtes.bam -L /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-") && (!exists([GN]) || [GN]=="-")' /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam samtools index TE_nogenes_overlappingtes.bam bedtools bamtobed -i TE_nogenes_overlappingtes.bam -split > TE_nogenes_overlappingtes.bed bedtools intersect -a /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/rmsk.bed -b TE_nogenes_overlappingtes.bed -u > TE_selectedtes.bed python /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/annotateBAM.py TE_nogenes_overlappingtes.bam TE_selectedtes.bed temp_annotated_te.bam 1 samtools sort -@ 5 -O BAM -o TE_teannotated.bam temp_annotated_te.bam [bam_sort_core] merging from 15 files and 5 in-memory blocks... 
samtools merge --threads 5 -o - /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/possorted_genome_bam.bam TE_teannotated.bam|samtools view -@ 5 -O BAM -o TE_final.bam -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-" && [GN]!="-"' --keep-tag GN,CB,UB samtools index TE_final.bam Counts for chromosome chr1 are being generated in process: 30695 Counts for chromosome chr10 are being generated in process: 30693 Counts for chromosome chr11 are being generated in process: 30694 Counts for chromosome chr12 are being generated in process: 30692 Counts for chromosome chr13 are being generated in process: 30696 Counts for chromosome chr14 are being generated in process: 30696 Counts for chromosome chr15 are being generated in process: 30692 Counts for chromosome chr16 are being generated in process: 30695 Counts for chromosome chr17 are being generated in process: 30693 Counts for chromosome chr18 are being generated in process: 30696 Counts for chromosome chr19 are being generated in process: 30695 Counts for chromosome chr2 are being generated in process: 30692 Counts for chromosome chr3 are being generated in process: 30694 Counts for chromosome chr4 are being generated in process: 30696 Counts for chromosome chr5 are being generated in process: 30693 Counts for chromosome chr6 are being generated in process: 30695 Counts for chromosome chr7 are being generated in process: 30692 Counts for chromosome chr8 are being generated in process: 30696 Counts for chromosome chr9 are being generated in process: 30694 Counts for chromosome chrM are being generated in process: 30693 Counts for chromosome chrX are being generated in process: 30695 Counts for chromosome chrY are being generated in process: 30696 Counts for chromosome GL456233.1 are being generated in process: 30696 Counts for chromosome GL456211.1 are being generated in process: 30696 Counts for chromosome GL456350.1 are being generated in process: 30696 Counts for 
chromosome JH584293.1 are being generated in process: 30696 Counts for chromosome GL456221.1 are being generated in process: 30696 Counts for chromosome JH584297.1 are being generated in process: 30696 Counts for chromosome JH584296.1 are being generated in process: 30696 Counts for chromosome JH584294.1 are being generated in process: 30696 Counts for chromosome JH584298.1 are being generated in process: 30696 Counts for chromosome GL456210.1 are being generated in process: 30696 Counts for chromosome GL456212.1 are being generated in process: 30696 Counts for chromosome JH584304.1 are being generated in process: 30696 Counts for chromosome GL456216.1 are being generated in process: 30696 Counts for chromosome JH584295.1 are being generated in process: 30696 Creating final results directory /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/MIME26_AL1-3_0_v1/TE_SoloTE_output was created Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_legacytes.txt TE_legacytes_MATRIX Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_locustes.txt TE_locustes_MATRIX Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_classtes.txt TE_classtes_MATRIX Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_familytes.txt TE_familytes_MATRIX Rscript /Users/frederikarentzeperis/Documents/Merad/2023-fall/himc_te/SoloTE/generate_mtx.R TE_subfamilytes.txt TE_subfamilytes_MATRIX A total of 110474508 UMIs are in the final matrix. Of these, 91659123 (82.969%) correspond to genes. and 18815385 (17.031%) correspond to TEs. TE detected UMIs are distributed as follows: Locus-specific TEs: 16194148 UMIs (86.069%). Subfamily TEs: 2621237 (13.931%). 
Creating TE_SoloTE.stats TE statistics file Finished creating TE_SoloTE.stats SoloTE finished with MIME26_AL1-3_0_v1/possorted_genome_bam.bam SoloTE finished at 21:59:56 SoloTE total running time: 1:22:12.853309
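As a sanity check on the summary printed at the end of this run, the reported percentages can be recomputed from the UMI counts in the log. This is only a small arithmetic sketch; the variable names are illustrative and do not come from SoloTE.

```python
def share(part, total):
    """Percentage of `total` made up by `part`, rounded to 3 decimals."""
    return round(100 * part / total, 3)


# UMI counts copied from the run summary above.
total_umis = 110474508
gene_umis = 91659123
te_umis = 18815385
locus_te_umis = 16194148
subfamily_te_umis = 2621237

gene_share = share(gene_umis, total_umis)                # genes vs. all UMIs
te_share = share(te_umis, total_umis)                    # TEs vs. all UMIs
locus_share = share(locus_te_umis, te_umis)              # locus-specific vs. TE UMIs
subfamily_share = share(subfamily_te_umis, te_umis)      # subfamily vs. TE UMIs
```

The gene and TE counts add up to the total, and the locus-specific and subfamily counts add up to the TE count, so the two pairs of percentages each sum to 100.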

frentzeperis commented 1 year ago

Hm actually, before closing this, there are 5 output folders all with the barcodes, features, and matrix files. They are called: TE_classtes_MATRIX, TE_familytes_MATRIX, TE_legacytes_MATRIX, TE_locustes_MATRIX, and TE_subfamilytes_MATRIX.

Is the intended output in one of these? I thought we were just meant to get one output with the 3 file types.

bvaldebenitom commented 1 year ago

@frentzeperis :

Thanks for sharing the update. Sometimes setting up R within one conda environment breaks another installation within a different environment.

It looks like it now finished successfully. And yes, this is the new intended output as of version 1.09. This was done in order to provide a seamless generation of the matrices corresponding to different ways of analyzing TE data. The description for each one is as follows:

Overall, the class, family, and subfamily matrices could be used to get an idea of global changes in TE expression. For example, the tool scTE reports results only at the subfamily level, and here we provide users with a similar output. On the other hand, the locustes and legacytes matrices contain locus-specific expression, which might be helpful for correlation analyses. A more detailed explanation of the difference between locus and legacy can be found here. Previously, the default (and only) output was the legacy TEs.
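To see at a glance which of the five matrix folders a run produced, a short sketch like the one below can help. It assumes the folder naming seen in this thread (`<prefix>_<level>tes_MATRIX` inside the output directory); the function name is hypothetical, and the file names inside each folder are simply listed rather than assumed.

```python
from pathlib import Path

# The five levels named in this thread.
LEVELS = ("class", "family", "subfamily", "locus", "legacy")


def list_matrix_dirs(output_dir, prefix):
    """Map each level to the sorted file names in its matrix folder.

    A level maps to None when the expected folder does not exist.
    """
    output_dir = Path(output_dir)
    found = {}
    for level in LEVELS:
        folder = output_dir / f"{prefix}_{level}tes_MATRIX"
        if folder.is_dir():
            found[level] = sorted(p.name for p in folder.iterdir())
        else:
            found[level] = None
    return found
```

For the run above, this would be called as `list_matrix_dirs("MIME26_AL1-3_0_v1/TE_SoloTE_output", "TE")`; each level should then list its barcodes, features, and matrix files.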