I apologize for bothering you. While using SoloTE to quantify transposable elements (TEs), I noticed something in the final step, where the BAM file is split by chromosome for counting. The log lists only some sequences: chr1 to chr22, chrX, chrY, chrM and a few unplaced contigs such as GL000194.1 and KI270734.1. It does not list every sequence in my reference. Does SoloTE quantify only these chromosomes, or does the log simply display a subset of them? My reference genome combines the human genome with a viral genome, and I am concerned that reads from the viral genome may be missing from the results. The full log is below:
Running command: python SoloTE_pipeline.py --threads 50 --bam /home/lw/newdisk/project/bamdata/A5_345Aligned.sortedByCoord.out.bam --teannotation hg38_rmsk.bed --outputprefix A5_345 --outputdir /home/lw/newdisk/project/result_soloTE --dual
SoloTE started at 15:07:42
[OK] samtools found!
[OK] bedtools found!
SoloTE v1.10 started!
SoloTE Home directory /home/lw/newdisk/project/soloTE
SoloTE executed from /home/lw/newdisk/project/soloTE
Results will be stored in /home/lw/newdisk/project/result_soloTE
Input BAM file: /home/lw/newdisk/project/bamdata/A5_345Aligned.sortedByCoord.out.bam
Input TE BED file: /home/lw/newdisk/project/soloTE/hg38_rmsk.bed
Dual mode enabled. SoloTE will calculate TE expression also considering reads annotated to genes.
Currently working in temporary directory: /home/lw/newdisk/project/result_soloTE/A5_345_SoloTE_temp
samtools view -@ 50 -O BAM -o A5_345_nogenes_overlappingtes.bam -L /home/lw/newdisk/project/soloTE/hg38_rmsk.bed -e '(exists([CB]) && exists([UB]) && [CB]!="-" && [UB]!="-")' /home/lw/newdisk/project/bamdata/A5_345Aligned.sortedByCoord.out.bam
samtools index A5_345_nogenes_overlappingtes.bam
bedtools bamtobed -i A5_345_nogenes_overlappingtes.bam -split > A5_345_nogenes_overlappingtes.bed
bedtools intersect -a /home/lw/newdisk/project/soloTE/hg38_rmsk.bed -b A5_345_nogenes_overlappingtes.bed -u > A5_345_selectedtes.bed
python /home/lw/newdisk/project/soloTE/annotateBAM.py A5_345_nogenes_overlappingtes.bam A5_345_selectedtes.bed temp_annotated_te.bam 1
samtools sort -@ 50 -O BAM -o A5_345_teannotated.bam temp_annotated_te.bam
[bam_sort_core] merging from 0 files and 50 in-memory blocks...
samtools merge --threads 50 -o - /home/lw/newdisk/project/bamdata/A5_345Aligned.sortedByCoord.out.bam A5_345_teannotated.bam|samtools view -@ 50 -O BAM -o A5_345_final.bam -e 'exists([CB]) && exists([UB]) && exists([GN]) && [CB]!="-" && [UB]!="-" && [GN]!="-"' --keep-tag GN,CB,UB
samtools index A5_345_final.bam
Counts for chromosome chr1 are being generated in process: 35704
Counts for chromosome chr2 are being generated in process: 35705
Counts for chromosome chr3 are being generated in process: 35706
Counts for chromosome chr5 are being generated in process: 35708
Counts for chromosome chr4 are being generated in process: 35707
Counts for chromosome chr6 are being generated in process: 35709
Counts for chromosome chr7 are being generated in process: 35710
Counts for chromosome chr8 are being generated in process: 35711
Counts for chromosome chr9 are being generated in process: 35712
Counts for chromosome chr10 are being generated in process: 35713
Counts for chromosome chr11 are being generated in process: 35714
Counts for chromosome chr12 are being generated in process: 35715
Counts for chromosome chr13 are being generated in process: 35716
Counts for chromosome chr14 are being generated in process: 35717
Counts for chromosome chr15 are being generated in process: 35718
Counts for chromosome chr16 are being generated in process: 35719
Counts for chromosome chr17 are being generated in process: 35720
Counts for chromosome chr18 are being generated in process: 35721
Counts for chromosome chr19 are being generated in process: 35722
Counts for chromosome chr20 are being generated in process: 35723
Counts for chromosome chr21 are being generated in process: 35724
Counts for chromosome chr22 are being generated in process: 35725
Counts for chromosome chrX are being generated in process: 35726
Counts for chromosome chrY are being generated in process: 35727
Counts for chromosome chrM are being generated in process: 35728
Counts for chromosome GL000194.1 are being generated in process: 35729
Counts for chromosome GL000195.1 are being generated in process: 35730
Counts for chromosome GL000218.1 are being generated in process: 35731
Counts for chromosome GL000219.1 are being generated in process: 35732
Counts for chromosome KI270711.1 are being generated in process: 35733
Counts for chromosome KI270721.1 are being generated in process: 35734
Counts for chromosome KI270726.1 are being generated in process: 35735
Counts for chromosome KI270727.1 are being generated in process: 35736
Counts for chromosome KI270734.1 are being generated in process: 35737
Creating final results directory
/home/lw/newdisk/project/result_soloTE/A5_345_SoloTE_output was created
Rscript /home/lw/newdisk/project/soloTE/generate_mtx.R A5_345_legacytes.txt A5_345_legacytes_MATRIX
Rscript /home/lw/newdisk/project/soloTE/generate_mtx.R A5_345_locustes.txt A5_345_locustes_MATRIX
Rscript /home/lw/newdisk/project/soloTE/generate_mtx.R A5_345_classtes.txt A5_345_classtes_MATRIX
Rscript /home/lw/newdisk/project/soloTE/generate_mtx.R A5_345_familytes.txt A5_345_familytes_MATRIX
Rscript /home/lw/newdisk/project/soloTE/generate_mtx.R A5_345_subfamilytes.txt A5_345_subfamilytes_MATRIX
A total of 32182705 UMIs are in the final matrix.
Of these,
25047426 (77.829%) correspond to genes.
and 7135279 (22.171%) correspond to TEs.
TE detected UMIs are distributed as follows:
Locus-specific TEs: 6798252 UMIs (95.277%).
Subfamily TEs: 337027 (4.723%).
Creating A5_345_SoloTE.stats TE statistics file
Finished creating A5_345_SoloTE.stats
SoloTE finished with /home/lw/newdisk/project/bamdata/A5_345Aligned.sortedByCoord.out.bam
SoloTE finished at 18:46:18
SoloTE total running time: 3:38:35.398095
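For context, the last samtools view step in the log (the one writing A5_345_final.bam) keeps only alignments whose CB, UB and GN tags all exist and are not "-". The plain-Python sketch below mimics that filter on hypothetical tag dictionaries rather than real BAM records. The names passes_final_filter, surviving_contigs and viral_genome are illustrative only. The sketch shows how a whole contig could drop out before the per-chromosome counting if none of its reads carry a GN tag. This could happen, for example, if the viral genes were not in the annotation used at alignment time. Whether this is what actually happens in my run is an assumption I have not verified.

```python
# Sketch of the read filter applied by the final samtools view step in the log:
#   exists([CB]) && exists([UB]) && exists([GN]) &&
#   [CB]!="-" && [UB]!="-" && [GN]!="-"
# Assumption: alignments are modeled as plain dicts of tag -> value for
# illustration; real records would come from a BAM reader.

REQUIRED_TAGS = ("CB", "UB", "GN")


def passes_final_filter(tags: dict) -> bool:
    """True if every required tag is present and is not the placeholder "-"."""
    return all(tag in tags and tags[tag] != "-" for tag in REQUIRED_TAGS)


def surviving_contigs(alignments) -> list:
    """Sorted contigs that keep at least one alignment after the filter.

    `alignments` is an iterable of (contig_name, tags_dict) pairs.
    """
    return sorted({contig for contig, tags in alignments if passes_final_filter(tags)})


if __name__ == "__main__":
    # Hypothetical reads: the viral one has a barcode and UMI but no GN tag.
    example = [
        ("chr1", {"CB": "AAACCTG", "UB": "GGTTAA", "GN": "GeneA"}),
        ("chr2", {"CB": "AAACCTG", "UB": "CCTTAA", "GN": "-"}),
        ("viral_genome", {"CB": "AAACCTG", "UB": "TTGGCC"}),
    ]
    print(surviving_contigs(example))  # ['chr1']
```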
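One way to tell whether a contig is only missing from the log or missing from the BAM itself is to compare the contigs named in the "Counts for chromosome ..." lines with the output of samtools idxstats. That output has tab-separated columns for name, length, mapped read-segments and unmapped read-segments, plus a final "*" row. The sketch below assumes both texts have been saved to strings. The function names and the contig name viral_genome are hypothetical placeholders, not part of SoloTE. Running idxstats on the input BAM shows whether any reads aligned to the viral contig at all. Running it on A5_345_final.bam shows whether any of those reads survived the tag filter.

```python
import re

# Matches SoloTE's per-contig log lines, e.g.
# "Counts for chromosome chr1 are being generated in process: 35704"
LOG_PATTERN = re.compile(r"^Counts for chromosome (\S+) are being generated")


def contigs_in_log(log_text: str) -> set:
    """Contig names that appear in the per-chromosome counting lines."""
    names = set()
    for line in log_text.splitlines():
        match = LOG_PATTERN.match(line.strip())
        if match:
            names.add(match.group(1))
    return names


def mapped_contigs(idxstats_text: str) -> dict:
    """Parse samtools idxstats output into {contig: mapped_read_segments}.

    The final "*" row (unplaced unmapped reads) is skipped.
    """
    counts = {}
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4 or fields[0] == "*":
            continue
        counts[fields[0]] = int(fields[2])
    return counts


def missing_from_log(log_text: str, idxstats_text: str) -> list:
    """Contigs with mapped reads in the BAM that never appear in the log."""
    logged = contigs_in_log(log_text)
    return sorted(
        contig
        for contig, mapped in mapped_contigs(idxstats_text).items()
        if mapped > 0 and contig not in logged
    )


if __name__ == "__main__":
    log = (
        "Counts for chromosome chr1 are being generated in process: 35704\n"
        "Counts for chromosome chrM are being generated in process: 35728\n"
    )
    # Hypothetical idxstats output; "viral_genome" is a placeholder name.
    idxstats = (
        "chr1\t248956422\t1000\t0\n"
        "chrM\t16569\t50\t0\n"
        "viral_genome\t10000\t20\t0\n"
        "*\t0\t0\t5\n"
    )
    print(missing_from_log(log, idxstats))  # ['viral_genome']
```

If the viral contig has mapped reads in A5_345_final.bam but is still not listed in the log, the cause would more likely be how SoloTE selects contigs than the tag filtering.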